
Count the number of lines in a file

Posted: Wed Jul 13, 2005 1:29 pm
by Killswitch
I'm looking for a quick way to get the number of lines in a file without (if possible) using an external PB library/DLL; the Win API is fine, though.

I'd like to avoid reading through the whole file just to count the lines, like this:

Code:

While Eof(0)=0   ; file #0 is already open for reading
  Lines+1
  ReadString()
Wend

For my purposes, doing that would mean reading the file twice, which may take a while if I'm processing lengthy files.

Thanks for the help,

Killswitch

Posted: Wed Jul 13, 2005 2:19 pm
by GedB
Killswitch,

The problem is that lines are of a variable length, so there is no way of knowing how many there are until after you have read through the file.

If you have a file of 1000 bytes, it might hold a single line of 1000 characters, 10 lines of 100 characters each, or 57 lines of varying length. Until the whole thing is read, there is no way of knowing.

Even if you have a function that returns the number of lines, it will still have to read through the whole file. You may write less code, but you won't save any disk activity.

If you need the number of lines up front, then read the whole thing into memory. Populate an array of strings, then you can easily get the length and access the contents.

If performance is definitely an issue, look for a fixed-length record format. If you have 1000 bytes and you know that the rows are all 125 bytes long, then you know that there are 8 lines.
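Here is a minimal sketch of the fixed-length idea, written in C rather than PureBasic purely for illustration. With fixed-length records, the line count is just the file size divided by the record length. The helper names `record_count` and `file_size_of` are hypothetical, and the record length is assumed to include the line terminator.

```c
#include <stdio.h>

/* Hypothetical helper: number of fixed-length records in a file of
   file_size bytes, where every record (terminator included) is
   record_len bytes long. Returns -1 if the size is not an exact
   multiple, i.e. the file does not follow the fixed format. */
long record_count(long file_size, long record_len)
{
    if (record_len <= 0 || file_size < 0 || file_size % record_len != 0)
        return -1;
    return file_size / record_len;
}

/* Size of an open file in bytes, found by seeking to the end and then
   restoring the original position (works on common platforms). */
long file_size_of(FILE *f)
{
    long pos = ftell(f), size;

    if (pos < 0 || fseek(f, 0, SEEK_END) != 0)
        return -1;
    size = ftell(f);
    fseek(f, pos, SEEK_SET);
    return size;
}
```

For example, `record_count(1000, 125)` gives 8. A size that is not an exact multiple returns -1, which is a cheap way to detect a file that does not follow the format.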

Perhaps you could have an index file that records the position of the start of each line. The index file will have four bytes for each line. You just have to make sure that index and data are kept in sync.
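Below is a rough C sketch of such an index file. It assumes 4-byte offsets in native byte order, which limits the data file to 4 GB. The names `build_line_index` and `seek_to_line` are hypothetical, and the index has to be rebuilt whenever the data file changes.

```c
#include <stdio.h>
#include <stdint.h>

/* Write one 4-byte offset per line: the position where that line
   starts in the data file. Returns the number of lines indexed, or -1
   on a write error. An empty line after a trailing LF is not indexed. */
long build_line_index(FILE *data, FILE *index)
{
    uint32_t pos = 0;          /* byte offset of the current character */
    long lines = 0;
    int c, at_line_start = 1;

    rewind(data);
    while ((c = fgetc(data)) != EOF) {
        if (at_line_start) {   /* first byte of a new line: record it */
            if (fwrite(&pos, sizeof pos, 1, index) != 1)
                return -1;
            lines++;
            at_line_start = 0;
        }
        pos++;
        if (c == '\n')
            at_line_start = 1;
    }
    return lines;
}

/* Position the data file at the start of line n (0-based) using the
   index. Returns 0 on success, -1 if n is out of range or a seek fails. */
int seek_to_line(FILE *data, FILE *index, long n)
{
    uint32_t start;

    if (n < 0 || fseek(index, n * (long)sizeof start, SEEK_SET) != 0)
        return -1;
    if (fread(&start, sizeof start, 1, index) != 1)
        return -1;
    return fseek(data, (long)start, SEEK_SET) == 0 ? 0 : -1;
}
```

Looking up a line then costs one 4-byte read plus one seek, however long the lines are.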

Posted: Wed Jul 13, 2005 3:24 pm
by Fred
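A faster way is to use ReadData() to read the whole file at once and count the characters with code 10 (line feed). The inner loop would look like this (not tested):

Code:

; the file is already open with ReadFile()
Length = Lof()
*Start = AllocateMemory(Length)
If *Start
  ReadData(*Start, Length)
  *Buffer.Byte = *Start
  While *Buffer < *Start + Length  ; stop at the end of the data, not at the first 0 byte
    If *Buffer\b = 10
      NbLines+1
    EndIf
    *Buffer+1
  Wend
  FreeMemory(*Start)               ; free the original address, not the advanced pointer
EndIf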
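For comparison, here is the same whole-buffer scan as a small C sketch. The function names are hypothetical and not part of any PureBasic library. The key detail is that the loop is bounded by the byte count instead of stopping at the first 0 byte, so it is safe on buffers that are not 0-terminated.

```c
#include <stdio.h>
#include <stdlib.h>

/* Count line-feed bytes (character 10) in a buffer of len bytes.
   The loop is bounded by len, so it neither runs past the end of a
   buffer without a 0 terminator nor stops early at a stray 0 byte. */
size_t count_lf(const unsigned char *buf, size_t len)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++)
        if (buf[i] == 10)
            n++;
    return n;
}

/* Read a whole open file into one malloc'd buffer (the ReadData()
   step) and count its line feeds. Returns -1 on any error. */
long count_lf_in_file(FILE *f)
{
    long size, count;
    unsigned char *buf;

    if (fseek(f, 0, SEEK_END) != 0 || (size = ftell(f)) < 0)
        return -1;
    rewind(f);
    buf = malloc(size > 0 ? (size_t)size : 1);
    if (!buf)
        return -1;
    if (fread(buf, 1, (size_t)size, f) != (size_t)size) {
        free(buf);
        return -1;
    }
    count = (long)count_lf(buf, (size_t)size);
    free(buf);   /* free the pointer malloc() returned */
    return count;
}
```

This counts line feeds, so a last line without a trailing LF is not included. Add one if the file is non-empty and its final byte is not 10.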

Posted: Wed Jul 13, 2005 3:31 pm
by GedB
*Buffer\b
What's the \b for?

Is Byte declared as a structure?

Posted: Wed Jul 13, 2005 3:35 pm
by Rings
The PBOSL_FastFileText-Library is exactly what you need.
It comes with complete sources, so you can extract the part you need :)

Posted: Wed Jul 13, 2005 4:24 pm
by fweil
@GedB,

Look in the Structure Viewer of the PureBasic editor for the Byte structure.

Byte contains a single byte named b.

Accordingly, when you declare *Buffer.Byte, you can read the byte it currently points to as a byte value with *Buffer\b. This is fast and clean.

BTW, Fred's proposal is better coded like this:

Code:

  FileName.s = ***
  If ReadFile(0, FileName)
      Length = Lof()
      *Start = AllocateMemory(Length)
      If *Start
        ReadData(*Start, Length)
        *Buffer.Byte = *Start
        While *Buffer < *Start + Length   ; stop at the end of the data
          If *Buffer\b = #LF
            NbLines+1
          EndIf
          *Buffer+1
        Wend
        FreeMemory(*Start)                ; free the original address, not *Buffer
      EndIf
      CloseFile(0)
  EndIf
  Debug NbLines ; (or any code you want)
End

To avoid opening the file twice, once you have the line count you can parse the buffer again directly in memory, or resume reading the file with FileSeek() at the position you want (i.e. if you have stored the line positions in a list...).
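Parsing the buffer again directly in memory could look like the following C sketch. It assumes the whole file is still in `buf`. `nth_line` is a hypothetical helper that walks past n line feeds and returns a pointer into the buffer without copying anything.

```c
#include <stddef.h>

/* Find line n (0-based) in a buffer that is already in memory, without
   touching the file again. Returns a pointer into buf and stores the
   line's length (LF excluded) in *line_len, or returns NULL if the
   buffer has fewer than n+1 lines. */
const char *nth_line(const char *buf, size_t len, size_t n, size_t *line_len)
{
    size_t i = 0, start;

    /* Skip past n line feeds. */
    while (n > 0 && i < len) {
        if (buf[i++] == '\n')
            n--;
    }
    if (n > 0 || i >= len)
        return NULL;
    start = i;
    while (i < len && buf[i] != '\n')   /* measure this line */
        i++;
    *line_len = i - start;
    return buf + start;
}
```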

Rgrds

Posted: Wed Jul 13, 2005 4:35 pm
by Killswitch
I'm just reading a text file, so Fred's method should be OK. Reading the file into an array of strings is out of the question: I want to know how many lines there are so I can define an array with exactly the right number of elements to read that file into. The array is structured, and because of these structures (there are types defined by other structures in there) it can eat a lot of RAM, so keeping the array to the bare minimum reduces RAM usage.
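Here is a minimal C sketch of that plan: count the lines first, then allocate exactly one structured element per line. `struct Record`, its fields, and the helper names are hypothetical stand-ins for Killswitch's own structures.

```c
#include <stdlib.h>

/* Hypothetical record type; the fields are made up for illustration. */
struct Record {
    char text[64];
    int  number;
};

/* Every LF ends a line; text after the last LF counts as one more. */
size_t count_lines(const char *buf, size_t len)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++)
        if (buf[i] == '\n')
            n++;
    if (len > 0 && buf[len - 1] != '\n')
        n++;
    return n;
}

/* Allocate exactly one zeroed record per line. Returns NULL (and sets
   *count to 0) for an empty buffer or if the allocation fails. */
struct Record *alloc_records(const char *buf, size_t len, size_t *count)
{
    struct Record *r;

    *count = count_lines(buf, len);
    r = *count ? calloc(*count, sizeof *r) : NULL;
    if (!r)
        *count = 0;
    return r;
}

/* Copy each line (truncated to fit) into its record; calloc() already
   zeroed the text fields, so every copy stays 0-terminated. */
void fill_records(struct Record *r, const char *buf, size_t len)
{
    size_t line = 0, col = 0;

    for (size_t i = 0; i < len; i++) {
        if (buf[i] == '\n') {
            line++;
            col = 0;
            continue;
        }
        if (col < sizeof r[line].text - 1)
            r[line].text[col++] = buf[i];
    }
}
```

CR characters are copied as-is, so CRLF files would need them stripped first.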

Posted: Wed Jul 13, 2005 6:35 pm
by GedB
How about this.

It builds on Fred's code.

It loads the file once, as before.

Now it builds an index to the buffer as it counts the line feeds.

It then uses this index to populate the array.

You avoid having to read the file twice, and the only extra RAM used is the index buffer.

Code:

#FileName = "Whatever.txt"

IndexBufferSize = 8
IndexBufferOffset = 4
*IndexBuffer = AllocateMemory(IndexBufferSize)

FileName.s = #FileName
If ReadFile(0, FileName)
  ; One spare byte, so the data is always followed by a 0 terminator
  *Start = AllocateMemory(Lof() + 1)
  ReadData(*Start, Lof())
  CloseFile(0)
  *Buffer.BYTE = *Start
  
  ; The first line starts at the beginning of the buffer
  PokeL(*IndexBuffer, *Buffer)
  
  While *Buffer\b <> 0
    ;Replace CRs and LFs with 0 so they now mark the end
    ;of a string.
    If *Buffer\b = #LF
      *Buffer\b = 0
      NbLines+1
      
      ; Record where the next line starts
      PokeL(*IndexBuffer + IndexBufferOffset, *Buffer + 1)
      IndexBufferOffset + 4
      
      ; Double the index buffer when it is full
      If IndexBufferOffset > IndexBufferSize - 4
        IndexBufferSize * 2
        *NewIndexBuffer = ReAllocateMemory(*IndexBuffer, IndexBufferSize)
        If *NewIndexBuffer = 0
          MessageRequester("Fatal Error", "Cannot allocate memory")
          End
        Else
          *IndexBuffer = *NewIndexBuffer
        EndIf
        
      EndIf
      
    ElseIf *Buffer\b = #CR
      *Buffer\b = 0
    EndIf
    *Buffer+1
  Wend
  
  Dim Lines.s(NbLines)
  
  For i = 1 To NbLines
    Lines(i) = PeekS(PeekL(*IndexBuffer + ((i-1) * 4)))
  Next i
  
  ; Free the address AllocateMemory() returned, not the advanced pointer
  FreeMemory(*Start)
  
  For i = 1 To NbLines
    Debug Lines(i)
  Next i
EndIf

FreeMemory(*IndexBuffer)
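Here is the same technique in C, as a sketch only. The name `index_lines` is hypothetical, and the initial capacity of 8 is chosen to mirror `IndexBufferSize`. Line breaks become 0 terminators in place, and the index of line-start pointers doubles whenever it fills up, so the text itself is never copied.

```c
#include <stdlib.h>

/* Overwrite each CR and LF with 0 so every line becomes a 0-terminated
   string inside buf, and collect a pointer to the start of each line
   in an index that doubles in size when full. buf must have room for
   len + 1 bytes. Returns the index (the caller frees it; the strings
   live in buf) and stores the line count in *count; NULL if memory
   runs out. */
char **index_lines(char *buf, size_t len, size_t *count)
{
    size_t cap = 8, n = 0;
    char **idx = malloc(cap * sizeof *idx);

    if (!idx)
        return NULL;
    buf[len] = '\0';                  /* terminator for the last line */
    idx[n++] = buf;                   /* first line starts at the beginning */
    for (size_t i = 0; i < len; i++) {
        if (buf[i] == '\r') {
            buf[i] = '\0';
        } else if (buf[i] == '\n') {
            buf[i] = '\0';
            if (n == cap) {           /* index full: double it */
                char **bigger = realloc(idx, 2 * cap * sizeof *idx);
                if (!bigger) {
                    free(idx);
                    return NULL;
                }
                idx = bigger;
                cap *= 2;
            }
            idx[n++] = buf + i + 1;   /* next line starts after the LF */
        }
    }
    if (idx[n - 1] == buf + len)      /* drop the empty entry after a final LF */
        n--;
    *count = n;
    return idx;
}
```

Unlike the PureBasic version, this sketch also keeps a final line that has no trailing LF.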

Posted: Thu Jul 14, 2005 2:10 am
by Rescator
I prefer to read it straight into a list in one go;
that's what lists are for (among other things) :P
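Here is a minimal C sketch of reading a file straight into a linked list in one pass, so the number of lines never has to be known up front. `struct Line`, `read_lines_to_list` and `free_lines` are hypothetical names. CR characters are dropped so CRLF files give clean lines.

```c
#include <stdio.h>
#include <stdlib.h>

/* One node per line: the list grows one element at a time. */
struct Line {
    struct Line *next;
    char *text;
};

/* Read one line into a malloc'd string, without its LF and with any
   CR characters dropped. Returns NULL at end of file or if memory
   runs out. */
static char *read_line(FILE *f)
{
    size_t cap = 64, len = 0;
    int c;
    char *s = malloc(cap);

    if (!s)
        return NULL;
    while ((c = fgetc(f)) != EOF && c != '\n') {
        if (c == '\r')
            continue;
        if (len + 1 == cap) {          /* keep room for the terminator */
            char *t = realloc(s, cap *= 2);
            if (!t) {
                free(s);
                return NULL;
            }
            s = t;
        }
        s[len++] = (char)c;
    }
    if (c == EOF && len == 0) {        /* nothing left to read */
        free(s);
        return NULL;
    }
    s[len] = '\0';
    return s;
}

/* Read every line of f into a linked list in a single pass. */
struct Line *read_lines_to_list(FILE *f, size_t *count)
{
    struct Line *head = NULL, **tail = &head;
    char *text;

    *count = 0;
    while ((text = read_line(f)) != NULL) {
        struct Line *node = malloc(sizeof *node);
        if (!node) {
            free(text);
            break;
        }
        node->text = text;
        node->next = NULL;
        *tail = node;                  /* append at the end */
        tail = &node->next;
        (*count)++;
    }
    return head;
}

/* Release every node and its string. */
void free_lines(struct Line *head)
{
    while (head) {
        struct Line *next = head->next;
        free(head->text);
        free(head);
        head = next;
    }
}
```

The trade-off against a counted array is one allocation plus one pointer per line, which adds up on large files.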

Posted: Sat Jul 16, 2005 8:31 am
by PB
Here's my contribution, based on Fred's idea:

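Code:

Procedure CountLinesInTextFile(file$)
  If ReadFile(0,file$)
    l=Lof() : m=AllocateMemory(l+1) ; +1 leaves a 0 byte after the data for PeekS()
    If m : ReadData(m,l) : m$=PeekS(m) : FreeMemory(m) : EndIf
    CloseFile(0)
  EndIf
  ; Counts line feeds: a last line without a trailing LF is not counted,
  ; and PeekS() stops at the first 0 byte in the data
  ProcedureReturn CountString(m$,Chr(10))
EndProcedure

Debug CountLinesInTextFile("c:\test.txt")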
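Because PeekS() stops at the first 0 byte, this version only sees the text up to the first NUL in the file. The small C sketch below contrasts counting through a string view with counting over the real byte length. Both function names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Count LFs through a 0-terminated string view of the data, the way
   PeekS() followed by CountString() works: bytes after the first 0
   are never seen. */
size_t count_lf_as_string(const char *data)
{
    size_t n = 0;
    for (const char *p = strchr(data, '\n'); p; p = strchr(p + 1, '\n'))
        n++;
    return n;
}

/* Count LFs over the real byte length: the whole buffer is seen. */
size_t count_lf_by_length(const char *data, size_t len)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++)
        n += data[i] == '\n';
    return n;
}
```

For ordinary text files without 0 bytes, both give the same result.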

Posted: Sat Jul 16, 2005 8:35 am
by Dare2
Idea: stream it into a (hidden) editor gadget and ask it for the line count? (Needs Win API calls.) Pretty swift.

Posted: Sat Jul 16, 2005 9:30 pm
by Rescator
The following was inspired by this thread and Horst Schaeffer's filebuffer code.
I created a full mini library!

It's not really that optimized, but should be platform independent.

It lets you read an entire text file into memory;
it counts the lines, creates its own static array, and keeps track of the maximum string length.

It lets you retrieve the strings easily,
or optionally just get the length of a string,
or even a direct pointer to a string (you need to use PeekS() in that case).

It can open a file on its own,
read from an already open file (use FileSeek() first to make sure the file pointer is at the proper place),
or open a file but reuse an existing file ID.

I haven't checked how Rings did his routines, most likely they are better optimized.
But anyway, better to have too many examples than too few, right?
Have fun peeps!

Warning! Make sure you use MaxLengthTextMem() to check max string length, because it is possible to return strings as large as the system memory allows.

Also be aware that there is no limit to the number of lines or the size of a file.
The only limitations should be available memory and PureBasic's internal size restrictions on files, memory and values.

Also note that you can have multiple files loaded at the same time.

Note! No string editing or replacement is possible, except "In Place" editing or replacement. This is the reason that PointerTextMem() and LengthTextMem() exist, to allow advanced manipulation of the loaded text.

EDIT: Added an automatic 0 terminator to the strings (this means the end-of-line character is overwritten);
this should make PointerTextMem() more useful with libraries, functions, API and GUI calls that expect 0-terminated strings.
Also fixed a bug with PointerTextMem().

LoadText.pbi

Code:

Procedure.l LoadTextMem(fileID,file.s) ; Returns 0 if not enough memory or open failed!
 Protected fhdl,*buffer,*index,Size,readsize,*ptr.BYTE,*ptrend,Lines,c,n,*strptr,Length,maxlength,*result
 If file<>""
  fhdl=ReadFile(fileID,file)
  If fhdl=#False : ProcedureReturn 0 : EndIf
  If fileID=#PB_Any : fileID=fhdl : EndIf ; with #PB_Any the return value is the file number
 Else
  If IsFile(fileID)=#False : ProcedureReturn 0 : EndIf
  UseFile(fileID)
 EndIf
 Size=Lof()
 If Size>0
  *buffer=AllocateMemory(Size+17) ; 16 byte header + text + 1 spare byte for a final 0 terminator
  If *buffer
   readsize=ReadData(*buffer+16,Size)
   PokeL(*buffer,readsize) ; Text file size
   
   ; First pass: count the lines (text after the last line break counts as a line)
   *ptr=*buffer+16
   *ptrend=*ptr+readsize
   Lines=0
   While *ptr<*ptrend
    While *ptr<*ptrend
     c=*ptr\b : *ptr+1
     If c=13 Or c=10 : Break : EndIf
    Wend
    If *ptr<*ptrend
     n=*ptr\b
     If n+c=23 : *ptr+1 : EndIf ; CR+LF (or LF+CR) is a single line break
    EndIf
    Lines+1
   Wend
   PokeL(*buffer+4,Lines) ; Number of lines
   
   If Lines>0 : *index=AllocateMemory(Lines*8) : EndIf
   If *index
    PokeL(*buffer+8,*index) ; Index pointer
    ; Second pass: store a pointer and a length for each line
    *ptr=*buffer+16
    Lines=0
    While *ptr<*ptrend
     Length=0
     *strptr=*ptr
     While *ptr<*ptrend
      c=*ptr\b : *ptr+1
      If c=13 Or c=10 : Break : EndIf
      Length+1
     Wend
     If *ptr<*ptrend
      n=*ptr\b
      If n+c=23 : *ptr+1 : EndIf
     EndIf
     PokeL(*index+(Lines*8),*strptr)
     PokeL((*index+4)+(Lines*8),Length)
     PokeB(*strptr+Length,0) ; overwrite the line break with a 0 terminator
     Lines+1
     If maxlength<Length : maxlength=Length : EndIf
    Wend
    PokeL(*buffer+12,maxlength) ; Maximum string length
    *result=*buffer
   Else
    FreeMemory(*buffer)
   EndIf
  EndIf
 EndIf
 If file<>"" : CloseFile(fileID) : EndIf ; only close a file this procedure opened
 ProcedureReturn *result
EndProcedure

Procedure FreeTextMem(*buffer) ; Free text buffer and index memory
 Protected *index
 If *buffer
  *index=PeekL(*buffer+8)
  If *index
   FreeMemory(*index)
  EndIf
  FreeMemory(*buffer)
 EndIf
EndProcedure 

Procedure.l CountTextMem(*buffer) ; Return number of lines
 If *buffer
  ProcedureReturn PeekL(*buffer+4)
 EndIf
 ProcedureReturn 0
EndProcedure 

Procedure.l MaxLengthTextMem(*buffer) ; Return length of the longest string
 If *buffer
  ProcedureReturn PeekL(*buffer+12)
 EndIf
 ProcedureReturn 0
EndProcedure 

Procedure.s ReadTextMem(*buffer,line) ; Return specified line as string, "" if error or empty, range 0 to CountTextMem(*buffer)-1
 Protected *index,Length,*strptr
 If *buffer
  If line>=0 And line<PeekL(*buffer+4) ; reject out-of-range line numbers
   *index=PeekL(*buffer+8)
   *strptr=*index+(line*8)
   Length=PeekL(*strptr+4)
   If Length>0
    ProcedureReturn PeekS(PeekL(*strptr),Length)
   EndIf
  EndIf
 EndIf
 ProcedureReturn ""
EndProcedure 

Procedure.l PointerTextMem(*buffer,line) ; Return pointer to string, 0 if error, range 0 to CountTextMem(*buffer)-1
 Protected *index
 If *buffer
  If line>=0 And line<PeekL(*buffer+4)
   *index=PeekL(*buffer+8)
   ProcedureReturn PeekL(*index+(line*8))
  EndIf
 EndIf
 ProcedureReturn 0
EndProcedure 

Procedure.l LengthTextMem(*buffer,line) ; Return length of string, 0 if error or empty, range 0 to CountTextMem(*buffer)-1
 Protected *index
 If *buffer
  If line>=0 And line<PeekL(*buffer+4)
   *index=PeekL(*buffer+8)
   ProcedureReturn PeekL((*index+(line*8))+4)
  EndIf
 EndIf
 ProcedureReturn 0
EndProcedure 

LoadTextTest.pb

Code:

XIncludeFile "LoadText.pbi"
#TextFile=0

TextFile$=OpenFileRequester("Please choose file to load","F:\","Text (*.txt)|*.txt",0)
If TextFile$
 *textptr=LoadTextMem(#TextFile,TextFile$)
 linestotal=CountTextMem(*textptr)
 Debug "Number of lines: "+Str(linestotal)
 Debug "Longest line: "+Str(MaxLengthTextMem(*textptr))
 line=0
 While line<linestotal
  Debug ReadTextMem(*textptr,line)
  line+1
 Wend
 FreeTextMem(*textptr)
EndIf
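The line-break rule in LoadTextMem() treats CR, LF, CR+LF and LF+CR as a single line break. That is the `n+c=23` test, since 13 + 10 = 23. Below is a C sketch of that rule, which also counts text after the last line break as a line. `count_lines_any_eol` is a hypothetical name.

```c
#include <stddef.h>

/* Count lines where CR, LF, CR+LF and LF+CR each end exactly one line,
   and any text after the final line break is one more line. */
size_t count_lines_any_eol(const unsigned char *buf, size_t len)
{
    size_t i = 0, lines = 0;

    while (i < len) {
        unsigned char c = 0;
        while (i < len) {              /* scan to the end of this line */
            c = buf[i++];
            if (c == 13 || c == 10)
                break;
        }
        /* Swallow the second half of CR+LF or LF+CR (13 + 10 = 23). */
        if (i < len && (c == 13 || c == 10) && buf[i] + c == 23)
            i++;
        lines++;
    }
    return lines;
}
```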


Posted: Sat Jul 16, 2005 11:14 pm
by Psychophanta
My contribution:

Code:

Procedure.l TextCountChars(*Text.b,*CharacterToFind.s,TextLenght.l)
  !mov edi,dword[esp]    ;pointer to the first character in string (first function parameter)
  !;cld              ;clear DF (Direction Flag). (normally not necessary; cleared by default)
  !xor ebx,ebx  ;init counter to NULL
  !mov ecx,dword[esp+8]    ;lets set # characters
  !inc ecx
  !jecxz near CountCharsgo    ;if 0, then exit returning 0
  !mov edi,dword[esp]     ;point again to the first character in string (first function parameter)
  !mov eax,dword[esp+4]
  !mov al,byte[eax]    ;al=character to find
  !@@:REPNZ scasb   ;repeat comparing AL CPU register content with byte[edi]
  !jecxz near CountCharsgo     ;until ecx value is reached
  !inc ebx       ;or a match is found
  !jmp near @r       ;continue comparing next character
  !CountCharsgo:MOV eax,ebx   ;output the matches counter
  ProcedureReturn
EndProcedure
Usage:
If ReadFile(0,"whatever")
  Length=Lof()
  *Buffer=AllocateMemory(Length) : ReadData(*Buffer,Length)
  NumberOfLines.l=TextCountChars(*Buffer,Chr(10),Length)
  Debug NumberOfLines.l
  FreeMemory(*Buffer)
  CloseFile(0)
EndIf
Just for speed. Max text length 2^32-1 characters :)
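The REPNZ SCASB loop is essentially a repeated memchr(). Below is a portable C sketch of the same count. It assumes the C library's memchr() is well optimized; many implementations are, but that is not guaranteed. `count_char` is a hypothetical name.

```c
#include <stddef.h>
#include <string.h>

/* Count occurrences of ch in len bytes: memchr() finds the next match
   and the search resumes right after it, like REPNZ SCASB resuming
   after each hit. Never reads past text + len. */
size_t count_char(const void *text, size_t len, unsigned char ch)
{
    const unsigned char *p = text, *end = p + len;
    size_t n = 0;

    while (p < end) {
        const unsigned char *hit = memchr(p, ch, (size_t)(end - p));
        if (!hit)
            break;
        n++;
        p = hit + 1;
    }
    return n;
}
```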

Posted: Sun Jul 17, 2005 8:07 am
by Rings
Rescator wrote: I haven't checked how Rings did his routines, most likely they are better optimized.
But anyway, better to have too many examples than too few, right?
Oh yes, the PBOSL contains the full source code for this. Just have a look ;)
and yes, the wheel is a fine toy to invent.

Posted: Sun Jul 17, 2005 12:48 pm
by Dare2
Rings wrote: Oh yes, the PBOSL contains the full source code for this. Just have a look ;)
and yes, the wheel is a fine toy to invent.
Also to modify.

That is how we got gear wheels, steering wheels and hula hoops :D