Count the number of lines in a file

Post by Killswitch »

I'm looking for a quick way to read the number of lines in a file, without (if possible) using an external PB library/dll - Win API is fine though.

I'd like to do this without reading the whole file just to count its lines, as in this loop:

Code:

While Eof(0) = 0
  Lines + 1
  ReadString()
Wend

For my purposes that would mean reading the file twice, which may take a while when processing lengthy files.

Thanks for the help,

Killswitch

Post by GedB »

Killswitch,

The problem is that lines are of a variable length, so there is no way of knowing how many there are until after you have read through the file.

If you have a file of 1000 bytes, it might hold a single line of 1000 characters, 10 lines of 100 characters, or 57 lines of varying length. Until the whole thing is read, there is no way of knowing.

If you do have a function that returns the number of lines, then it will still have to read through the whole file. You may be using less code, but you won't be saving any disk activity.

If you need the number of lines up front, then read the whole thing into memory. Populate an array of strings, then you can easily get the length and access the contents.

If performance is definitely an issue, look at a fixed-length record format. If you have 1000 bytes and you know that the rows are all 125 bytes long, then you know that there are 8 lines.
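
As a rough illustration of the fixed-length idea (written in Python rather than PureBasic, with a hypothetical function name), the count can be derived from the file size alone, without reading any data:

```python
import os

def count_fixed_records(path, record_size):
    """Return the number of fixed-length records in the file at `path`."""
    size = os.path.getsize(path)  # taken from file metadata; no data is read
    if size % record_size != 0:
        raise ValueError("file size is not a multiple of the record size")
    return size // record_size
```

This only works if every row, including its line break, really occupies exactly `record_size` bytes.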

Perhaps you could have an index file that records the position of the start of each line. The index file will have four bytes for each line. You just have to make sure that index and data are kept in sync.
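
A minimal sketch of such an index file, again in Python and with hypothetical names (`build_index`, `count_lines_from_index`, `read_line`). It assumes little-endian 4-byte offsets, so the data file must stay below 4 GiB:

```python
import os
import struct

def build_index(data_path, index_path):
    """Write the start offset of every line as a 4-byte little-endian integer."""
    offset = 0
    with open(data_path, "rb") as data, open(index_path, "wb") as index:
        for line in data:  # iterating a binary file splits on b"\n"
            index.write(struct.pack("<I", offset))
            offset += len(line)

def count_lines_from_index(index_path):
    """Four bytes per line, so the line count is the index size divided by 4."""
    return os.path.getsize(index_path) // 4

def read_line(data_path, index_path, n):
    """Fetch line n (0-based) by looking up its offset in the index."""
    with open(index_path, "rb") as index:
        index.seek(n * 4)
        (offset,) = struct.unpack("<I", index.read(4))
    with open(data_path, "rb") as data:
        data.seek(offset)
        return data.readline()
```

The index has to be rebuilt whenever the data file changes, which is the "kept in sync" requirement mentioned above.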

Post by Fred »

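The fastest way is to use ReadData() to read the whole file and count the occurrences of character 10 (line feed). The inner loop would look like this (not tested):

Code:

; Keep the start address so the block can be freed after the scan
Length = Lof()
*Start = AllocateMemory(Length)
ReadData(*Start, Length)
*Buffer.Byte = *Start
While *Buffer < *Start + Length ; stop at the end of the data
  If *Buffer\b = 10
    NbLines+1
  EndIf
  *Buffer+1
Wend
FreeMemory(*Start)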
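
For comparison, here is the same counting idea as a small Python sketch. It is not Fred's code: the function name and the chunk size are assumptions, and it reads the file in blocks instead of one large buffer so that memory use stays bounded:

```python
def count_line_feeds(path, chunk_size=1 << 20):
    """Count byte 10 (LF) in a file, reading it in fixed-size chunks."""
    count = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # an empty read means end of file
                break
            count += chunk.count(b"\n")
    return count
```

Like the loop above, this counts line feeds, so a final line without a trailing LF is not included.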

Post by GedB »

*Buffer\b

What's the \b for?

Is Byte declared as a structure?

Post by Rings »

The PBOSL_FastFileText library is exactly what you need.
It comes with complete sources, so you can extract the parts you need :)

Post by fweil »

@GedB,

Look for the Byte structure in the Structure Viewer of the PureBasic editor.

Byte contains a single byte named b.

Accordingly when defining *Buffer.Byte you can address the current byte, as a byte value by using *Buffer\b. This is fast and clean.

BTW, Fred's proposal is better coded like this:

Code:

  FileName.s = ***
  If ReadFile(0, FileName)
    Length = Lof()
    *Start = AllocateMemory(Length)
    ReadData(*Start, Length)
    *Buffer.Byte = *Start
    While *Buffer < *Start + Length ; stop at the end of the data, not at a 0 byte
      If *Buffer\b = #LF
        NbLines+1
      EndIf
      *Buffer+1
    Wend
    FreeMemory(*Start) ; free the original address, not the advanced pointer
    CloseFile(0)
  EndIf
  Debug NbLines ; or any code you want
End

To avoid opening the file twice, once you have the line count you can parse the buffer again directly in memory (before freeing it), or restart reading the file with FileSeek() at the position you want (e.g. if you have the line positions stored in a list).
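
The seek-based variant can be sketched in Python as follows. The helper names are hypothetical, and `f` is assumed to be a file opened in binary mode and kept open between the two steps:

```python
def scan_line_offsets(f):
    """Walk the file once and remember where each line starts."""
    f.seek(0)
    offsets = []
    position = 0
    for line in f:
        offsets.append(position)
        position += len(line)
    return offsets

def read_line_at(f, offsets, n):
    """Jump straight to line n (0-based) using the stored offsets."""
    f.seek(offsets[n])
    return f.readline()
```

Here `len(offsets)` is the line count, and the same handle is reused for later reads, so the file is opened only once.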

Rgrds

Post by Killswitch »

I'm just reading a text file, so Fred's method should be OK. Reading the file into an array of strings is out of the question: I want to know how many lines there are so I can define an array with the correct number of elements (correct term?) to read that file into. The array is structured, and because of these structures (there are types defined by other structures in there) it can eat a lot of RAM, so limiting the array to the bare minimum keeps RAM usage down.

Post by GedB »

How about this.

It builds on Fred's code.

It loads the file once, as before.

Now it builds an index to the buffer as it counts the line feeds.

It then uses this index to populate the array.

You avoid having to read the file twice, and the only extra RAM used is the index buffer.

Code:

#FileName = "Whatever.txt"

IndexBufferSize = 8
IndexBufferOffset = 4
*IndexBuffer = AllocateMemory(IndexBufferSize)

FileName.s = #FileName
If ReadFile(0, FileName)
  Length = Lof()
  ;One extra byte so that the last line is always 0-terminated
  *Start = AllocateMemory(Length + 1)
  ReadData(*Start, Length)
  PokeB(*Start + Length, 0)
  CloseFile(0)
  
  ;The first line starts at the beginning of the buffer
  PokeL(*IndexBuffer, *Start)
  
  *Buffer.BYTE = *Start
  While *Buffer < *Start + Length
    ;Replace CRs and LFs with 0 so they now mark the end
    ;of a string.
    If *Buffer\b = #LF
      *Buffer\b = 0
      NbLines+1
      
      PokeL(*IndexBuffer + IndexBufferOffset, *Buffer + 1)
      IndexBufferOffset + 4
      
      If IndexBufferOffset > IndexBufferSize - 4
        IndexBufferSize * 2
        *NewIndexBuffer = ReAllocateMemory(*IndexBuffer, IndexBufferSize)
        If *NewIndexBuffer = 0
          MessageRequester("Fatal Error", "Cannot allocate memory")
          End
        Else
          *IndexBuffer = *NewIndexBuffer
        EndIf
        
      EndIf
      
    ElseIf *Buffer\b = #CR
      *Buffer\b = 0
    EndIf
    *Buffer+1
  Wend
  
EndIf

Dim Lines.s(NbLines)

For i = 1 To NbLines
  Lines(i) = PeekS(PeekL(*IndexBuffer + ((i-1) * 4)))
Next i

;Free the original address, not the advanced *Buffer pointer
If *Start : FreeMemory(*Start) : EndIf
FreeMemory(*IndexBuffer)

For i = 1 To NbLines
  Debug Lines(i)
Next i
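
The same two-step idea (index the line starts while scanning, then fill a pre-sized array) can be sketched in Python. This is an illustration under assumptions, not a port: `load_lines` is a hypothetical name, the text is assumed to be single-byte encoded, and unlike the code above it also keeps a last line that has no trailing LF:

```python
def load_lines(data: bytes):
    """Index line starts in one pass, then fill a list sized to the line count."""
    starts = [0]
    for i, byte in enumerate(data):  # iterating bytes yields integers
        if byte == 0x0A:             # LF: the next line starts right after it
            starts.append(i + 1)
    if starts[-1] == len(data):      # nothing follows the final LF (or data is empty)
        starts.pop()

    lines = [None] * len(starts)     # allocated once, with the exact size
    for n, start in enumerate(starts):
        end = starts[n + 1] if n + 1 < len(starts) else len(data)
        # strip the trailing line break characters before decoding
        lines[n] = data[start:end].rstrip(b"\r\n").decode("latin-1")
    return lines
```

The index costs one integer per line, which mirrors the four bytes per line of the index buffer above.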

Post by Rescator »

I prefer to read it straight into a list in one go,
that's what lists are for (among other things) :P

Post by PB »

Here's my contribution, based on Fred's idea:

Code:

Procedure CountLinesInTextFile(file$)
  If ReadFile(0,file$)
    l=Lof() : m=AllocateMemory(l+1) ; one extra byte for a 0 terminator
    If m : ReadData(m,l) : PokeB(m+l,0) : m$=PeekS(m) : FreeMemory(m) : EndIf
    CloseFile(0)
  EndIf
  ProcedureReturn CountString(m$,Chr(10))
EndProcedure

Debug CountLinesInTextFile("c:\test.txt")

Post by Dare2 »

Idea - stream it into (hidden) editor gadget and ask for the line count from there? (Needs winapi calls). Pretty swift.

Post by Rescator »

The following was inspired by this thread and Horst Schaeffer's filebuffer code.
I created a full mini library!

It's not really that optimized, but should be platform independent.

It lets you read an entire text file into memory,
counts the lines, creates its own static array, and keeps track of the max string length.

It also lets you retrieve the strings easily,
or optionally just get the length of a string,
or even a direct pointer to a string (you need to use PeekS() in that case).

It is able to open a file on its own,
read from an already open file (make sure the file pointer is at the proper place first, using FileSeek()),
or open a file but reuse an existing fileID.

I haven't checked how Rings did his routines, most likely they are better optimized.
But anyway, better to have too many examples than too few, right?
Have fun peeps!

Warning! Make sure you use MaxLengthTextMem() to check max string length, because it is possible to return strings as large as the system memory allows.

Also be aware that there is no limit to the number of lines, or size of a file.
The only limitations should be available memory and PureBasic's internal size restrictions on files, memory and values.

Also note that you can have multiple files loaded at the same time.

Note! No string editing or replacement is possible, except "In Place" editing or replacement. This is the reason that PointerTextMem() and LengthTextMem() exist, to allow advanced manipulation of the loaded text.

EDIT: Added automatic 0 terminator to the strings (this means the EOF is changed),
this should make PointerTextMem() more interesting to use with libraries and functions and api and GUI calls that expect strings to be 0 terminated.
Also fixed a bug with PointerTextMem().

LoadText.pbi

Code:

Procedure.l LoadTextMem(fileID,file.s) ; Returns 0 if not enough memory or open failed!
 Protected fhdl,*buffer,*index,Size,readsize,*ptr.BYTE,*ptrend,Lines,c,n,*strptr,Length,maxlength
 If file
  fhdl=ReadFile(fileID,file)
  If fhdl=#False : ProcedureReturn 0 : EndIf
 Else
  If IsFile(fileID)=#False : ProcedureReturn 0 : EndIf
  UseFile(fileID)
  fhdl=fileID
 EndIf
 If fhdl
  Size=Lof()
  If Size>0
   *buffer=AllocateMemory(Size+17) ; 16 byte header, the text, and 1 byte for a final 0 terminator
   If *buffer
    readsize=ReadData(*buffer+16,Size) ;header is 16 bytes
    If file Or fileID=#PB_Any : CloseFile(fhdl) : EndIf ; the data is in memory now
    PokeL(*buffer,readsize) ; Text file size
    
    ; First pass: count the lines
    *ptr=*buffer+16
    *ptrend=*ptr+readsize
    Lines=0
    While *ptr<*ptrend
     While *ptr<*ptrend
      c=*ptr\b : *ptr+1
      If c=13 : Break : EndIf
      If c=10 : Break : EndIf
     Wend
     If *ptr<*ptrend
      n=*ptr\b
      If n+c=23 : *ptr+1 : EndIf ; 13+10: skip the second byte of a CR/LF pair
     EndIf
     Lines+1 ; also counts a last line without a line break
    Wend
    PokeL(*buffer+4,Lines) ; Number of lines
    
    ; Second pass: store pointer and length of each line in the index
    *index=AllocateMemory(Lines*8)
    If *index
     PokeL(*buffer+8,*index) ; Index pointer
     *ptr=*buffer+16
     Lines=0
     While *ptr<*ptrend
      Length=0
      *strptr=*ptr
      While *ptr<*ptrend
       c=*ptr\b : *ptr+1
       If c=13 : Break : EndIf
       If c=10 : Break : EndIf
       Length+1
      Wend
      If *ptr<*ptrend
       n=*ptr\b
       If n+c=23 : *ptr+1 : EndIf
      EndIf
      PokeL(*index+(Lines*8),*strptr)
      PokeL((*index+4)+(Lines*8),Length)
      PokeB(*strptr+Length,0) ; 0 terminator over the line break (or after the last line)
      Lines+1
      If maxlength<Length : maxlength=Length : EndIf
     Wend
     PokeL(*buffer+12,maxlength) ; Maximum string length
    Else
     FreeMemory(*buffer)
     ProcedureReturn 0
    EndIf
    
    ProcedureReturn *buffer
   EndIf
  EndIf
  If file Or fileID=#PB_Any : CloseFile(fhdl) : EndIf
 EndIf
 ProcedureReturn 0
EndProcedure

Procedure FreeTextMem(*buffer) ; Free text buffer and index memory
 Protected *index
 If *buffer
  *index=PeekL(*buffer+8)
  If *index
   FreeMemory(*index)
  EndIf
  FreeMemory(*buffer)
 EndIf
EndProcedure 

Procedure.l CountTextMem(*buffer) ; Return number of lines
 Protected *index
 If *buffer
  ProcedureReturn PeekL(*buffer+4)
 EndIf
 ProcedureReturn 0
EndProcedure 

Procedure.l MaxLengthTextMem(*buffer) ; Return length of the longest string
 Protected *index
 If *buffer
  ProcedureReturn PeekL(*buffer+12)
 EndIf
 ProcedureReturn 0
EndProcedure 

Procedure.s ReadTextMem(*buffer,line) ; Return specified line as string, "" if error or empty, range 0 to CountTextMem(*buffer)-1
 Protected *index,Length,*strptr
 If *buffer
  *index=PeekL(*buffer+8)
  *strptr=*index+(line*8)
  Length=PeekL(*strptr+4)
  If Length>0
   ProcedureReturn PeekS(PeekL(*strptr),Length)
  EndIf
 EndIf
 ProcedureReturn ""
EndProcedure 

Procedure.l PointerTextMem(*buffer,line) ; Return pointer to string, 0 if error, range 0 to CountTextMem(*buffer)-1
 Protected *index,Length,*strptr
 If *buffer
  *index=PeekL(*buffer+8)
  ProcedureReturn PeekL(*index+(line*8))
 EndIf
 ProcedureReturn 0
EndProcedure 

Procedure.l LengthTextMem(*buffer,line) ; Return length of string, 0 if error or empty, range 0 to CountTextMem(*buffer)-1
 Protected *index
 If *buffer
  *index=PeekL(*buffer+8)
  ProcedureReturn PeekL((*index+(line*8))+4)
 EndIf
 ProcedureReturn 0
EndProcedure 
LoadTextTest.pb

Code:

XIncludeFile "LoadText.pbi"
#TextFile=0

TextFile$=OpenFileRequester("Please choose file to load","F:\","Text (*.txt)|*.txt",0)
If TextFile$
 *textptr=LoadTextMem(#TextFile,TextFile$)
 linestotal=CountTextMem(*textptr)
 Debug "Number of lines: "+Str(linestotal)
 Debug "Longest line: "+Str(MaxLengthTextMem(*textptr))
 line=0
 While line<linestotal
  Debug ReadTextMem(*textptr,line)
  line+1
 Wend
 FreeTextMem(*textptr)
EndIf
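
The line-break handling in LoadTextMem() (a CR, an LF, or a CR/LF pair in either order ends a line) can be illustrated with a short Python sketch. The name `scan_lines` and the tuple it returns are assumptions made for this example; they are not part of the library above:

```python
def scan_lines(data: bytes):
    """Return (line_count, max_length, index); index holds (start, length) pairs."""
    index = []
    pos, end = 0, len(data)
    while pos < end:
        start = pos
        while pos < end and data[pos] not in (10, 13):
            pos += 1
        length = pos - start
        if pos < end:
            c = data[pos]  # the CR or LF that ended the line
            pos += 1
            # 13 + 10 = 23: the next byte completes a CR/LF (or LF/CR) pair
            if pos < end and data[pos] + c == 23:
                pos += 1
        index.append((start, length))
    max_length = max((length for _, length in index), default=0)
    return len(index), max_length, index
```

The count, the maximum length and the index correspond to the values the library stores in its 16-byte header and index block.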


Post by Psychophanta »

My contribution:

Code:

Procedure.l TextCountChars(*Text.b,*CharacterToFind.s,TextLenght.l)
  !mov edi,dword[esp]    ;pointer to the first character in string (first function parameter)
  !;cld              ;clear DF (Direction Flag). (normally not necessary; cleared by default)
  !xor ebx,ebx  ;init counter to NULL
  !mov ecx,dword[esp+8]    ;lets set # characters
  !inc ecx
  !jecxz near CountCharsgo    ;if 0, then exit returning 0
  !mov edi,dword[esp]     ;point again to the first character in string (first function parameter)
  !mov eax,dword[esp+4]
  !mov al,byte[eax]    ;al=character to find
  !@@:REPNZ scasb   ;repeat comparing AL CPU register content with byte[edi]
  !jecxz near CountCharsgo     ;until ecx value is reached
  !inc ebx       ;or a match is found
  !jmp near @r       ;continue comparing next character
  !CountCharsgo:MOV eax,ebx   ;output the matches counter
  ProcedureReturn
EndProcedure

Usage:

Result=OpenFile(0,"whatever")
*Buffer.b=AllocateMemory(Lof()):ReadData(*Buffer,Lof())
NumberOfLines.l=TextCountChars(*Buffer,Chr(10),Lof())
Debug NumberOfLines.l
FreeMemory(*Buffer)

Just for speed. Max text length 2^32-1 characters :)
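
For readers without an x86 assembler at hand, a loosely comparable approach in Python is to memory-map the file and let the runtime's native search do the scanning. This is only a sketch with a hypothetical function name, not a translation of the routine above:

```python
import mmap
import os

def count_byte_mmap(path, byte=b"\n"):
    """Count occurrences of a single byte in a file via a read-only memory map."""
    if os.path.getsize(path) == 0:
        return 0  # an empty file cannot be memory-mapped
    count = 0
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            pos = mm.find(byte)
            while pos != -1:
                count += 1
                pos = mm.find(byte, pos + 1)
    return count
```

Each `find()` call runs in compiled code, so the Python-level loop executes once per match rather than once per byte.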

Post by Rings »

Rescator wrote: I haven't checked how Rings did his routines, most likely they are better optimized.
But anyway, better to have too many examples than too few, right?

Oh yes, the PBOSL contains the full source code for this. Just take a look ;)
And yes, the wheel is a fine toy to invent.

Post by Dare2 »

Rings wrote: Oh yes, the PBOSL contains the full source code for this. Just take a look ;)
And yes, the wheel is a fine toy to invent.

Also to modify.

That is how we got gear wheels, steering wheels and hula hoops :D