NOL

Posted: Thu Jan 20, 2011 3:54 am
by Seymour Clufley
If you need the number of lines in a file, the only native way in PB (that I know of) is to go through it with ReadString until you reach the EOF, counting up one each time.

Code:

Procedure.i NOL(fn.s) ; returns an integer line count
	f = ReadFile(#PB_Any,fn)
	If f
		While Not Eof(f)
			ReadString(f)
			n+1
		Wend
		CloseFile(f)
		ProcedureReturn n
	EndIf
	ProcedureReturn 0
EndProcedure
number_of_lines = NOL(filename$)

But if you know that every line in the file has the same fixed length, you can use this code, which is MUCH faster:

Code:

Procedure.i FixedLengthNOL(fn.s,linelength)
	fs = FileSize(fn)
	If fs>0
		; +2 accounts for the CR+LF pair that ends each line
		ProcedureReturn Round(fs/(linelength+2),#PB_Round_Up)
	EndIf
	ProcedureReturn 0
EndProcedure
number_of_lines = FixedLengthNOL(filename$,10)

I've tried it on many different files and it's counted correctly every time. I know that code like this may be obvious to many people - I'm hoping there are other PB users out there as gormless as me who will benefit from this post!

This macro version is even faster:

Code:

Macro FixedLengthNOL(fn,linelength)
	Round(FileSize(fn)/(linelength+2),#PB_Round_Up)
EndMacro
number_of_lines = FixedLengthNOL(filename$,10)

... but it does not check what FileSize returns (-1 if the file is not found, -2 if it is a directory), so the procedure version is the safer one to use.

You could use this next version, but it's not so easy to integrate into your code:

Code:

Macro FixedLengthNOL(fn,linelength,var)
	var = 0
	fs = FileSize(fn)
	If fs>0
		var = Round(fs/(linelength+2),#PB_Round_Up)
	EndIf
EndMacro
FixedLengthNOL(filename$,10,number_of_lines)

As for getting the line count when you DON'T know how long each line will be... I've written this but it's slower than using While/ReadString/Wend:

Code:

Procedure.i NOL(fn.s)
	
	f = ReadFile(#PB_Any,fn)
	If f
		loaf = Lof(f)
		If loaf<1
			CloseFile(f)
			ProcedureReturn 0
		EndIf
		
		*mem = AllocateMemory(loaf)
		If *mem
			ReadData(f,*mem,loaf)
			For a = 0 To loaf-1
				If PeekA(*mem+a) = 13 ; carriage return
					lines+1
				EndIf
			Next a
			FreeMemory(*mem) ; release the buffer, otherwise every call leaks it
		EndIf
		CloseFile(f)
	EndIf
	ProcedureReturn lines
	
EndProcedure

On my machine, running that 1000 times takes 703ms, whereas the ReadString method only takes 141ms. Are there any faster ways?

Again, I know the FixedLengthNOL procedure may be obvious, but no doubt somebody, somewhere, will use it. :)
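
As a rough check of the FixedLengthNOL arithmetic above, here is a small sketch. The byte counts are assumed for illustration (10-character lines, each ending in CR+LF), and the name bytesPerLine is not from the original code. It suggests why rounding up matters: a last line without a terminator still has to count as a line.

```purebasic
; Worked check of the fixed-length formula (all values assumed for illustration).
; Each full line occupies linelength + 2 bytes: 10 characters plus CR and LF.
Define bytesPerLine.f = 10 + 2

; Three lines, the last one without a trailing CR+LF: 12 + 12 + 10 = 34 bytes.
Debug Round(34 / bytesPerLine, #PB_Round_Up)

; Three lines, all of them terminated: 3 * 12 = 36 bytes.
Debug Round(36 / bytesPerLine, #PB_Round_Up)
```

Both calls should report three lines. The formula only holds for files with CR+LF line endings; with LF-only endings each line occupies linelength+1 bytes, so the divisor would have to change.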

Re: NOL

Posted: Thu Jan 20, 2011 5:24 am
by IdeasVacuum
I think a lot depends on the file size and line length anyway, but we can't compare speeds unless we all test the same files.

I thought a variation of this would be OK:

Code:

#File = 1

Procedure CountLines(sFileName.s)
;--------------------------------
iFileLength.i = FileSize(sFileName)

If ReadFile(#File,sFileName) ; ReadFile is enough, the file is only read

      *MemBuff = AllocateMemory(iFileLength)

      If *MemBuff

            ReadData(#File, *MemBuff, iFileLength)

            sWholeFile.s = PeekS(*MemBuff, iFileLength, #PB_Ascii)

            iPosn.i = 1
             iCnt.i = 0

            While iPosn > 0

                  iPosn = FindString(sWholeFile, #CRLF$, iPosn)
                  If iPosn > 0 : iPosn + 1 : iCnt + 1 : EndIf
            Wend

            Debug iCnt

            FreeMemory(*MemBuff)

      EndIf

      CloseFile(#File) ; only close the file if it was actually opened

EndIf

EndProcedure

CountLines("C:\MyFile.txt")

End

Re: NOL

Posted: Thu Jan 20, 2011 5:52 am
by JackWebb
For a file with records of unknown length (sequential access file) I don't think there is any easier or faster way (well maybe a little faster) than plain old ReadString in a loop, as you pointed out.

Code:

#TestData = 0

If ReadFile(#TestData, "Test.txt")
  While Not Eof(#TestData) ; an empty file now counts as 0 records
    Junk$ = ReadString(#TestData, #PB_Ascii)
    TotalRecords + 1
  Wend
  CloseFile(#TestData)
EndIf

Debug TotalRecords

For fixed length records (random access file) this is how I like to do it.

Code:

EnableExplicit

Declare GetRecord(FileName$, RecNum)
Declare PutRecord(FileName$, RecNum)
Declare GetLastRecord(FileName$)

Define i.i, FileName$ = "c:\UserData.txt"

Structure UserType
  Name.s{30}
  RecNum.i
EndStructure : Global UserData.UserType

CreateFile(0, FileName$)
CloseFile(0)

UserData\Name = "Jack Webb"
For i = 1 To 100
  UserData\RecNum = i
  PutRecord(FileName$, i)
Next

GetRecord(FileName$, 55) ; get record 55

Debug UserData\Name
Debug "Record Number = " + Str(UserData\RecNum)
Debug "Total Records = " + Str(GetLastRecord(FileName$))

Procedure GetRecord(FileName$, RecNum)
  Define FreeFile

  FreeFile = OpenFile(#PB_Any, FileName$)
  If FreeFile
    FileSeek(FreeFile, (RecNum - 1) * SizeOf(UserData))
    ReadData(FreeFile, @UserData, SizeOf(UserData)) 
    CloseFile(FreeFile) 
  EndIf
EndProcedure

Procedure PutRecord(FileName$, RecNum)
  Define FreeFile

  FreeFile = OpenFile(#PB_Any, FileName$)
  If FreeFile
    FileSeek(FreeFile, (RecNum - 1) * SizeOf(UserData))
    WriteData(FreeFile, @UserData, SizeOf(UserData)) 
    CloseFile(FreeFile) 
  EndIf
EndProcedure

Procedure GetLastRecord(FileName$)
  Define FreeFile
  Define LastRec

  FreeFile = OpenFile(#PB_Any, FileName$)
  If FreeFile
    LastRec = Lof(FreeFile) / SizeOf(UserData)
    CloseFile(FreeFile)
  EndIf

  ProcedureReturn LastRec
EndProcedure

Re: NOL

Posted: Thu Jan 20, 2011 8:17 am
by ts-soft
@IdeasVacuum

Why not use CountString()?
I think it's faster than FindString in a loop.

Greetings - Thomas

Re: NOL

Posted: Thu Jan 20, 2011 10:36 am
by infratec
Hi,

my version:

Code:

Procedure CountLinesBernd(sFileName.s)
  
  FileLength.q = FileSize(sFileName)
  *MemBuff = AllocateMemory(FileLength)
  If *MemBuff
    File = OpenFile(#PB_Any, sFileName)
    If File
      ReadData(File, *MemBuff, FileLength)
      iCnt.i = 0
      For i = 0 To FileLength - 1 ; last valid offset is FileLength - 1
        If PeekA(*MemBuff + i) = $0D
          iCnt + 1
        EndIf
      Next i
      CloseFile(File)
    EndIf
    FreeMemory(*MemBuff)
  EndIf
  
  ProcedureReturn iCnt
EndProcedure

Starttime = ElapsedMilliseconds()
Test = CountLinesBernd("C:\rhide.txt")
Endtime = ElapsedMilliseconds()
MessageRequester("Time", "Lines: " + Str(Test) + " in " + Str(Endtime - Starttime))

Bernd

Re: NOL

Posted: Thu Jan 20, 2011 10:46 am
by ts-soft
@infratec

Never use OpenFile when it isn't required (ReadFile is enough for reading)!
FileSize is not required either if you open the file anyway:

Code:

#File = 0
Procedure CountLinesBKK(sFileName.s)
 
;   FileLength.q = FileSize(sFileName)
;   *MemBuff = AllocateMemory(FileLength)
;   If *MemBuff
  If ReadFile(#File,sFileName)
    FileLength.q = Lof(#File)
    *MemBuff = AllocateMemory(FileLength)
    If *MemBuff
      ReadData(#File, *MemBuff, FileLength)
      iCnt.i = 0
      For i = 0 To FileLength - 1 ; last valid offset is FileLength - 1
        If PeekA(*MemBuff + i) = $0D
          iCnt + 1
        EndIf
      Next i
      FreeMemory(*MemBuff) ; free only a buffer that was actually allocated
    EndIf
    CloseFile(#File) ; close the file even if the allocation failed
  EndIf

  ProcedureReturn iCnt
EndProcedure

Starttime = ElapsedMilliseconds()
Test = CountLinesBKK("C:\rhide.txt")
MessageRequester("Time", "Lines: " + Str(Test) + " in " + Str(ElapsedMilliseconds() - Starttime))

I don't think you can detect a difference, but it is faster; with CountString it is much faster :wink:

Greetings - Thomas

Re: NOL

Posted: Thu Jan 20, 2011 10:57 am
by ts-soft
My Version :wink:

Code:

EnableExplicit

Procedure CountLinesFast(sFileName.s)
  Protected FF, size.q, *mem, result
  FF = ReadFile(#PB_Any, sFileName)
  If FF
    size = Lof(FF)
    *mem = AllocateMemory(size)
    If *mem
      ReadData(FF, *mem, size)
      result = CountString(PeekS(*mem, size, #PB_Ascii), #CR$) + 1 ; read the bytes as ASCII regardless of compile mode
      FreeMemory(*mem)
    EndIf
    CloseFile(FF)
  EndIf
  ProcedureReturn result
EndProcedure

Define Starttime = ElapsedMilliseconds()
Define test = CountLinesFast(#PB_Compiler_Home + "SDK\Readme.txt")
MessageRequester("Time", "Lines: " + Str(Test) + " in " + Str(ElapsedMilliseconds() - Starttime))

Re: NOL

Posted: Thu Jan 20, 2011 11:15 am
by infratec
Hi Thomas,

your version needs twice the memory :mrgreen:
and the time is the same.
(my file has 4457 lines and 182422 bytes)

I thought that FileSize() does not open the file.
I thought FileSize() just looks in the file system table.

And it makes no difference in time :D

Here is a modified version which can handle the biggest file sizes
(if you have enough memory).
'For' cannot handle quads :!:

Code:

Procedure CountLinesBernd(sFileName.s)
  
  iCnt.i = -1
  FileLength.q = FileSize(sFileName)
  If FileLength > 0 ; FileSize returns -1 or -2 on failure, so test for a real size
    *MemBuff = AllocateMemory(FileLength)
    If *MemBuff
      File = OpenFile(#PB_Any, sFileName)
      If File
        ReadData(File, *MemBuff, FileLength)
        iCnt = 0
        i.q = 0
        Repeat
          If PeekA(*MemBuff + i) = $0D
            iCnt + 1
          EndIf
          i + 1
        Until i = FileLength
        CloseFile(File)
      EndIf
      FreeMemory(*MemBuff)
    EndIf
  EndIf
  
  ProcedureReturn iCnt
EndProcedure

Starttime = ElapsedMilliseconds()
Test = CountLinesBernd("C:\rhide.txt")
Endtime = ElapsedMilliseconds()
MessageRequester("Time", "Lines: " + Str(Test) + " in " + Str(Endtime - Starttime))

Bernd
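
The "if you have enough memory" caveat above can be side-stepped by scanning the file in fixed-size chunks instead of loading it whole. The following is only a sketch under assumptions: the name CountCRChunked and the #ChunkSize value are made up for illustration and have not been tuned or benchmarked. Like the version above it counts CR bytes and returns -1 if the file cannot be read.

```purebasic
EnableExplicit

#ChunkSize = 65536   ; assumed buffer size, not tuned

; Counts CR bytes ($0D) while holding only one chunk in memory at a time.
Procedure.q CountCRChunked(sFileName.s)
  Protected file, *buf, bytesRead, i
  Protected count.q = -1            ; -1 signals "could not read the file"

  file = ReadFile(#PB_Any, sFileName)
  If file
    *buf = AllocateMemory(#ChunkSize)
    If *buf
      count = 0
      While Not Eof(file)
        bytesRead = ReadData(file, *buf, #ChunkSize)
        If bytesRead <= 0 : Break : EndIf
        For i = 0 To bytesRead - 1  ; only scan the bytes actually read
          If PeekA(*buf + i) = $0D
            count + 1
          EndIf
        Next
      Wend
      FreeMemory(*buf)
    EndIf
    CloseFile(file)
  EndIf
  ProcedureReturn count
EndProcedure
```

Memory use stays at #ChunkSize bytes whatever the file size, and the inner For loop only ever runs over a chunk, so the quad limitation of 'For' does not come into play.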

Re: NOL

Posted: Thu Jan 20, 2011 11:31 am
by ts-soft
You are right, no difference.

Text file with 1048726 bytes.
Both versions parsed it in 0 ms :lol:
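
A single run on a small file is too quick to measure, and results are only comparable when everybody tests the same file. Here is a sketch of a repeatable timing harness. The file name, line count and run count are assumptions chosen for illustration, and CountWithReadString is just the ReadString loop from the first post; swap in whichever procedure you want to measure.

```purebasic
EnableExplicit

#TestFile$ = "nol_test.txt"   ; assumed file name, written to the current directory
#Lines     = 100000           ; assumed number of lines in the test file
#Runs      = 100              ; assumed number of repetitions

Procedure.i CountWithReadString(sFileName.s)
  Protected file, n
  file = ReadFile(#PB_Any, sFileName)
  If file
    While Not Eof(file)
      ReadString(file)
      n + 1
    Wend
    CloseFile(file)
  EndIf
  ProcedureReturn n
EndProcedure

Define file, i, lines, start

; Write the same test file every time so that results are comparable.
file = CreateFile(#PB_Any, #TestFile$)
If file
  For i = 1 To #Lines
    WriteStringN(file, "Line number " + Str(i))
  Next
  CloseFile(file)
EndIf

; Time many runs instead of one, so the total is large enough to measure.
start = ElapsedMilliseconds()
For i = 1 To #Runs
  lines = CountWithReadString(#TestFile$)
Next
MessageRequester("Time", "Lines: " + Str(lines) + " in " + Str(ElapsedMilliseconds() - start) + " ms for " + Str(#Runs) + " runs")
```

Timings taken with the debugger enabled are probably not representative, so it is worth running the compiled program without it.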

Re: NOL

Posted: Thu Jan 20, 2011 8:36 pm
by infratec
Hi Thomas,

your version is faster :!: (twice as fast :( )
If you run it in a loop with a count up to 100 or higher you will see the difference.

So I found a trick to combine your speed with my lower memory usage:

Code:

Procedure CountLinesFast(sFileName.s)
  Protected FF, size.q, result, String.s
  result = -1
  FF = ReadFile(#PB_Any, sFileName)
  If FF
    size = Lof(FF)
    ; Note: reading raw bytes into a string assumes one byte per character
    ; (ASCII compile mode); in a Unicode build the count will be wrong.
    String = Space(size)
    ReadData(FF, @String, size)
    result = CountString(String, #CR$) + 1
    CloseFile(FF)
  EndIf
  ProcedureReturn result
EndProcedure

So it's the best of both :mrgreen: :mrgreen: :mrgreen:

Bernd

Re: NOL

Posted: Fri Jan 21, 2011 3:41 am
by IdeasVacuum
ts-soft wrote:
Why not use CountString()?
I think it's faster than FindString in a loop.

Greetings - Thomas

..... I didn't know CountString() existed :mrgreen: :D

Re: NOL

Posted: Fri Jan 21, 2011 3:47 am
by IdeasVacuum
....that is very fast Bernd. Tiny issue, it seems to report one too many lines.

Re: NOL

Posted: Fri Jan 21, 2011 8:50 am
by infratec
Hi,

I don't think that it is one too high.
At first I made the same mistake:
If you have one line with a CR at the end, you have 2 lines :!:
The first one and now a second, empty one.

To check it, use for example PSPad (my favourite editor) and open the text file with it.
At the bottom you can see the number of lines.
It is identical to the result of my procedure. :D

Bernd

Re: NOL

Posted: Fri Jan 21, 2011 4:33 pm
by IdeasVacuum
....hmm, you would rarely want to read an empty last line. However, your code is flexible from that standpoint because the '+1' can be omitted.

Re: NOL

Posted: Sat Jan 22, 2011 10:13 am
by infratec
Hi,

If I follow your Idea Vacuum ( :) ), then open Notepad, type in IdeasVacuum,
don't press Return and save this file.

You now have a text file without a CR at the end, and you are telling us that this file has no lines.
(It's vacuum :) )

A line with nothing in it is just a newline ( \n in C ).

So the + 1 is right.

Best regards

Bernd

P.S.: Don't be angry about the small jokes in the text.
It's part of my humor.
So I'd better say 'excuse me' in case you misinterpret the text.
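
The two counting conventions argued over above (always "+ 1", or no empty last line) can be reconciled by adding the extra line only when the file does not end in a line terminator. This is a sketch under assumptions, not a tested replacement for the code in this thread: it assumes an ASCII file with CR+LF or LF line endings, and the name CountLinesNoEmptyTail is made up for illustration.

```purebasic
EnableExplicit

; Counts LF characters and adds one more line only if the last line
; is not terminated. Returns -1 if the file cannot be read.
Procedure.i CountLinesNoEmptyTail(sFileName.s)
  Protected file, size.q, *mem, text.s
  Protected result = -1

  file = ReadFile(#PB_Any, sFileName)
  If file
    size = Lof(file)
    If size = 0
      result = 0                          ; empty file: no lines at all
    Else
      *mem = AllocateMemory(size)
      If *mem
        ReadData(file, *mem, size)
        text = PeekS(*mem, size, #PB_Ascii)
        result = CountString(text, #LF$)
        If PeekA(*mem + size - 1) <> 10   ; last byte is not LF
          result + 1                      ; count the unterminated last line
        EndIf
        FreeMemory(*mem)
      EndIf
    EndIf
    CloseFile(file)
  EndIf
  ProcedureReturn result
EndProcedure
```

With this convention a file containing only "IdeasVacuum" and no terminator counts as one line, and the same text followed by CR+LF also counts as one line rather than two.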