NOL

Post by Seymour Clufley »

If you need the number of lines in a file, the only native way in PB (that I know of) is to go through it with ReadString until you reach the EOF, counting up one each time.

Code: Select all

Procedure.i NOL(fn.s)
	Protected f, n
	f = ReadFile(#PB_Any,fn)
	If f
		While Not Eof(f)
			ReadString(f) ; the string is discarded, we only count the calls
			n+1
		Wend
		CloseFile(f)
		ProcedureReturn n
	EndIf
	ProcedureReturn 0
EndProcedure
number_of_lines = NOL(filename$)
But if you know that every line in the file will be of a certain length, you can use this code, which is MUCH faster (the +2 is for the CR+LF at the end of each line):

Code: Select all

Procedure.i FixedLengthNOL(fn.s,linelength)
	fs = FileSize(fn)
	If fs>0
		ProcedureReturn Round(fs/(linelength+2),#PB_Round_Up)
	EndIf
	ProcedureReturn 0
EndProcedure
number_of_lines = FixedLengthNOL(filename$,10)
I've tried it on many different files and it's counted correctly every time. I know that code like this may be obvious to many people - I'm hoping there are other PB users out there as gormless as me who will benefit from this post!
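For anyone who would rather avoid the float rounding, here is a rough integer-only sketch of the same idea. The procedure name and the file name "fixed.txt" are made up for illustration, and it assumes one byte per character and CR+LF after every line except possibly the last.

```purebasic
; Sketch: integer-only version of the fixed-length line count.
; Assumption: each line is linelength bytes plus CR+LF (2 bytes),
; and only the last line may be missing its CR+LF.
Procedure.q FixedLengthNOLInt(fn.s, linelength.q)
  Protected fs.q = FileSize(fn)        ; -1 = file not found, -2 = directory
  Protected recsize.q = linelength + 2
  Protected n.q

  If fs <= 0 Or recsize <= 0
    ProcedureReturn 0
  EndIf

  n = fs / recsize                     ; number of complete records
  If fs % recsize <> 0                 ; leftover bytes = last line without CR+LF
    n + 1
  EndIf
  ProcedureReturn n
EndProcedure

; Worked example: 3 lines of 10 characters, last one unterminated,
; is 12 + 12 + 10 = 34 bytes. 34 / 12 = 2 remainder 10, so the count is 3.
Debug FixedLengthNOLInt("fixed.txt", 10) ; "fixed.txt" is a hypothetical file
```

If the file does end with CR+LF, the size is an exact multiple of the record size and no rounding is needed at all.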

This macro version is even faster:

Code: Select all

Macro FixedLengthNOL(fn,linelength)
	Round(FileSize(fn)/(linelength+2),#PB_Round_Up)
EndMacro
number_of_lines = FixedLengthNOL(filename$,10)
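... but it has no explicit check for a missing file (FileSize returns -1 in that case, or -2 for a directory) and relies on the rounding to turn that into 0, so I'd still use the procedure version. Note that an empty file is not a division by zero here: the divisor is linelength+2, so it only blows up if linelength is -2.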

You could use this next version, but it's not so easy to integrate into your code:

Code: Select all

Macro FixedLengthNOL(fn,linelength,var)
	var = 0
	fs = FileSize(fn)
	If fs>0
		var = Round(fs/(linelength+2),#PB_Round_Up)
	EndIf
EndMacro
FixedLengthNOL(filename$,10,number_of_lines)

As for getting the line count when you DON'T know how long each line will be... I've written this but it's slower than using While/ReadString/Wend:

Code: Select all

Procedure.i NOL(fn.s)
	
	f = ReadFile(#PB_Any,fn)
	If f
		loaf = Lof(f)
		If loaf<1
			CloseFile(f)
			ProcedureReturn 0
		EndIf
		
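		*mem = AllocateMemory(loaf)
		If *mem = 0
			CloseFile(f)
			ProcedureReturn 0
		EndIf
		ReadData(f,*mem,loaf)
		CloseFile(f)
		For a = 1 To loaf
			this.i = PeekA(*mem+a-1)
			If this = 13 ; 13 = CR
				lines+1
			EndIf
		Next a
		FreeMemory(*mem) ; without this, every call leaks the buffer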
	EndIf
	ProcedureReturn lines
	
EndProcedure
On my machine, running that 1000 times takes 703ms, whereas the ReadString method only takes 141ms. Are there any faster ways?

Again, I know the FixedLengthNOL procedure may be obvious, but no doubt somebody, somewhere, will use it. :)

Post by IdeasVacuum »

I think a lot depends on the file size and line length anyway, but we can't compare speeds unless we all test the same files.
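One way to get everybody onto the same file would be to generate it. The following is only a sketch: the file name, the line count and length, and the procedure names are all made up, and the timed procedure is just the plain ReadString loop from the first post.

```purebasic
; Sketch only: the file name, sizes and procedure names are assumptions.
#TestFile$ = "nol_test.txt"

; Write Lines lines of LineLength characters, each ended with CR+LF.
Procedure.i MakeTestFile(FileName.s, Lines.i, LineLength.i)
  Protected f, i.i
  Protected Line.s = RSet("", LineLength, "x") + #CRLF$
  f = CreateFile(#PB_Any, FileName)
  If f
    For i = 1 To Lines
      WriteString(f, Line, #PB_Ascii)
    Next
    CloseFile(f)
    ProcedureReturn #True
  EndIf
  ProcedureReturn #False
EndProcedure

; The ReadString loop, used here as the thing to time.
Procedure.i CountWithReadString(FileName.s)
  Protected f, n.i
  f = ReadFile(#PB_Any, FileName)
  If f
    While Not Eof(f)
      ReadString(f)
      n + 1
    Wend
    CloseFile(f)
  EndIf
  ProcedureReturn n
EndProcedure

Define i.i, n.i, t.q

If MakeTestFile(#TestFile$, 5000, 40)   ; 5000 * 42 = 210000 bytes
  t = ElapsedMilliseconds()
  For i = 1 To 1000
    n = CountWithReadString(#TestFile$)
  Next
  t = ElapsedMilliseconds() - t
  MessageRequester("Time", "Lines: " + Str(n) + ", 1000 runs in " + Str(t) + " ms")
EndIf
```

Swap CountWithReadString for whichever procedure you want to compare, and run it with the debugger off, otherwise the numbers mostly measure the debugger.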

I thought a variation of this would be OK:

Code: Select all

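#File = 1

Procedure CountLines(sFileName.s)
;--------------------------------
iFileLength.i = FileSize(sFileName)
iCnt.i = 0

If iFileLength > 0 ; FileSize returns -1 if the file does not exist

      If ReadFile(#File, sFileName) ; ReadFile opens read-only and never creates the file

            *MemBuff = AllocateMemory(iFileLength)
            If *MemBuff

                  ReadData(#File, *MemBuff, iFileLength)
                  sWholeFile.s = PeekS(*MemBuff, iFileLength, #PB_Ascii)
                  FreeMemory(*MemBuff)

                  iPosn.i = 1
                  While iPosn > 0
                        iPosn = FindString(sWholeFile, #CRLF$, iPosn)
                        If iPosn > 0 : iPosn + 1 : iCnt + 1 : EndIf
                  Wend

            EndIf

            CloseFile(#File) ; only close a file that was actually opened

      EndIf

EndIf

Debug iCnt
ProcedureReturn iCnt

EndProcedure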

CountLines("C:\MyFile.txt")

End

Post by JackWebb »

For a file with records of unknown length (sequential access file) I don't think there is any easier way, or any much faster way, than plain old ReadString in a loop, as you pointed out.

Code: Select all

#TestData = 0

If ReadFile(#TestData, "Test.txt")
  While Not Eof(#TestData) ; While, so an empty file counts as 0 records
    Junk$ = ReadString(#TestData, #PB_Ascii)
    TotalRecords + 1
  Wend
  CloseFile(#TestData)
EndIf

Debug TotalRecords
For fixed length records (random access file) this is how I like to do it.

Code: Select all

EnableExplicit

Declare GetRecord(FileName$, RecNum)
Declare PutRecord(FileName$, RecNum)
Declare GetLastRecord(FileName$)

Define i.i, FileName$ = "c:\UserData.txt"

Structure UserType
  Name.s{30}
  RecNum.i
EndStructure : Global UserData.UserType

CreateFile(0, FileName$)
CloseFile(0)

UserData\Name = "Jack Webb"
For i = 1 To 100
  UserData\RecNum = i
  PutRecord(FileName$, i)
Next

GetRecord(FileName$, 55) ; get record 55

Debug UserData\Name
Debug "Record Number = " + Str(UserData\RecNum)
Debug "Total Records = " + Str(GetLastRecord(FileName$))

Procedure GetRecord(FileName$, RecNum)
  Define FreeFile

  FreeFile = OpenFile(#PB_Any, FileName$)
  If FreeFile
    FileSeek(FreeFile, (RecNum - 1) * SizeOf(UserData))
    ReadData(FreeFile, @UserData, SizeOf(UserData)) 
    CloseFile(FreeFile) 
  EndIf
EndProcedure

Procedure PutRecord(FileName$, RecNum)
  Define FreeFile

  FreeFile = OpenFile(#PB_Any, FileName$)
  If FreeFile
    FileSeek(FreeFile, (RecNum - 1) * SizeOf(UserData))
    WriteData(FreeFile, @UserData, SizeOf(UserData)) 
    CloseFile(FreeFile) 
  EndIf
EndProcedure

Procedure GetLastRecord(FileName$)
  Define FreeFile
  Define LastRec

  FreeFile = OpenFile(#PB_Any, FileName$)
  If FreeFile
    LastRec = Lof(FreeFile) / SizeOf(UserData)
    CloseFile(FreeFile)
  EndIf

  ProcedureReturn LastRec
EndProcedure

Post by ts-soft »

@IdeasVacuum

Why not use CountString()?
I think it's faster than FindString in a loop.

Greetings - Thomas

Post by infratec »

Hi,

my version:

Code: Select all

Procedure CountLinesBernd(sFileName.s)
  
  FileLength.q = FileSize(sFileName)
  *MemBuff = AllocateMemory(FileLength)
  If *MemBuff
    File = OpenFile(#PB_Any, sFileName)
    If File
      ReadData(File, *MemBuff, FileLength)
      iCnt.i = 0
      For i = 0 To FileLength - 1 ; the last valid offset is FileLength - 1
        If PeekA(*MemBuff + i) = $0D
          iCnt + 1
        EndIf
      Next i
      CloseFile(File)
    EndIf
    FreeMemory(*MemBuff)
  EndIf
  
  ProcedureReturn iCnt
EndProcedure

Starttime = ElapsedMilliseconds()
Test = CountLinesBernd("C:\rhide.txt")
Endtime = ElapsedMilliseconds()
MessageRequester("Time", "Lines: " + Str(Test) + " in " + Str(Endtime - Starttime))
Bernd

Post by ts-soft »

@infratec

Never use OpenFile if it is not required!
And FileSize is not needed if you read/open the file anyway:

Code: Select all

#File = 0
Procedure CountLinesBKK(sFileName.s)
 
;   FileLength.q = FileSize(sFileName)
;   *MemBuff = AllocateMemory(FileLength)
;   If *MemBuff
  If ReadFile(#File,sFileName)
    FileLength.q = Lof(#File)
    *MemBuff = AllocateMemory(FileLength)
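    If *MemBuff
      ReadData(#File, *MemBuff, FileLength)
      iCnt.i = 0
      For i = 0 To FileLength - 1 ; the last valid offset is FileLength - 1
        If PeekA(*MemBuff + i) = $0D
          iCnt + 1
        EndIf
      Next i
      FreeMemory(*MemBuff) ; only free a buffer that was actually allocated
    EndIf
    CloseFile(#File) ; close the file even if the allocation failed
  EndIf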

  ProcedureReturn iCnt
EndProcedure

Starttime = ElapsedMilliseconds()
Test = CountLinesBKK("C:\rhide.txt")
MessageRequester("Time", "Lines: " + Str(Test) + " in " + Str(ElapsedMilliseconds() - Starttime))
I don't think you can measure the difference, but it is faster, and with CountString it is much faster :wink:

Greetings - Thomas

Post by ts-soft »

My Version :wink:

Code: Select all

EnableExplicit

Procedure CountLinesFast(sFileName.s)
  Protected FF, size.q, *mem, result
  FF = ReadFile(#PB_Any, sFileName)
  If FF
    size = Lof(FF)
    *mem = AllocateMemory(size)
    If *mem
      ReadData(FF, *mem, size)
      result = CountString(PeekS(*mem, size, #PB_Ascii), #CR$) + 1 ; #PB_Ascii: size is a byte count
      FreeMemory(*mem)
    EndIf
    CloseFile(FF)
  EndIf
  ProcedureReturn result
EndProcedure

Define Starttime = ElapsedMilliseconds()
Define test = CountLinesFast(#PB_Compiler_Home + "SDK\Readme.txt")
MessageRequester("Time", "Lines: " + Str(Test) + " in " + Str(ElapsedMilliseconds() - Starttime))

Post by infratec »

Hi Thomas,

your version needs twice the memory :mrgreen:
and the time is the same.
(my file has 4457 lines and 182422 bytes)

I thought that FileSize() does not open the file;
I thought it just looks in the file system table.

And it makes no difference in time :D

Here is a modified version which can handle the biggest file sizes
(if you have enough memory).
'For' cannot handle quads :!:

Code: Select all

Procedure CountLinesBernd(sFileName.s)
  
  iCnt.i = -1
  FileLength.q = FileSize(sFileName)
  If FileLength > 0 ; FileSize returns -1 for a missing file, -2 for a directory
    *MemBuff = AllocateMemory(FileLength)
    If *MemBuff
      File = OpenFile(#PB_Any, sFileName)
      If File
        ReadData(File, *MemBuff, FileLength)
        iCnt = 0
        i.q = 0
        Repeat
          If PeekA(*MemBuff + i) = $0D
            iCnt + 1
          EndIf
          i + 1
        Until i = FileLength
        CloseFile(File)
      EndIf
      FreeMemory(*MemBuff)
    EndIf
  EndIf
  
  ProcedureReturn iCnt
EndProcedure

Starttime = ElapsedMilliseconds()
Test = CountLinesBernd("C:\rhide.txt")
Endtime = ElapsedMilliseconds()
MessageRequester("Time", "Lines: " + Str(Test) + " in " + Str(Endtime - Starttime))
Bernd
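If memory is the worry, the same byte loop also works on a fixed-size buffer, so the whole file never has to be in RAM at once. This is only a rough sketch: the chunk size and the procedure name are made up, and like the version above it returns the number of CR bytes, or -1 if the file could not be read.

```purebasic
; Sketch: count CR bytes chunk by chunk instead of loading the whole file.
#ChunkSize = 65536 ; assumed buffer size, tune as you like

Procedure.q CountLinesChunked(sFileName.s)
  Protected File, *Buf, BytesRead.i, i.i
  Protected Count.q = -1

  File = ReadFile(#PB_Any, sFileName)
  If File
    *Buf = AllocateMemory(#ChunkSize)
    If *Buf
      Count = 0
      While Not Eof(File)
        BytesRead = ReadData(File, *Buf, #ChunkSize) ; number of bytes actually read
        If BytesRead <= 0
          Break
        EndIf
        For i = 0 To BytesRead - 1
          If PeekA(*Buf + i) = $0D
            Count + 1
          EndIf
        Next i
      Wend
      FreeMemory(*Buf)
    EndIf
    CloseFile(File)
  EndIf

  ProcedureReturn Count
EndProcedure
```

The inner For only ever runs up to the chunk size, so the restriction on quads in For does not matter here, and the total is kept in a quad.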

Post by ts-soft »

You are right, no difference.

Text file with 1048726 bytes.
Both versions parsed it in 0 ms :lol:

Post by infratec »

Hi Thomas,

your version is faster :!: (twice as fast :( )
If you run it in a loop with a count of 100 or higher you will see the difference.

So I found a trick to combine your speed with my lower memory usage:

Code: Select all

Procedure CountLinesFast(sFileName.s)
  Protected FF, size.q, result, String.s
  result = -1
  FF = ReadFile(#PB_Any, sFileName)
  If FF
    size = Lof(FF)
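    ; Note: this relies on an ASCII-mode compile, where a string character is one byte.
    ; In a Unicode compile the raw file bytes would not be valid string data.
    String = Space(size)
    ReadData(FF, @String, size)
    result = CountString(String, #CR$) + 1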
    CloseFile(FF)
  EndIf
  ProcedureReturn result
EndProcedure
So it's the best of both :mrgreen: :mrgreen: :mrgreen:

Bernd
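For Unicode compiles, where reading raw bytes into a string no longer works, something along these lines might do the same job. It is a sketch under assumptions: the procedure name is made up, it assumes a PureBasic version whose ReadString accepts the #PB_File_IgnoreEOL flag, and it assumes the file is plain ASCII.

```purebasic
; Sketch: let ReadString load the whole file, then count with CountString.
Procedure.i CountLinesWholeString(sFileName.s)
  Protected FF, Text.s
  Protected result.i = -1

  FF = ReadFile(#PB_Any, sFileName)
  If FF
    ; #PB_File_IgnoreEOL: keep reading past line ends until the end of the file
    Text = ReadString(FF, #PB_Ascii | #PB_File_IgnoreEOL)
    result = CountString(Text, #CR$) + 1
    CloseFile(FF)
  EndIf

  ProcedureReturn result
EndProcedure
```

It does not save memory in a Unicode compile, since the string takes two bytes per character, but it avoids the separate buffer and the PeekS copy.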

Post by IdeasVacuum »

ts-soft wrote:
Why not use CountString()?
I think it's faster than FindString in a loop.

Greetings - Thomas
... I didn't know CountString() existed :mrgreen: :D

Post by IdeasVacuum »

... that is very fast, Bernd. One tiny issue: it seems to report one line too many.

Post by infratec »

Hi,

I don't think that it is one too high.
At first I made the same mistake:
If you have one line with a CR at the end, you have 2 lines :!:
The first one, and now a second, empty one.

To check it, use for example PSPad (my favourite editor) and open the text file with it.
At the bottom you can see the number of lines.
It is identical to the result of my procedure. :D

Bernd

Post by IdeasVacuum »

... hmm, you would rarely want to read an empty last line. However, your code is flexible from that standpoint, because the '+1' can be omitted.

Post by infratec »

Hi,

if I follow your idea, Vacuum ( :) ), then open Notepad, type in IdeasVacuum,
don't press Return, and save the file.

You now have a text file without a CR at the end, and you are telling us that this file has no lines.
(It's a vacuum :) )

A line with nothing in it is still a line (just a \n in C).

So the + 1 is right.

Best regards

Bernd

P.S.: Don't be angry about the small jokes in the text.
It's part of my humor.
So I'd better say 'excuse me' in case you misinterpret the text.
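Both conventions can be had in one small procedure, for what it's worth. This is just a sketch with a made-up name: it counts LF instead of CR (so that the check on the last character works for CR+LF text) and adds one only when the text does not end with a line ending.

```purebasic
; Sketch: a trailing line ending does not create an extra empty line,
; but an unterminated last line still counts.
Procedure.i CountLinesNoTrailingEmpty(Text.s)
  Protected n.i

  If Text = ""
    ProcedureReturn 0
  EndIf

  n = CountString(Text, #LF$)
  If Right(Text, 1) <> #LF$
    n + 1 ; last line has no terminator but is still a line
  EndIf

  ProcedureReturn n
EndProcedure

Debug CountLinesNoTrailingEmpty("IdeasVacuum")            ; 1
Debug CountLinesNoTrailingEmpty("one" + #CRLF$)           ; 1
Debug CountLinesNoTrailingEmpty("one" + #CRLF$ + "two")   ; 2
```

Editors that show a line count at the bottom usually follow the other convention and would report 2 for the second example, which is what the + 1 version gives.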