All times are UTC + 1 hour




 Post subject: NOL
PostPosted: Thu Jan 20, 2011 3:54 am 
Joined: Wed Feb 28, 2007 9:13 am
Posts: 923
Location: Edinburgh
If you need the number of lines in a file, the only native way in PB (that I know of) is to go through it with ReadString until you reach the EOF, counting up one each time.

Code:
Procedure.i NOL(fn.s)
   f = ReadFile(#PB_Any,fn)
   If f
      ; read and discard each line, counting as we go
      While Not Eof(f)
         ReadString(f)
         n+1
      Wend
      CloseFile(f)
      ProcedureReturn n
   EndIf
   ProcedureReturn 0
EndProcedure
number_of_lines = NOL(filename$)


But if you know that every line in the file has the same fixed length and ends with CR+LF (hence the +2 in the code), you can calculate the count from the file size, which is MUCH faster:
Code:
Procedure.i FixedLengthNOL(fn.s,linelength)
   fs = FileSize(fn)
   If fs>0
      ProcedureReturn Round(fs/(linelength+2),#PB_Round_Up)
   EndIf
   ProcedureReturn 0
EndProcedure
number_of_lines = FixedLengthNOL(filename$,10)

I've tried it on many different files and it's counted correctly every time. I know that code like this may be obvious to many people - I'm hoping there are other PB users out there as gormless as me who will benefit from this post!
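
To make the arithmetic concrete, here is a minimal sketch (not from the original post) that writes three 10-character lines to a scratch file and derives the count from the size. The file name nol_demo.txt and the line contents are made up for illustration, and it uses an integer round-up instead of Round(), which is just one possible way to do it:

```purebasic
; Sketch only: the file name and contents below are made up for the demo.
Define file$ = GetTemporaryDirectory() + "nol_demo.txt"
Define linelength.i = 10
Define size.q, lines.i

If CreateFile(0, file$)
  WriteString(0, "AAAAAAAAAA" + #CRLF$, #PB_Ascii)
  WriteString(0, "BBBBBBBBBB" + #CRLF$, #PB_Ascii)
  WriteString(0, "CCCCCCCCCC", #PB_Ascii)   ; last line without CR+LF
  CloseFile(0)

  size = FileSize(file$)                     ; 3*10 characters + 2*2 line-ending bytes = 34
  ; Integer round-up: add (divisor - 1) before dividing, so 45 / 12 truncates to 3
  lines = (size + linelength + 1) / (linelength + 2)
  Debug lines
  DeleteFile(file$)
EndIf
```

Two of the three lines end in CR+LF, so the size is 34 bytes; a file whose last line also ends in CR+LF would be 36 bytes and still give 3.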

This macro version is even faster:
Code:
Macro FixedLengthNOL(fn,linelength)
   Round(FileSize(fn)/(linelength+2),#PB_Round_Up)
EndMacro
number_of_lines = FixedLengthNOL(filename$,10)

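... but it has no check for a missing or empty file. (Strictly speaking, a FileSize of zero does not cause a division by zero, because the divisor is linelength+2, but the procedure version with its fs>0 test is still the safer choice.)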

You could use this next version, but it's not so easy to integrate into your code:
Code:
Macro FixedLengthNOL(fn,linelength,var)
   var = 0
   fs = FileSize(fn)
   If fs>0
      var = Round(fs/(linelength+2),#PB_Round_Up)
   EndIf
EndMacro
FixedLengthNOL(filename$,10,number_of_lines)



As for getting the line count when you DON'T know how long each line will be... I've written this but it's slower than using While/ReadString/Wend:
Code:
Procedure.i NOL(fn.s)
   
   f = ReadFile(#PB_Any,fn)
   If f
      loaf = Lof(f)
      If loaf<1
         CloseFile(f)
         ProcedureReturn 0
      EndIf
      
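      *mem = AllocateMemory(loaf)
      If *mem = 0
         CloseFile(f)
         ProcedureReturn 0
      EndIf
      ReadData(f,*mem,loaf)
      CloseFile(f)
      For a = 1 To loaf
         this.i = PeekA(*mem+a-1)
         If this = 13   ; carriage return
            lines+1
         EndIf
      Next a
      FreeMemory(*mem)   ; release the buffer
   EndIf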
   ProcedureReturn lines
   
EndProcedure

On my machine, running that 1000 times takes 703ms, whereas the ReadString method only takes 141ms. Are there any faster ways?

Again, I know the FixedLengthNOL procedure may be obvious, but no doubt somebody, somewhere, will use it. :)

_________________
JACK WEBB: "Coding in C is like sculpting a statue using only sandpaper. You can do it, but the result wouldn't be any better. So why bother? Just use the right tools and get the job done."


 Post subject: Re: NOL
PostPosted: Thu Jan 20, 2011 5:24 am 
Joined: Fri Oct 23, 2009 2:33 am
Posts: 2860
Location: Wales, UK
I think a lot depends on the file size and line length anyway, but we can't compare speeds unless we all test the same files.

I thought a variation of this would be OK:

Code:
#File = 1

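Procedure CountLines(sFileName.s)
;--------------------------------
iCnt.i = 0
iFileLength.i = FileSize(sFileName)

; FileSize() returns a negative value if the file is missing
If iFileLength < 1 : ProcedureReturn 0 : EndIf

*MemBuff = AllocateMemory(iFileLength)
If *MemBuff = 0 : ProcedureReturn 0 : EndIf

; ReadFile() opens read-only and never creates the file
If ReadFile(#File,sFileName)

      ReadData(#File, *MemBuff, iFileLength)
      CloseFile(#File)

      sWholeFile.s = PeekS(*MemBuff, iFileLength, #PB_Ascii)

      iPosn.i = 1

      While iPosn > 0

            iPosn = FindString(sWholeFile, #CRLF$, iPosn)
            If iPosn > 0 : iPosn + 1 : iCnt + 1 : EndIf
      Wend

      Debug iCnt

EndIf

FreeMemory(*MemBuff)

ProcedureReturn iCnt

EndProcedure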

CountLines("C:\MyFile.txt")

End

_________________
IdeasVacuum
If it sounds simple, you have not grasped the complexity.


 Post subject: Re: NOL
PostPosted: Thu Jan 20, 2011 5:52 am 
Joined: Wed Dec 16, 2009 1:42 pm
Posts: 90
Location: Tampa Florida
For a file with records of unknown length (sequential access file) I don't think there is any easier or faster way (well maybe a little faster) than plain old ReadString in a loop, as you pointed out.

Code:
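#TestData = 0

; ReadFile() is enough for counting; check that it succeeded
If ReadFile(#TestData, "Test.txt")
  ; test Eof() first so an empty file counts as 0 records
  While Eof(#TestData) = 0
    Junk$ = ReadString(#TestData, #PB_Ascii)
    TotalRecords + 1
  Wend
  CloseFile(#TestData)
EndIf

Debug TotalRecords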


For fixed length records (random access file) this is how I like to do it.

Code:
EnableExplicit

Declare GetRecord(FileName$, RecNum)
Declare PutRecord(FileName$, RecNum)
Declare GetLastRecord(FileName$)

Define i.i, FileName$ = "c:\UserData.txt"

Structure UserType
  Name.s{30}
  RecNum.i
EndStructure : Global UserData.UserType

CreateFile(0, FileName$)
CloseFile(0)

UserData\Name = "Jack Webb"
For i = 1 To 100
  UserData\RecNum = i
  PutRecord(FileName$, i)
Next

GetRecord(FileName$, 55) ; get record 55

Debug UserData\Name
Debug "Record Number = " + Str(UserData\RecNum)
Debug "Total Records = " + Str(GetLastRecord(FileName$))

Procedure GetRecord(FileName$, RecNum)
  Define FreeFile

  FreeFile = OpenFile(#PB_Any, FileName$)
  If FreeFile
    FileSeek(FreeFile, (RecNum - 1) * SizeOf(UserData))
    ReadData(FreeFile, @UserData, SizeOf(UserData))
    CloseFile(FreeFile)
  EndIf
EndProcedure

Procedure PutRecord(FileName$, RecNum)
  Define FreeFile

  FreeFile = OpenFile(#PB_Any, FileName$)
  If FreeFile
    FileSeek(FreeFile, (RecNum - 1) * SizeOf(UserData))
    WriteData(FreeFile, @UserData, SizeOf(UserData))
    CloseFile(FreeFile)
  EndIf
EndProcedure

Procedure GetLastRecord(FileName$)
  Define FreeFile
  Define LastRec

  FreeFile = OpenFile(#PB_Any, FileName$)
  If FreeFile
    LastRec = Lof(FreeFile) / SizeOf(UserData)
    CloseFile(FreeFile)
  EndIf

  ProcedureReturn LastRec
EndProcedure

_________________
Make everything as simple as possible, but not simpler. ~Albert Einstein


 Post subject: Re: NOL
PostPosted: Thu Jan 20, 2011 8:17 am 
Joined: Thu Jun 24, 2004 2:44 pm
Posts: 4715
Location: Berlin - Germany
@IdeasVacuum

Why not use CountString()?
I think it's faster than FindString in a loop.

Greetings - Thomas

_________________
PureBasic 5.11 | Windows 7 SP1 (x64) | Linux Mint 14 (x64) | RealSource

The use of EnableExplicit is free of charge and avoids errors.


 Post subject: Re: NOL
PostPosted: Thu Jan 20, 2011 10:36 am 
Joined: Sun Sep 07, 2008 12:45 pm
Posts: 1441
Location: Germany
Hi,

my version:
Code:
Procedure CountLinesBernd(sFileName.s)
 
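  FileLength.q = FileSize(sFileName)
  If FileLength < 1
    ProcedureReturn 0   ; missing, empty, or a directory
  EndIf
  *MemBuff = AllocateMemory(FileLength)
  If *MemBuff
    File = ReadFile(#PB_Any, sFileName)
    If File
      ReadData(File, *MemBuff, FileLength)
      iCnt.i = 0
      For i = 0 To FileLength - 1   ; last valid offset is FileLength - 1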
        If PeekA(*MemBuff + i) = $0D
          iCnt + 1
        EndIf
      Next i
      CloseFile(File)
    EndIf
    FreeMemory(*MemBuff)
  EndIf
 
  ProcedureReturn iCnt
EndProcedure

Starttime = ElapsedMilliseconds()
Test = CountLinesBernd("C:\rhide.txt")
Endtime = ElapsedMilliseconds()
MessageRequester("Time", "Lines: " + Str(Test) + " in " + Str(Endtime - Starttime))
Bernd


 Post subject: Re: NOL
PostPosted: Thu Jan 20, 2011 10:46 am 
Joined: Thu Jun 24, 2004 2:44 pm
Posts: 4715
Location: Berlin - Germany
@infratec

Never use OpenFile if it's not required!
FileSize isn't required either if you read/open the file anyway:
Code:
#File = 0
Procedure CountLinesBKK(sFileName.s)
 
;   FileLength.q = FileSize(sFileName)
;   *MemBuff = AllocateMemory(FileLength)
;   If *MemBuff
  If ReadFile(#File,sFileName)
    FileLength.q = Lof(#File)
    *MemBuff = AllocateMemory(FileLength)
    If *MemBuff
      ReadData(#File, *MemBuff, FileLength)
      iCnt.i = 0
      For i = 0 To FileLength - 1   ; last valid offset is FileLength - 1
        If PeekA(*MemBuff + i) = $0D
          iCnt + 1
        EndIf
      Next i
      FreeMemory(*MemBuff)   ; free only a buffer that was actually allocated
    EndIf
    CloseFile(#File)         ; close the file even if the allocation failed
  EndIf

  ProcedureReturn iCnt
EndProcedure

Starttime = ElapsedMilliseconds()
Test = CountLinesBKK("C:\rhide.txt")
MessageRequester("Time", "Lines: " + Str(Test) + " in " + Str(ElapsedMilliseconds() - Starttime))

I don't think you can detect the difference, but it is faster; with CountString it is much faster :wink:

Greetings - Thomas

_________________
PureBasic 5.11 | Windows 7 SP1 (x64) | Linux Mint 14 (x64) | RealSource

The use of EnableExplicit is free of charge and avoids errors.


 Post subject: Re: NOL
PostPosted: Thu Jan 20, 2011 10:57 am 
Joined: Thu Jun 24, 2004 2:44 pm
Posts: 4715
Location: Berlin - Germany
My Version :wink:
Code:
EnableExplicit

Procedure CountLinesFast(sFileName.s)
  Protected FF, size.q, *mem, result
  FF = ReadFile(#PB_Any, sFileName)
  If FF
    size = Lof(FF)
    *mem = AllocateMemory(size)
    If *mem
      ReadData(FF, *mem, size)
      ; read the buffer as single-byte text so 'size' bytes = 'size' characters
      result = CountString(PeekS(*mem, size, #PB_Ascii), #CR$) + 1
      FreeMemory(*mem)
    EndIf
    CloseFile(FF)
  EndIf
  ProcedureReturn result
EndProcedure

Define Starttime = ElapsedMilliseconds()
Define test = CountLinesFast(#PB_Compiler_Home + "SDK\Readme.txt")
MessageRequester("Time", "Lines: " + Str(Test) + " in " + Str(ElapsedMilliseconds() - Starttime))

_________________
PureBasic 5.11 | Windows 7 SP1 (x64) | Linux Mint 14 (x64) | RealSource

The use of EnableExplicit is free of charge and avoids errors.


 Post subject: Re: NOL
PostPosted: Thu Jan 20, 2011 11:15 am 
Joined: Sun Sep 07, 2008 12:45 pm
Posts: 1441
Location: Germany
Hi Thomas,

your version needs double the memory :mrgreen:
and the time is the same.
(my file has 4457 lines and 182422 bytes)

I thought that FileSize() does not open the file.
I thought FileSize() looks in the filesystem table.

And it makes no difference in time :D

Here is a modified version which is capable of handling the biggest file size
(if you have enough memory)
'For' cannot handle quads :!:
Code:
Procedure CountLinesBernd(sFileName.s)
 
  iCnt.i = -1
  FileLength.q = FileSize(sFileName)
  If FileLength > 0
    *MemBuff = AllocateMemory(FileLength)
    If *MemBuff
      File = ReadFile(#PB_Any, sFileName)
      If File
        ReadData(File, *MemBuff, FileLength)
        iCnt = 0
        i.q = 0
        Repeat
          If PeekA(*MemBuff + i) = $0D
            iCnt + 1
          EndIf
          i + 1
        Until i = FileLength
        CloseFile(File)
      EndIf
      FreeMemory(*MemBuff)
    EndIf
  EndIf
 
  ProcedureReturn iCnt
EndProcedure

Starttime = ElapsedMilliseconds()
Test = CountLinesBernd("C:\rhide.txt")
Endtime = ElapsedMilliseconds()
MessageRequester("Time", "Lines: " + Str(Test) + " in " + Str(Endtime - Starttime))


Bernd


 Post subject: Re: NOL
PostPosted: Thu Jan 20, 2011 11:31 am 
Joined: Thu Jun 24, 2004 2:44 pm
Posts: 4715
Location: Berlin - Germany
You are right, no difference.

Text file with 1048726 bytes.
Both versions parsed it in 0 ms :lol:

_________________
PureBasic 5.11 | Windows 7 SP1 (x64) | Linux Mint 14 (x64) | RealSource

The use of EnableExplicit is free of charge and avoids errors.


 Post subject: Re: NOL
PostPosted: Thu Jan 20, 2011 8:36 pm 
Joined: Sun Sep 07, 2008 12:45 pm
Posts: 1441
Location: Germany
Hi Thomas,

your version is faster :!: (twice as fast :( )
If you run it in a loop with a count up to 100 or higher you will see the difference.

So I found a trick to combine your speed with my lower memory usage:
Code:
Procedure CountLinesFast(sFileName.s)
  Protected FF, size.q, result, String.s
  result = -1
  FF = ReadFile(#PB_Any, sFileName)
  If FF
    size = Lof(FF)
    ; Assumes an ASCII (non-Unicode) executable: one character = one byte
    String = Space(size)
    ReadData(FF, @String, size)
    result = CountString(String, #CR$) + 1
    CloseFile(FF)
  EndIf
  ProcedureReturn result
EndProcedure
So it's the best of both :mrgreen: :mrgreen: :mrgreen:

Bernd


 Post subject: Re: NOL
PostPosted: Fri Jan 21, 2011 3:41 am 
Joined: Fri Oct 23, 2009 2:33 am
Posts: 2860
Location: Wales, UK
ts-soft wrote:

Why not use CountString()?
I think it's faster than FindString in a loop.

Greetings - Thomas


..... I didn't know CountString() existed :mrgreen: :D

_________________
IdeasVacuum
If it sounds simple, you have not grasped the complexity.


 Post subject: Re: NOL
PostPosted: Fri Jan 21, 2011 3:47 am 
Joined: Fri Oct 23, 2009 2:33 am
Posts: 2860
Location: Wales, UK
....that is very fast Bernd. Tiny issue, it seems to report one too many lines.

_________________
IdeasVacuum
If it sounds simple, you have not grasped the complexity.


 Post subject: Re: NOL
PostPosted: Fri Jan 21, 2011 8:50 am 
Joined: Sun Sep 07, 2008 12:45 pm
Posts: 1441
Location: Germany
Hi,

I don't think that it is one too high.
At first I made the same mistake:
If you have one line with a CR at the end, you have 2 lines :!:
The first one and now a second, empty one.

To check it, use for example PSPad (my favourite editor) and open the text file with it.
At the bottom you can see the number of lines.
It is identical to the result of my procedure. :D
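
The convention can also be checked without any file at all. This is only a small illustration with made-up strings, counting CRs and adding 1 exactly as the procedure does:

```purebasic
; Illustrative strings only; no file access involved.
Define oneLine$  = "first"                       ; no line ending at all
Define withEnd$  = "first" + #CRLF$              ; ends with CR+LF
Define twoLines$ = "first" + #CRLF$ + "second"

Debug CountString(oneLine$,  #CR$) + 1   ; 0 CRs + 1 = 1
Debug CountString(withEnd$,  #CR$) + 1   ; 1 CR  + 1 = 2 ("first" plus an empty last line)
Debug CountString(twoLines$, #CR$) + 1   ; 1 CR  + 1 = 2
```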

Bernd


 Post subject: Re: NOL
PostPosted: Fri Jan 21, 2011 4:33 pm 
Joined: Fri Oct 23, 2009 2:33 am
Posts: 2860
Location: Wales, UK
....hmm, you would rarely want to read an empty last line. However, your code is flexible from that standpoint because the '+1' can be omitted.

_________________
IdeasVacuum
If it sounds simple, you have not grasped the complexity.


 Post subject: Re: NOL
PostPosted: Sat Jan 22, 2011 10:13 am 
Joined: Sun Sep 07, 2008 12:45 pm
Posts: 1441
Location: Germany
Hi,

if I follow your idea, Vacuum ( :) ), then open Notepad, type in IdeasVacuum,
don't press Return and save this file.

You now have a text file without a CR at the end, and you are telling us that this file has no lines.
(It's a vacuum :) )

A line with nothing in it is still a new line ( \n in C )

So the + 1 is right.

Best regards

Bernd

P.S.: Don't be angry about the small jokes in the text.
It's part of my humor.
So I'd better say 'excuse me' in case you misinterpret the text.
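
For anyone who wants both behaviours, here is a rough sketch of a helper that lets the caller decide whether an empty line after the final line ending is counted. LinesInText and its countEmptyLast parameter are hypothetical names, not something posted in this thread, and it works on a string that has already been read into memory:

```purebasic
; Hypothetical helper (not from the thread): count lines in a string, with a
; switch for whether an empty line after the final CR/LF is counted.
Procedure.i LinesInText(text$, countEmptyLast.i)
  Protected n.i = CountString(text$, #CR$)
  If text$ = ""
    ProcedureReturn 0
  EndIf
  ; Add the last line unless it is empty and the caller wants it ignored
  If countEmptyLast Or (Right(text$, 1) <> #CR$ And Right(text$, 1) <> #LF$)
    n + 1
  EndIf
  ProcedureReturn n
EndProcedure

Debug LinesInText("IdeasVacuum", #False)            ; 0 CRs, text does not end in CR/LF: 1
Debug LinesInText("IdeasVacuum" + #CRLF$, #False)   ; 1 CR, trailing empty line ignored: 1
Debug LinesInText("IdeasVacuum" + #CRLF$, #True)    ; 1 CR, trailing empty line counted: 2
```

With #True the result matches the "CR count + 1" rule used in the procedures above; with #False a file that ends in a line ending is not credited with an extra empty line.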

