Counting files in directory
Posted: Thu Dec 31, 2015 3:23 pm
by Erich
Hi folks,
I'm looking for a fast & cross-platform way to count all files - or, alternatively, all files matching a wildcard string - in a directory. I know how to loop through the files with ExamineDirectory, of course, but was wondering whether there is a faster way. Surely, the number of files is stored by the operating system already? But I haven't found any PB function for this.
Thanks for any help, and happy programming to all of you!
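For reference, the plain ExamineDirectory() loop the question refers to can be sketched as follows. This is a minimal sketch that assumes only the top level of the folder matters; `CountFilesFlat` is a hypothetical name. Every directory entry is still read once, so the run time grows with the number of entries.

```purebasic
EnableExplicit

; Hypothetical helper: counts the files directly inside 'folder' that match
; 'pattern'. Subdirectories are neither counted nor entered.
Procedure CountFilesFlat(folder.s, pattern.s = "*.*")
  Protected count = 0
  Protected dir = ExamineDirectory(#PB_Any, folder, pattern)
  If dir
    While NextDirectoryEntry(dir)
      ; Only plain files are counted; ".", ".." and subfolders are skipped.
      If DirectoryEntryType(dir) = #PB_DirectoryEntry_File
        count + 1
      EndIf
    Wend
    FinishDirectory(dir)
  EndIf
  ProcedureReturn count
EndProcedure

Debug CountFilesFlat(GetHomeDirectory())           ; all files
Debug CountFilesFlat(GetHomeDirectory(), "*.txt")  ; only *.txt files
```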

Re: Counting files in directory
Posted: Thu Dec 31, 2015 4:23 pm
by wilbert
I think ExamineDirectory is the best cross platform way to do so.
You can compare with OS specific API functions if you wish and use those if they are faster.
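As an illustration of the OS-specific route, here is a minimal Windows-only sketch that calls the Win32 FindFirstFile/FindNextFile functions through PureBasic's API syntax (the trailing underscore). `CountFilesWinAPI` is a hypothetical name. Like ExamineDirectory(), it still enumerates every entry, so any gain could only come from lower per-entry overhead; whether it is actually faster would have to be measured.

```purebasic
EnableExplicit

CompilerIf #PB_Compiler_OS = #PB_OS_Windows
  ; Hypothetical helper: counts the files (not folders) in 'folder' that
  ; match 'pattern', using the Win32 API directly.
  Procedure CountFilesWinAPI(folder.s, pattern.s = "*.*")
    Protected fd.WIN32_FIND_DATA
    Protected count = 0
    Protected h
    If Right(folder, 1) <> "\" : folder + "\" : EndIf
    h = FindFirstFile_(folder + pattern, @fd)
    If h <> #INVALID_HANDLE_VALUE
      Repeat
        ; Directories (including "." and "..") carry the directory attribute.
        If (fd\dwFileAttributes & #FILE_ATTRIBUTE_DIRECTORY) = 0
          count + 1
        EndIf
      Until FindNextFile_(h, @fd) = 0
      FindClose_(h)
    EndIf
    ProcedureReturn count
  EndProcedure

  Debug CountFilesWinAPI("C:\Windows")
CompilerEndIf
```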
Re: Counting files in directory
Posted: Thu Dec 31, 2015 4:38 pm
by Erich
Hm, that's what I was suspecting.

It would be a useful built-in function, because AFAIK all filesystems keep track of the number of directory entries anyway. The problem is, I need the file count in advance to report the progress correctly when I process all the files, and there may be many files (millions, for example).
But I'll probably stick to the counting method rather than messing around with system calls. Of course, I could just report the progress in a less specific way...
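The two options mentioned here, a percentage based on a known total or a plain running count, could look roughly like this. It is only a sketch, assuming a single folder without recursion; `ReportProgress` and `ProcessFolder` are hypothetical names, and the actual per-file work is left as a comment.

```purebasic
EnableExplicit

; Hypothetical helper: prints a percentage when the total is known (total > 0),
; otherwise just a running count (the "less specific" fallback).
Procedure ReportProgress(done, total)
  If total > 0
    Debug Str(done * 100 / total) + "% (" + Str(done) + " of " + Str(total) + ")"
  Else
    Debug Str(done) + " files processed"
  EndIf
EndProcedure

; Hypothetical helper: processes the files of one folder (no recursion) and
; reports progress every 'stepSize' files (stepSize must be > 0).
; Pass total = 0 when no count is available.
Procedure ProcessFolder(folder.s, total, stepSize = 1000)
  Protected done = 0
  Protected dir = ExamineDirectory(#PB_Any, folder, "*.*")
  If dir
    While NextDirectoryEntry(dir)
      If DirectoryEntryType(dir) = #PB_DirectoryEntry_File
        ; ... process DirectoryEntryName(dir) here ...
        done + 1
        If done % stepSize = 0
          ReportProgress(done, total)
        EndIf
      EndIf
    Wend
    FinishDirectory(dir)
  EndIf
  ReportProgress(done, total) ; final report
  ProcedureReturn done
EndProcedure

; Example: ProcessFolder(GetHomeDirectory(), 0) reports a running count only.
```

Because files can be added or removed between counting and processing, the percentage is only an estimate and can even exceed 100%.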

Re: Counting files in directory
Posted: Thu Dec 31, 2015 4:40 pm
by Marc56us
Erich wrote:I'm looking for a fast & cross-platform way to count all files - or, alternatively, all files matching a wildcard string - in a directory. I know how to loop through the files with ExamineDirectory, of course, but was wondering whether there is a faster way. Surely, the number of files is stored by the operating system already?
To my knowledge, no operating system stores the number of files in the filesystem.
And whatever the algorithm, it will always be faster than the hard drive, so disk access is the bottleneck.

Re: Counting files in directory
Posted: Thu Dec 31, 2015 7:05 pm
by firace
Re: Counting files in directory
Posted: Fri Jan 01, 2016 12:53 am
by Dude
Erich wrote:I'm looking for a fast & cross-platform way to count all files - or, alternatively, all files matching a wildcard string - in a directory.
You didn't mention whether files in subfolders should be included.
Also, how fast is fast? On my old slow SATA HD, ExamineDirectory() parses my Windows folder (including its subfolders) of 66,000 files in 4 seconds. I think that's acceptable, but maybe it's too slow for you? On a modern SSD, that might only be 1 second.
And I don't think any OS keeps a count of files in folders... never seen anything like that, ever.
Re: Counting files in directory
Posted: Fri Jan 01, 2016 1:13 pm
by Erich
Thanks a lot, folks! I don't know where I got the idea that there is an API for this on every platform. By now I believe that if there is a cached file count at all, it is probably very filesystem-dependent and not easily accessible.
Here is what I'm using now (partly based on code by someone else, but I no longer know whom):
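DeclareModule FileHelpers
  CompilerIf #PB_Compiler_OS = #PB_OS_Windows
    #Path_Separator = "\"
  CompilerElse
    #Path_Separator = "/"
  CompilerEndIf
  Declare CountFiles(folder.s, pattern.s="*.*", recursive=#True, countOnlyVisible=#False)
  Declare IsEmptyDirectory(folder.s)
EndDeclareModule

Module FileHelpers
  ; Counts the files in 'folder' that match 'pattern', optionally including all
  ; subdirectories. A list serves as a work queue instead of recursion.
  Procedure CountFiles(folder.s, pattern.s="*.*", recursive=#True, countOnlyVisible=#False)
    Protected count=0, hd, name.s
    Protected NewList ToDo.s()
    ; ExamineDirectory() applies the pattern to folders too, so a pattern such
    ; as "*.txt" would hide the subfolders. In that case, scan for them separately.
    Protected separateDirScan = #False
    If recursive And pattern <> "" And pattern <> "*.*"
      separateDirScan = #True
    EndIf
    If Right(folder, 1) <> #Path_Separator : folder + #Path_Separator : EndIf
    AddElement(ToDo())
    ToDo() = folder
    ResetList(ToDo())
    While NextElement(ToDo())
      folder = ToDo()
      DeleteElement(ToDo())
      hd = ExamineDirectory(#PB_Any, folder, pattern)
      If hd
        While NextDirectoryEntry(hd)
          name = DirectoryEntryName(hd)
          If DirectoryEntryType(hd) = #PB_DirectoryEntry_File
            If countOnlyVisible
              ; Skips names with nothing before the extension, e.g. ".profile"
              ; (Unix dot files). The Windows "hidden" attribute is not checked.
              If GetFilePart(name, #PB_FileSystem_NoExtension) <> ""
                count+1
              EndIf
            Else
              count+1
            EndIf
          ElseIf recursive And Not separateDirScan And name <> "." And name <> ".."
            AddElement(ToDo())
            ToDo() = folder + name + #Path_Separator
          EndIf
        Wend
        FinishDirectory(hd)
      EndIf
      If separateDirScan
        ; An empty pattern lists every entry, so all subfolders are found.
        hd = ExamineDirectory(#PB_Any, folder, "")
        If hd
          While NextDirectoryEntry(hd)
            name = DirectoryEntryName(hd)
            If DirectoryEntryType(hd) = #PB_DirectoryEntry_Directory And name <> "." And name <> ".."
              AddElement(ToDo())
              ToDo() = folder + name + #Path_Separator
            EndIf
          Wend
          FinishDirectory(hd)
        EndIf
      EndIf
      ResetList(ToDo())
    Wend
    ProcedureReturn count
  EndProcedure

  ; Returns #True if neither 'folder' nor any of its subfolders contains a file
  ; (empty subfolders do not count as content).
  Procedure IsEmptyDirectory(folder.s)
    Protected result=#False
    If CountFiles(folder)=0
      result=#True
    EndIf
    ProcedureReturn result
  EndProcedure
EndModule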
It's reasonably fast, though I'm still looking for ways to get the count in constant time.
Edit: Corrected error with countOnlyVisible.
Re: Counting files in directory
Posted: Fri Jan 01, 2016 2:09 pm
by blueb
This may help...
;==================================================================
;
; Author: Rings
; Date: February 21st, 2010
; Explain:
;
; See: http://www.purebasic.fr/english/viewtopic.php?p=316623#p316623
;==================================================================
StartTime = ElapsedMilliseconds()
Global AZ.q ; number of files found

; Recursively adds up the sizes of all files below 'directory' (Windows paths).
; Each recursion level uses its own directory number (directoryid + 1).
Procedure.q CountFiles(directory.s , directoryid.l )
  If Right(directory,1)<>"\": directory+"\": EndIf
  If ExamineDirectory(directoryid,directory,"*.*")
    dirid=NextDirectoryEntry(directoryid)
    While dirid
      dirtype = DirectoryEntryType(directoryid)
      Select dirtype
        Case #PB_DirectoryEntry_File
          file.s=directory + DirectoryEntryName(directoryid)
          FS.q=FileSize(File)
          FZ.q + FS
          AZ +1
          ;Debug Str(FS)+" " + File
        Case #PB_DirectoryEntry_Directory
          If DirectoryEntryName(directoryid)<>"." And DirectoryEntryName(directoryid)<>".."
            ;Debug "examine DIR " + DirectoryEntryName(directoryid)
            FZ + CountFiles(directory+DirectoryEntryName(directoryid)+"\",directoryid+1)
          EndIf
      EndSelect
      dirid=NextDirectoryEntry(directoryid)
    Wend
    FinishDirectory(directoryid) ; release this level's directory handle
  EndIf
  ProcedureReturn FZ
EndProcedure

AZ=0
SZ.q=CountFiles("c:\PureBasic",1)
ElapsedTime = ElapsedMilliseconds()-StartTime
Debug "Found a total of " + Str(az) + " files having a total size of = "+ Str(SZ) + " bytes"
Debug "And completed in: " + Str(ElapsedTime) + " milliseconds"
With an SSD on Win10, my results were:
Found a total of 39176 files having a total size of = 2890776938 bytes
And completed in: 2544 milliseconds
Re: Counting files in directory
Posted: Fri Jan 01, 2016 2:24 pm
by Erich
blueb wrote:This may help...
Your directory traversal may be slightly faster but it will run out of stack space when there are many nested subdirectories.
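A recursion-free variant of the count-and-size code above might look like this. It is a minimal sketch: the folders still to be examined are kept on a list used as a stack (last in, first out), so the nesting depth no longer affects the call stack. `CountFilesAndSize` and `#DirSep$` are hypothetical names, and DirectoryEntrySize() is used in place of the separate FileSize() call in Rings' version.

```purebasic
EnableExplicit

; Hypothetical path separator constant for this sketch.
CompilerIf #PB_Compiler_OS = #PB_OS_Windows
  #DirSep$ = "\"
CompilerElse
  #DirSep$ = "/"
CompilerEndIf

; Hypothetical helper: returns the total size in bytes of all files below
; 'root' and stores the number of files in the quad pointed to by *fileCount.
; Pending folders live on a list used as a stack, not on the call stack.
Procedure.q CountFilesAndSize(root.s, *fileCount.Quad)
  Protected NewList pending.s()
  Protected totalSize.q = 0, count.q = 0
  Protected folder.s, name.s, dir
  AddElement(pending())
  pending() = root
  While LastElement(pending())
    ; Pop the most recently pushed folder.
    folder = pending()
    DeleteElement(pending())
    If Right(folder, 1) <> #DirSep$ : folder + #DirSep$ : EndIf
    dir = ExamineDirectory(#PB_Any, folder, "*.*")
    If dir
      While NextDirectoryEntry(dir)
        name = DirectoryEntryName(dir)
        If DirectoryEntryType(dir) = #PB_DirectoryEntry_File
          count + 1
          totalSize + DirectoryEntrySize(dir)
        ElseIf name <> "." And name <> ".."
          ; Push the subfolder; it is examined in a later iteration.
          LastElement(pending())
          AddElement(pending())
          pending() = folder + name
        EndIf
      Wend
      FinishDirectory(dir)
    EndIf
  Wend
  *fileCount\q = count
  ProcedureReturn totalSize
EndProcedure

Define files.q, bytes.q
bytes = CountFilesAndSize(GetHomeDirectory(), @files)
Debug Str(files) + " files, " + Str(bytes) + " bytes"
```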
Re: Counting files in directory
Posted: Sat Jan 02, 2016 11:32 am
by davido
@Erich,
Thank you for sharing your code.
Looks great to me.

Re: Counting files in directory
Posted: Sat Jan 02, 2016 1:05 pm
by HanPBF
Hello,
first of all: sorry for not giving any code...
I only have a hint: the NTFS Master File Table (MFT).
The tool "Everything" from http://www.voidtools.com reads the data from the MFT into a database.
That should be the fastest approach... at least on Windows.
I guess similar master tables are available on Linux and Mac OS X.
Re: Counting files in directory
Posted: Sat Jan 02, 2016 1:52 pm
by blueb
Erich wrote:Your directory traversal may be slightly faster but it will run out of stack space when there are many nested subdirectories.
Ran the code again on my USB external hard-drive:
Found a total of 734,158 files having a total size of = 547,368,403,841 bytes
And completed in: 157,926 milliseconds (NOTE: non-SSD)
You might be correct, but I didn't hit any limits, and there are plenty of folders on this disk.

Re: Counting files in directory
Posted: Sat Jan 02, 2016 2:01 pm
by Erich
blueb wrote:You might be correct, but I didn't hit any limits, and there are plenty of folders on this disk.

Yes, the stack size is fairly large on most systems nowadays. It should be alright as long as you're aware of the limitation. Personally, I avoid recursion in languages with a stack limit, just to remain on the safe side.
Re: Counting files in directory
Posted: Sat Jan 02, 2016 2:21 pm
by blueb
Erich wrote:Yes, the stack size is fairly large on most systems nowadays. It should be alright as long as you're aware of the limitation. Personally, I avoid recursion in languages with a stack limit, just to remain on the safe side.
Perhaps this code would help in the future:
http://www.purebasic.fr/english/viewtop ... c&start=15
MrMat wrote:
The example initially sets the stack to 200 KB, then calls an iterative procedure which monitors the free stack space. When it goes below 100 KB, it calls a procedure to increase the stack by 1 MB, and then it continues iterating. I set it to stop when the stack size reached 2 MB. The decreasing numbers at the end show the stack was preserved correctly, so I think it's all working.
I'm looking into this code... looks like it might be useful.