Counting files in directory

Erich
User
Posts: 49
Joined: Thu Sep 30, 2010 9:21 pm

Counting files in directory

Post by Erich »

Hi folks,

I'm looking for a fast & cross-platform way to count all files - or, alternatively, all files matching a wildcard string - in a directory. I know how to loop through the files with ExamineDirectory, of course, but was wondering whether there is a faster way. Surely, the number of files is stored by the operating system already? But I haven't found any PB function for this.

Thanks for any help, and happy programming to all of you! :D
"I have never let my schooling interfere with my education." - Mark Twain
wilbert
PureBasic Expert
Posts: 3942
Joined: Sun Aug 08, 2004 5:21 am
Location: Netherlands

Re: Counting files in directory

Post by wilbert »

I think ExamineDirectory is the best cross platform way to do so.
You can compare with OS specific API functions if you wish and use those if they are faster.
Windows (x64)
Raspberry Pi OS (Arm64)

Re: Counting files in directory

Post by Erich »

Hm, that's what I suspected. :( It would be a useful built-in function, because AFAIK all filesystems keep track of the number of directory entries anyway. The problem is, I need the file count in advance to report progress correctly when I process all the files, and there may be many files (millions, for example).

But I'll probably stick to the counting method rather than messing around with system calls. Of course, I could just report the progress in a less specific way... :)
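One way to get the total for progress reporting without reading the folder twice is to collect the paths in a first pass and then work through that list. This is a minimal sketch that assumes a single folder (subfolders are not entered). `CollectFiles` and `ProcessWithProgress` are hypothetical names, and the real per-file work is only a placeholder comment:

```purebasic
EnableExplicit

; First pass: append every regular file in Folder$ to Files() and return the total.
; Subfolders are not entered in this sketch.
Procedure CollectFiles(Folder$, List Files.s())
  Protected Dir
  If Right(Folder$, 1) <> #PS$ : Folder$ + #PS$ : EndIf
  Dir = ExamineDirectory(#PB_Any, Folder$, "")
  If Dir
    While NextDirectoryEntry(Dir)
      If DirectoryEntryType(Dir) = #PB_DirectoryEntry_File
        AddElement(Files())
        Files() = Folder$ + DirectoryEntryName(Dir)
      EndIf
    Wend
    FinishDirectory(Dir)
  EndIf
  ProcedureReturn ListSize(Files())
EndProcedure

; Second pass: process the collected paths, reporting progress as a percentage.
Procedure ProcessWithProgress(Folder$)
  Protected NewList Files.s()
  Protected Total = CollectFiles(Folder$, Files())
  Protected Done = 0
  ForEach Files()
    ; ... the real per-file work on Files() would go here (placeholder) ...
    Done + 1
    If Done % 1000 = 0 Or Done = Total
      Debug Str(Done) + " / " + Str(Total) + " (" + StrF(100.0 * Done / Total, 1) + "%)"
    EndIf
  Next
  ProcedureReturn Done
EndProcedure

ProcessWithProgress(GetTemporaryDirectory())
```

Keeping millions of paths in memory has its own cost, so for very large trees, counting first and walking the tree a second time may still be the better trade-off.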
Marc56us
Addict
Posts: 1600
Joined: Sat Feb 08, 2014 3:26 pm

Re: Counting files in directory

Post by Marc56us »

Erich wrote:
I'm looking for a fast & cross-platform way to count all files - or, alternatively, all files matching a wildcard string - in a directory. I know how to loop through the files with ExamineDirectory, of course, but was wondering whether there is a faster way. Surely, the number of files is stored by the operating system already?
To my knowledge, no operating system stores the number of files in the filesystem.
And whatever the algorithm, it will always be faster than the hard drive, so the disk is the real bottleneck.

:wink:
firace
Addict
Posts: 946
Joined: Wed Nov 09, 2011 8:58 am

Re: Counting files in directory

Post by firace »

Dude
Addict
Posts: 1907
Joined: Mon Feb 16, 2015 2:49 pm

Re: Counting files in directory

Post by Dude »

Erich wrote:
I'm looking for a fast & cross-platform way to count all files - or, alternatively, all files matching a wildcard string - in a directory.
You didn't mention whether files in subfolders should be included. ;)

Also, how fast is fast? On my old slow SATA HD, ExamineDirectory() parses my Windows folder (including its subfolders) of 66,000 files in 4 seconds. I think that's acceptable, but maybe it's too slow for you? On a modern SSD, that might only be 1 second.

And I don't think any OS keeps a count of files in folders... never seen anything like that, ever.
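To put a number on "how fast is fast" for the simplest case (one folder, subfolders ignored, optional wildcard), the count can be timed with ElapsedMilliseconds(). A minimal sketch; `CountTopLevelFiles` is a hypothetical name, and the temporary directory is just a placeholder target:

```purebasic
EnableExplicit

; Count the regular files directly inside Folder$ that match Pattern$.
; Subfolders are not entered. An empty pattern lists every entry.
Procedure CountTopLevelFiles(Folder$, Pattern$ = "")
  Protected Count = 0
  Protected Dir = ExamineDirectory(#PB_Any, Folder$, Pattern$)
  If Dir
    While NextDirectoryEntry(Dir)
      ; Skip subfolders (including "." and ".."), count only files
      If DirectoryEntryType(Dir) = #PB_DirectoryEntry_File
        Count + 1
      EndIf
    Wend
    FinishDirectory(Dir)
  EndIf
  ProcedureReturn Count
EndProcedure

Define Start.q = ElapsedMilliseconds()
Define Total = CountTopLevelFiles(GetTemporaryDirectory(), "*.*")
Debug Str(Total) + " files counted in " + Str(ElapsedMilliseconds() - Start) + " ms"
```

Timings depend heavily on the disk and on whether the directory listing is already cached by the OS, so a second run is usually much faster than the first.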

Re: Counting files in directory

Post by Erich »

Thanks a lot, folks! I don't know where I got the idea that there is an API for this on every platform. By now I believe that if there is a cached file count, it is probably very filesystem-dependent and not easily accessible.

Here is what I'm using now (partly based on code by someone else, but I no longer know whom):

Code:

DeclareModule FileHelpers
  CompilerIf #PB_Compiler_OS = #PB_OS_Windows
    #Path_Separator = "\"
  CompilerElse
    #Path_Separator = "/"
  CompilerEndIf
  
  Declare CountFiles(folder.s, pattern.s="*.*", recursive=#True, countOnlyVisible=#False)
  Declare IsEmptyDirectory(folder.s)
EndDeclareModule

Module FileHelpers
  Procedure CountFiles(folder.s, pattern.s="*.*", recursive=#True, countOnlyVisible=#False)
    Protected count=0
    Protected NewList ToDo.s(), hd
    
    If Right(folder, 1) <> #Path_Separator : folder + #Path_Separator : EndIf
    
    AddElement(ToDo())
    ToDo() = folder
    ResetList(ToDo())
    
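    While NextElement(ToDo())
      ; Take the next pending folder off the work list (iterative, no recursion)
      folder = ToDo()
      DeleteElement(ToDo())
      ; Note: the pattern filters subdirectories as well as files, so with a
      ; pattern such as "*.txt", folders whose names don't match are not entered
      hd = ExamineDirectory(#PB_Any, folder, pattern)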
      If hd
        While NextDirectoryEntry(hd)   
          If DirectoryEntryType(hd) = #PB_DirectoryEntry_File
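            If countOnlyVisible
              ; Skips Unix-style dotfiles such as ".bashrc" (no name before the extension);
              ; the Windows "hidden" attribute is not checked
              If GetFilePart(DirectoryEntryName(hd),#PB_FileSystem_NoExtension)<>""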
                count+1
              EndIf
            Else
              count+1
            EndIf
          Else
            If DirectoryEntryName(hd) <> "." And DirectoryEntryName(hd) <> ".." And recursive
              AddElement(ToDo())
              ToDo() = folder + DirectoryEntryName(hd) + #Path_Separator
            EndIf
          EndIf
        Wend
        FinishDirectory(hd)
      EndIf  
      ResetList(ToDo())
    Wend
    ClearList(ToDo())
    ProcedureReturn count
  EndProcedure
  
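  Procedure IsEmptyDirectory(folder.s)
    ; Counts files recursively, so a folder that holds only empty subfolders
    ; (or a folder that doesn't exist) is also reported as empty
    Protected result=#False
    If CountFiles(folder)=0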
      result=#True
    EndIf
    ProcedureReturn result
  EndProcedure
EndModule
It's reasonably fast, though I'm still looking for ways to get the count in constant time.

Edit: Corrected error with countOnlyVisible.
Last edited by Erich on Sat Jan 02, 2016 1:55 pm, edited 1 time in total.
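Because ExamineDirectory() applies its pattern to subdirectory names too, CountFiles(folder, "*.txt") in the module above won't descend into folders whose names don't match. A sketch of one possible fix, assuming it's acceptable to read each folder twice: count files with the pattern, then list subfolders with an empty pattern. `CountMatchingFiles` is a hypothetical name, and the #PS$ path-separator constant needs a reasonably recent PureBasic version:

```purebasic
EnableExplicit

; Count files matching Pattern$ under Folder$. Unlike a single ExamineDirectory()
; call with the pattern, subfolders are always entered, whatever their names.
Procedure CountMatchingFiles(Folder$, Pattern$ = "*.*", Recursive = #True)
  Protected Count = 0, Dir, Name$
  Protected NewList ToDo.s()   ; folders still to be examined

  If Right(Folder$, 1) <> #PS$ : Folder$ + #PS$ : EndIf
  AddElement(ToDo())
  ToDo() = Folder$

  While FirstElement(ToDo())
    Folder$ = ToDo()
    DeleteElement(ToDo())

    ; Pass 1: the pattern is used only for counting files
    Dir = ExamineDirectory(#PB_Any, Folder$, Pattern$)
    If Dir
      While NextDirectoryEntry(Dir)
        If DirectoryEntryType(Dir) = #PB_DirectoryEntry_File
          Count + 1
        EndIf
      Wend
      FinishDirectory(Dir)
    EndIf

    ; Pass 2: an empty pattern lists every entry, so no subfolder is skipped
    If Recursive
      Dir = ExamineDirectory(#PB_Any, Folder$, "")
      If Dir
        While NextDirectoryEntry(Dir)
          Name$ = DirectoryEntryName(Dir)
          If DirectoryEntryType(Dir) = #PB_DirectoryEntry_Directory And Name$ <> "." And Name$ <> ".."
            ; Queue the subfolder at the end of the work list
            LastElement(ToDo())
            AddElement(ToDo())
            ToDo() = Folder$ + Name$ + #PS$
          EndIf
        Wend
        FinishDirectory(Dir)
      EndIf
    EndIf
  Wend
  ProcedureReturn Count
EndProcedure

Debug CountMatchingFiles(GetTemporaryDirectory(), "*.txt")
```

The price is a second directory read per folder, which may be noticeable on slow disks.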
blueb
Addict
Posts: 1111
Joined: Sat Apr 26, 2003 2:15 pm
Location: Cuernavaca, Mexico

Re: Counting files in directory

Post by blueb »

This may help...

Code:

;==================================================================
;
; Author:     Rings    
; Date:       February 21st, 2010
; Explain:
;           
; See:  http://www.purebasic.fr/english/viewtopic.php?p=316623#p316623        
;==================================================================

StartTime = ElapsedMilliseconds() 

Global AZ.q
Procedure.q CountFiles(directory.s , directoryid.l )
 
  If Right(directory,1)<>"\":    directory+"\":  EndIf
  If ExamineDirectory(directoryid,directory,"*.*")
    dirid=NextDirectoryEntry(directoryid)
    While dirid
      dirtype = DirectoryEntryType(directoryid)
      Select dirtype
        Case #PB_DirectoryEntry_File
          file.s=directory + DirectoryEntryName(directoryid)
          FS.q=FileSize(File)
          FZ.q + FS
          AZ +1
          ;Debug Str(FS)+" " +  File
        Case #PB_DirectoryEntry_Directory
          If DirectoryEntryName(directoryid)<>"." And DirectoryEntryName(directoryid)<>".."     
            ;Debug "examine DIR " + DirectoryEntryName(directoryid)
            FZ  + CountFiles(directory+DirectoryEntryName(directoryid)+"\",directoryid+1)     
          EndIf
      EndSelect
      dirid=NextDirectoryEntry(directoryid)
    Wend
    FinishDirectory(directoryid)   ; release the directory handle when done
  EndIf
  ProcedureReturn FZ
EndProcedure

AZ=0
SZ.q=CountFiles("c:\PureBasic",1)
ElapsedTime = ElapsedMilliseconds()-StartTime 

Debug "Found a total of " + Str(az) + " files having a total size of = "+ Str(SZ) + " bytes"

Debug "And completed in: " + Str(ElapsedTime) + " milliseconds"
With an SSD on Win10, my results were:

Code:

Found a total of 39176 files having a total size of = 2890776938 bytes
And completed in: 2544 milliseconds
- It was too lonely at the top.

System : PB 6.21(x64) and Win 11 Pro (x64)
Hardware: AMD Ryzen 9 5900X w/64 gigs Ram, AMD RX 6950 XT Graphics w/16gigs Mem

Re: Counting files in directory

Post by Erich »

blueb wrote:
This may help...
Your directory traversal may be slightly faster, but because it is recursive it can run out of stack space when subdirectories are very deeply nested.
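For comparison, the count-plus-total-size idea from blueb's code can be written without recursion: pending folders go into a linked list on the heap, so nesting depth is limited by memory rather than by the stack. A sketch with hypothetical names (`ScanTree`, `TreeStats`); it also uses DirectoryEntrySize() instead of a separate FileSize() call per file:

```purebasic
EnableExplicit

; Running totals filled in by ScanTree()
Structure TreeStats
  Files.q   ; number of regular files found
  Bytes.q   ; sum of their sizes in bytes
EndStructure

; Walk Root$ and all of its subfolders without recursion.
Procedure ScanTree(Root$, *Stats.TreeStats)
  Protected Dir, Name$, Folder$
  Protected NewList Pending.s()   ; folders still to be examined (heap, not stack)

  If Right(Root$, 1) <> #PS$ : Root$ + #PS$ : EndIf
  AddElement(Pending())
  Pending() = Root$

  While LastElement(Pending())
    ; Pop the most recently queued folder
    Folder$ = Pending()
    DeleteElement(Pending())

    Dir = ExamineDirectory(#PB_Any, Folder$, "")
    If Dir
      While NextDirectoryEntry(Dir)
        Name$ = DirectoryEntryName(Dir)
        If DirectoryEntryType(Dir) = #PB_DirectoryEntry_File
          *Stats\Files = *Stats\Files + 1
          *Stats\Bytes = *Stats\Bytes + DirectoryEntrySize(Dir)
        ElseIf Name$ <> "." And Name$ <> ".."
          ; Push the subfolder onto the end of the work list
          LastElement(Pending())
          AddElement(Pending())
          Pending() = Folder$ + Name$ + #PS$
        EndIf
      Wend
      FinishDirectory(Dir)
    EndIf
  Wend
EndProcedure

Define Stats.TreeStats
ScanTree(GetTemporaryDirectory(), @Stats)
Debug Str(Stats\Files) + " files, " + Str(Stats\Bytes) + " bytes"
```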
davido
Addict
Posts: 1890
Joined: Fri Nov 09, 2012 11:04 pm
Location: Uttoxeter, UK

Re: Counting files in directory

Post by davido »

@Erich,
Thank you for sharing your code.
Looks great to me. :D
DE AA EB
HanPBF
Enthusiast
Posts: 570
Joined: Fri Feb 19, 2010 3:42 am

Re: Counting files in directory

Post by HanPBF »

Hello,

First of all: sorry for not giving any code...

I only have a hint: the NTFS Master File Table (MFT).
http://www.voidtools.com has a tool called "Everything".
It reads the data from the NTFS Master File Table into a database.
That should be the fastest approach... on Windows.

I guess similar master-table-like structures are available on Linux and Mac OS X.

Re: Counting files in directory

Post by blueb »

Erich wrote:
This may help...
Your directory traversal may be slightly faster but it will run out of stack space when there are many nested subdirectories.
Ran the code again on my USB external hard-drive:

Code:

Found a total of 734,158 files having a total size of = 547,368,403,841 bytes
And completed in: 157,926 milliseconds (NOTE: non-SSD)

You might be correct, but I didn't hit any limits, and there are plenty of folders on this disk. :)

Re: Counting files in directory

Post by Erich »

blueb wrote:
You might be correct, but I didn't hit any limits, and there are plenty of folders on this disk. :)
Yes, the stack size is fairly large on most systems nowadays. It should be alright as long as you're aware of the limitation. Personally, I avoid recursion in languages with a fixed stack size, just to remain on the safe side.

Re: Counting files in directory

Post by blueb »

Erich wrote:
You might be correct, but I didn't hit any limits, and there are plenty of folders on this disk. :)
Yes, the stack size is fairly large on most systems nowadays. It should be alright as long as you're aware of the limitation. Personally, I avoid recursion in languages with a fixed stack size, just to remain on the safe side.
Perhaps this code would help in the future:
http://www.purebasic.fr/english/viewtop ... c&start=15

MrMat wrote:
The example initially sets the stack to 200 KB, then calls an iterative procedure which monitors the free stack space. When it goes below 100 KB, it calls a procedure to increase the stack by 1 MB and then continues iterating. I set it to stop when the stack size reached 2 MB. The decreasing numbers at the end show the stack was preserved correctly, so I think it's all working.
I'm looking into this code... looks like it might be useful.