Wrapper for using 32-bit string search algorithms on 4 GB+ files

Posted: Fri Aug 28, 2015 8:39 am
by Keya
Today, after a mocha coffee, I realized that to search a file larger than 4 GB we need a wrapper around the search algorithms, most of which only support buffers of at most 4 GB. Besides, we don't want to load a 4 GB buffer anyway :)
Here is my effort: the file is read in fixed-size chunks (4 MB here), and each chunk overlaps the previous one slightly so that a match straddling a chunk boundary isn't missed. Of course we need a string search algorithm for this; we're free to use pretty much any of them, as long as we adapt the parameters to its signature!
For this demo I'm using wilbert's kickass FindData module, which actually supports both 32-bit and 64-bit!
But the point of this demo is to let you use any 32-bit search algorithm, as most are (and if it's 64-bit, that's fine too, since we still don't want to read 4 GB at a time!)
There are no doubt more efficient ways, but this works OK :)

XIncludeFile("WilbertsFindDataModule.pbi")   ;http://www.purebasic.fr/english/viewtopic.php?p=467148
 
;Returns the absolute file offset of the first match, or:
;  -1 = needle not found, -2 = memory error, -3 = file could not be opened, -4 = invalid needle size
Procedure.q SearchFile(SearchFileName.s, *SearchNeedleData, SearchNeedleSize.i)
  Protected *pMem, baseposition.q, hFile.i, result.q = -1
  Protected bytesread.l, findoffset.i      ;these may need to be .l or .i/.q, depending on the actual search algorithm parameters
  #BUFSIZE = 1024*1024*4     ;4 MB buffer. #BUFSIZE must always be > any possible SearchNeedleSize
  If SearchNeedleSize <= 0 Or SearchNeedleSize >= #BUFSIZE
    ProcedureReturn -4    ;otherwise the overlap step below would never advance and we'd loop forever
  EndIf
  *pMem = AllocateMemory(#BUFSIZE)
  If *pMem = 0
    MessageRequester("Error","Memory allocation failed")
    ProcedureReturn -2
  EndIf
  hFile = ReadFile(#PB_Any, SearchFileName)
  If hFile = 0
    FreeMemory(*pMem)
    ProcedureReturn -3    ;don't report a missing/locked file as "not found"
  EndIf
  Repeat ;Read #BUFSIZE chunks until EOF
    FileSeek(hFile, baseposition, #PB_Absolute)
    bytesread = ReadData(hFile, *pMem, #BUFSIZE)
    If bytesread < SearchNeedleSize: Break: EndIf    ;nothing (or too little) left to search
    findoffset = FindData::BM(*pMem, bytesread, *SearchNeedleData, SearchNeedleSize)
    If findoffset >= 0  ;Did we find search needle?
      result = baseposition + findoffset
      Break
    EndIf
    ;We advance not by the full chunk but by (bytesread - (needlesize - 1)), so the next chunk re-reads
    ;the last needlesize-1 bytes and a needle straddling the chunk boundary is still found
    baseposition + bytesread - (SearchNeedleSize - 1)
  Until Eof(hFile)
  CloseFile(hFile)
  FreeMemory(*pMem)
  ProcedureReturn result
EndProcedure


FileToSearch.s = ProgramFilename()    ;search our own .exe for this demo... guaranteed to exist!
SearchNeedle.s = "Memory allocation"  ;string to find... also guaranteed to exist!
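;StringByteLength() gives the needle size in bytes; Len() counts characters, which is only half the byte count in Unicode mode
MessageRequester("Result","Result=" + Str(SearchFile(FileToSearch, @SearchNeedle, StringByteLength(SearchNeedle))))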
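To check the overlap logic without a multi-gigabyte file or the FindData module, here is a minimal in-memory sketch of the same chunk loop. NaiveFind() and ChunkedFind() are hypothetical helpers written just for this sketch (they are not part of wilbert's module). NaiveFind() is a plain byte-by-byte search standing in for FindData::BM, assumed to return the offset of the first match or -1. A tiny 16-byte chunk plays the role of #BUFSIZE so that a needle placed across a chunk boundary actually exercises the overlap.

```purebasic
; Minimal in-memory sketch of the chunk-with-overlap loop.
; NaiveFind() and ChunkedFind() are hypothetical helpers for this demo only.
EnableExplicit

; Plain byte-by-byte search, standing in for FindData::BM.
; Returns the offset of the first match, or -1 if there is none.
Procedure.i NaiveFind(*Haystack, HaystackSize.i, *Needle, NeedleSize.i)
  Protected i.i
  If NeedleSize <= 0 Or NeedleSize > HaystackSize
    ProcedureReturn -1
  EndIf
  For i = 0 To HaystackSize - NeedleSize
    If CompareMemory(*Haystack + i, *Needle, NeedleSize)
      ProcedureReturn i
    EndIf
  Next
  ProcedureReturn -1
EndProcedure

; Searches *Haystack in ChunkSize pieces, the same way a file is read chunk by chunk.
; Consecutive chunks overlap by NeedleSize - 1 bytes so no straddling match is missed.
Procedure.i ChunkedFind(*Haystack, HaystackSize.i, *Needle, NeedleSize.i, ChunkSize.i)
  Protected chunkStart.i, chunkLen.i, found.i
  If NeedleSize <= 0 Or NeedleSize >= ChunkSize
    ProcedureReturn -1                     ; the step below would not advance
  EndIf
  While chunkStart < HaystackSize
    chunkLen = HaystackSize - chunkStart   ; the last chunk may be shorter
    If chunkLen > ChunkSize
      chunkLen = ChunkSize
    EndIf
    found = NaiveFind(*Haystack + chunkStart, chunkLen, *Needle, NeedleSize)
    If found >= 0
      ProcedureReturn chunkStart + found   ; convert to an absolute offset
    EndIf
    If chunkStart + chunkLen >= HaystackSize
      Break                                ; that was the last chunk
    EndIf
    chunkStart = chunkStart + chunkLen - (NeedleSize - 1)
  Wend
  ProcedureReturn -1
EndProcedure

Define *hay = AllocateMemory(100)          ; zero-filled stand-in for the file
Define *needle = AllocateMemory(5)
Define i.i
If *hay = 0 Or *needle = 0
  End 1
EndIf
For i = 0 To 4
  PokeA(*needle + i, 'A' + i)              ; needle bytes "ABCDE"
Next
CopyMemory(*needle, *hay + 14, 5)          ; occupies bytes 14..18, across the 16-byte chunk boundary

OpenConsole()
PrintN("Found at: " + Str(ChunkedFind(*hay, 100, *needle, 5, 16)))   ; prints "Found at: 14"
```

With 16-byte chunks and a 5-byte needle at offset 14, the first chunk (bytes 0 to 15) misses it, the loop steps back 4 bytes, and the second chunk (starting at 12) finds it at local offset 2, giving 14. An overlap of needle size minus one is the smallest that still catches every straddling match; overlapping by the full needle size also works, it just re-reads one extra byte per chunk.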