Search/replacing data in a binary file?

cas
Enthusiast
Posts: 597
Joined: Mon Nov 03, 2008 9:56 pm

Re: Search/replacing data in a binary file?

Post by cas »

This code may contain bugs. I ran some basic tests and it looks fine, but please test it extensively before any serious use and post any bugs you find here so we can fix them. Hex search is the default, but you can pass a string and set the #STRINGMODE flag to search for text in the file.

I tested it with a 1 MB text file and a 3-character search string, which made around 75000 replacements in the file.
Result without buffered read (buffer length == length of data to find): 4165 ms
Result with a 4096-byte buffer: 390 ms :D
About 10 times faster :) Case-insensitive search for larger strings could be optimized further with some modifications, but I leave that to you :).
Here is the code. I added some unnecessary 'features' (you can remove them if you don't like them) and left some comments:

Code:

EnableExplicit
DisableDebugger

#REPLACEONE     = 1 ;make only one replacement, default is to search whole file and make multiple replacements (without this flag)
#STRINGMODE     = 2 ;find string, default is hex (without this flag)
#CASESENSITIVE  = 4 ;case sensitive string search and replace, default is case-insensitive  (without this flag)
#FILLDATA       = 8 ;if DataToReplace$ is smaller than DataToFind$ then append data to make it same length as DataToFind$, function returns error (-1) without this flag if DataToFind$ and DataToReplace$ are not same length

#STRING_CASESENSITIVE=#STRINGMODE|#CASESENSITIVE|#FILLDATA

Procedure ReplaceDataInFile(File$, DataToFind$, DataToReplace$, StartPosition=1, Flags=#FILLDATA, BufferLen=4096,*Count=0)
  
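  ;NOTE:    assumes an ASCII build (one byte per character); the Len()/PokeS()/PeekS() byte arithmetic below does not hold in a Unicode build
  ;RETURN:  <0  if error {-4=file not found;-3=file open error;-2=not enough free memory on system;-1=wrong parameters}
  ;         =0  if DataToFind$ not found
  ;         >0  position (1-based) of last replacement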
  
  Protected _ltf=Len(DataToFind$)
  Protected _ltr=Len(DataToReplace$)
  
  Protected length = _ltf
  Protected k, result
  
  If Not Flags & #STRINGMODE  ; if hex search...
    If (length%2)<>0          ; ...then len must be {2,4,6,8,...}
      ProcedureReturn -1    ; we don't have valid len for hex search (1,3,5,7,... is not valid)
    EndIf
    length/2                  ; 2 hex characters means 1 byte
  EndIf
  
  If Flags & #FILLDATA
    If _ltf>_ltr                                ;if length of DataToReplace$ is smaller than DataToFind$...
      If Flags & #STRINGMODE
        DataToReplace$+Space(_ltf-_ltr)         ;...then pad a string with space characters: Chr(32)
      Else
        DataToReplace$+RSet("", _ltf-_ltr, "0") ;...or pad hex data with "0" digits (or use "F" instead of "0")
      EndIf
      _ltr=Len(DataToReplace$)
    EndIf
  EndIf
  
  If length=0 Or StartPosition=<0 Or _ltf<>_ltr Or BufferLen=<0 Or BufferLen<length Or Flags<0
    ProcedureReturn -1
  EndIf
  
  Protected *searchdata         = AllocateMemory((length*2)+BufferLen)
  Protected *replacedata        = *searchdata+length
  Protected *testdata.Character = *replacedata+length
  
  If *searchdata=0
    ProcedureReturn -2
  EndIf
  
  If Flags & #STRINGMODE   ; if string search...
    If Not Flags & #CASESENSITIVE
      DataToFind$=UCase(DataToFind$)
    EndIf
    PokeS(*searchdata ,DataToFind$)
    PokeS(*replacedata,DataToReplace$)
  Else                    ; if hex search...
    For k=0 To length-1
      PokeB(*searchdata +k,Val("$"+PeekS(@DataToFind$   +(k*2),2)))
      PokeB(*replacedata+k,Val("$"+PeekS(@DataToReplace$+(k*2),2)))
    Next
  EndIf
  
  If FileSize(File$)>=0 ;OpenFile() creates file if it doesn't exist --> we don't want that so we check it before if it exists
    Protected file = OpenFile(#PB_Any, File$) ;if someone deletes file before this line is executed then we create empty file :( but probability of this is so small that we simply ignore it
    If file
      Protected readpos = StartPosition-1
      Protected thiseof = Lof(file) - length
      Protected replacecount = 0
      While (Not Eof(file)) And readpos <= thiseof
        FileSeek(file, readpos)
        
        Protected DataRead=ReadData(file, *testdata, BufferLen)
        If Flags & #STRINGMODE
          If Not Flags & #CASESENSITIVE
            For k=0 To DataRead-1   ;we can't convert it to uppercase at once because it could have chr(0) in the middle so we have this loop
              *testdata\c = Asc(UCase(Chr(*testdata\c)))  ;*testdata\c is a little faster than PeekC()
              *testdata+1 ;optionally: create structure with chr.c[0] and then use *testdata\chr[k] in previous line instead of this
            Next
            *testdata-DataRead    ;return pointer to first byte
          EndIf
        EndIf
        
        ;Debug "Buffered: "+PeekS(*testdata,DataRead)
        
        If DataRead<length      ;short or failed read: stop here instead of looping forever
          Break
        EndIf
        
        k=0
        While k<=DataRead-length
          ;Debug "> Test: "+PeekS(*testdata+k,length) + " == " + PeekS(*searchdata,length)
          If CompareMemory(*testdata+k, *searchdata, length)
            FileSeek(file, readpos+k)
            ;Debug "Replacing data..."
            WriteData(file, *replacedata, length)
            result=readpos+k+1  ;save position (1-based)
            replacecount+1
            If Flags & #REPLACEONE
              Break 2
            EndIf
            k+length            ;continue right after the replaced bytes
          Else
            k+1
          EndIf
        Wend
        
        readpos+k               ;k is the first offset not tested yet, so the next read starts there
        
      Wend
      CloseFile(file)
    Else
      result=-3
    EndIf
  Else
    result=-4
  EndIf
  
  FreeMemory(*searchdata)
  
  If *Count<>0
    PokeI(*Count,replacecount)
  EndIf
  
  ProcedureReturn result
EndProcedure

;*************************************************************
;*************************************************************
;*************************************************************

;here is one way of many how to use it:
Define counter.i
Select ReplaceDataInFile("c:\file.txt", "Find This Text", "New text",1,#STRING_CASESENSITIVE,4*1024,@counter)
  Case -4,-3,-2,-1
    MessageRequester("Error","Procedure failed!",#MB_ICONERROR)
  Case 0
    MessageRequester("","No replacements made.")
  Default
    MessageRequester("Success","Number of replacements made: "+Str(counter.i))
EndSelect

;or simpler:
ReplaceDataInFile("c:\file.dat", "Find This Text", "New text",1,#STRING_CASESENSITIVE,4*1024,@counter)
EnableDebugger
If counter>0
  Debug "Success, replacements made: "+Str(counter)
EndIf

;or:
If ReplaceDataInFile("c:\file.dat", "Find This Text", "New text",1,#STRING_CASESENSITIVE,4*1024,@counter)>0
  Debug "Success, replacements made: "+Str(counter)
EndIf

;more examples for hex data:
ReplaceDataInFile("c:\file.dat", "7062", "5042")
ReplaceDataInFile("c:\file.dat", "FF", "00")
ReplaceDataInFile("c:\file.dat", "5042", "7062",1,#REPLACEONE)
ReplaceDataInFile("c:\file.dat", "5042", "70",1,#REPLACEONE|#FILLDATA)
ReplaceDataInFile("c:\file.dat", "5042", "70",1,#REPLACEONE|#FILLDATA,4*1024,@counter)

;string, case-insensitive:
ReplaceDataInFile("c:\file.dat", "find", "re",1,#STRINGMODE|#FILLDATA)
I hope that you find it useful.

Edit: fixed #REPLACEONE bug found by wallgod
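
For anyone curious why the buffer size matters so much in the timings above, here is a minimal read-only sketch of the same chunked scan. It is not part of ReplaceDataInFile(): the name CountInFile and its parameters are made up for illustration, and it only counts matches instead of replacing them. With BufferLen equal to the pattern length every read tests a single position, while a 4096-byte buffer tests thousands of positions per read.

```purebasic
; Hypothetical helper (not from the listing above): counts non-overlapping
; occurrences of the findLen bytes at *find in File$, reading BufferLen bytes at a time.
; Returns -1 on bad parameters, open failure or allocation failure.
Procedure.q CountInFile(File$, *find, findLen, BufferLen = 4096)
  Protected file, *buf, got, k
  Protected pos.q, count.q
  
  If *find = 0 Or findLen <= 0 Or BufferLen < findLen
    ProcedureReturn -1
  EndIf
  
  file = ReadFile(#PB_Any, File$)
  If file = 0
    ProcedureReturn -1
  EndIf
  
  *buf = AllocateMemory(BufferLen)
  If *buf = 0
    CloseFile(file)
    ProcedureReturn -1
  EndIf
  
  While pos <= Lof(file) - findLen
    FileSeek(file, pos)
    got = ReadData(file, *buf, BufferLen)
    If got < findLen              ; short read: stop instead of looping forever
      Break
    EndIf
    
    k = 0
    While k <= got - findLen
      If CompareMemory(*buf + k, *find, findLen)
        count + 1
        k + findLen               ; skip past the match
      Else
        k + 1
      EndIf
    Wend
    
    ; k is now the first offset that was not tested. Starting the next read there
    ; re-reads at most findLen-1 bytes, so a match that straddles two chunks is still seen.
    pos + k
  Wend
  
  FreeMemory(*buf)
  CloseFile(file)
  ProcedureReturn count
EndProcedure

; Usage (the path is only an example). PokeS() with #PB_Ascii stores one byte per character.
Define *needle = AllocateMemory(4)
PokeS(*needle, "abc", -1, #PB_Ascii)
Debug CountInFile("c:\file.txt", *needle, 3)      ; default 4096-byte buffer
Debug CountInFile("c:\file.txt", *needle, 3, 3)   ; buffer == pattern length: one position per read
FreeMemory(*needle)
```

Both calls should report the same count; only the number of reads differs.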
Last edited by cas on Mon Oct 11, 2010 10:24 pm, edited 1 time in total.

Post by cas »

If you want to use this function on files larger than 4GB, change these variables to quad:
counter.q, replacecount.q (not necessary, see the last line about PokeQ())
StartPosition.q
readpos.q
thiseof.q
and use PokeQ() instead of PokeI() (to be safe if more than 4294967295 replacements are made in the file :D ).
wallgod
User
Posts: 48
Joined: Wed Oct 06, 2010 2:03 pm

Post by wallgod »

Wow, thanks cas. You put a lot of work into that. I'll play with it today and report back.
Procrastinators unite... tomorrow!

Post by wallgod »

Works really well — thank you so much! It gets the job done, and fast too.

UPDATE:

I fixed a minor bug for the #REPLACEONE flag; it should Break 2, instead of just Break.

Cheers

Post by cas »

Nice find :) Thanks for posting it here, I will edit my post.

Post by wallgod »

I noticed a couple of #STRINGMODE quirks. It's probably my fault, but I thought it might be worth mentioning.

1) If _ltf is not equal to _ltr (the search and replacement strings differ in length), the procedure returns the invalid-parameter result (-1).
2) Because of [1], I removed "_ltf<>_ltr" from:
If length=0 Or StartPosition=<0 Or _ltf<>_ltr Or BufferLen=<0 Or BufferLen<length Or Flags<0
in order to force it to work. It replaces now, but it also limits the length of the replacement string to the length of the search string. So for example, if I wanted to replace "Span" with "Spain", the result would be "Spai". :P

Post by cas »

wallgod wrote: So for example, if I wanted to replace "Span" with "Spain", the result would be "Spai". :P
Unfortunately, it is not possible to replace 4 bytes with 5 bytes in the same file without overwriting the bytes that follow (except when writing at the end of the file). Likewise, it is not possible to replace 5 bytes with 4 bytes in place, because the rest of the file would have to shift. For those situations you need to 1) rename the original file to a different name, 2) create a new file with the same name as the original file and 3) stream the data from the renamed original file into the new file.
With this function, if you replace 5 bytes with 4 bytes and set #FILLDATA, it pads the replacement with a 5th byte (a space in string mode, "0" digits in hex mode) so that both have the same length.
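
The three steps above (rename, create, stream) could be sketched roughly like this. This is only an illustration and not part of ReplaceDataInFile(): the procedure name StreamReplace, its parameters and the ".bak" file name are all made up, and it has not been tested on large files. It copies the source file to a new file and swaps in the replacement bytes on the way, so the two lengths are free to differ.

```purebasic
; Hypothetical helper: copies Source$ to Target$ and replaces every occurrence of the
; findLen bytes at *find with the replLen bytes at *repl (replLen may be 0 to delete).
; Returns the number of replacements, or -1 on error.
Procedure.q StreamReplace(Source$, Target$, *find, findLen, *repl, replLen, BufferLen = 4096)
  Protected src, dst, *buf, got, have, k, runStart
  Protected count.q
  
  If *find = 0 Or findLen <= 0 Or replLen < 0 Or BufferLen < findLen
    ProcedureReturn -1
  EndIf
  src = ReadFile(#PB_Any, Source$)
  If src = 0
    ProcedureReturn -1
  EndIf
  dst = CreateFile(#PB_Any, Target$)          ; overwrites Target$ if it already exists
  If dst = 0
    CloseFile(src)
    ProcedureReturn -1
  EndIf
  *buf = AllocateMemory(BufferLen + findLen)  ; one chunk plus the tail carried over
  If *buf = 0
    CloseFile(src) : CloseFile(dst)
    ProcedureReturn -1
  EndIf
  
  While Eof(src) = 0
    got = ReadData(src, *buf + have, BufferLen) ; append behind the bytes kept from the last round
    If got <= 0
      Break
    EndIf
    have + got
    k = 0 : runStart = 0
    While k <= have - findLen
      If CompareMemory(*buf + k, *find, findLen)
        If k > runStart
          WriteData(dst, *buf + runStart, k - runStart) ; unchanged bytes before the match
        EndIf
        If replLen > 0
          WriteData(dst, *repl, replLen)
        EndIf
        count + 1
        k + findLen
        runStart = k
      Else
        k + 1
      EndIf
    Wend
    If k > runStart
      WriteData(dst, *buf + runStart, k - runStart)
    EndIf
    have - k                                  ; fewer than findLen bytes are still untested
    If have > 0 And k > 0
      MoveMemory(*buf + k, *buf, have)        ; keep them for the next chunk
    EndIf
  Wend
  
  If have > 0
    WriteData(dst, *buf, have)                ; tail too short to contain a match
  EndIf
  FreeMemory(*buf)
  CloseFile(src)
  CloseFile(dst)
  ProcedureReturn count
EndProcedure

; Usage, following the three steps: rename, create under the old name, stream across.
Define *f = AllocateMemory(5) : PokeS(*f, "Span", -1, #PB_Ascii)
Define *r = AllocateMemory(6) : PokeS(*r, "Spain", -1, #PB_Ascii)
If RenameFile("c:\file.txt", "c:\file.bak")
  Debug StreamReplace("c:\file.bak", "c:\file.txt", *f, 4, *r, 5)
EndIf
FreeMemory(*f) : FreeMemory(*r)
```

The renamed original stays on disk as a backup, so it can be deleted or restored once the result has been checked.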