Search/replacing data in a binary file?

wallgod
User
Posts: 48
Joined: Wed Oct 06, 2010 2:03 pm

Re: Search/replacing data in a binary file?

Post by wallgod »

cas wrote: Try this, I haven't tested it:

Code: Select all

ToFind$       = "F606017415538D"
ToReplace$    = "8026007415538D"

length        = Len(ToFind$)/2

*searchdata   = AllocateMemory(length)
*replacedata  = AllocateMemory(length)
*testdata     = AllocateMemory(length)

;///////////////////////////////////////////////////////////
; Here place your search and replace data in the mem blocks

For k=0 To length-1
  ; Mid() is 1-based and counts characters, so this also works in Unicode builds
  PokeB(*searchdata+k ,Val("$"+Mid(ToFind$   ,k*2+1,2)))
  PokeB(*replacedata+k,Val("$"+Mid(ToReplace$,k*2+1,2)))
Next

;///////////////////////////////////////////////////////////

If OpenFile(0, "myfile.bin")
  readpos = 0
  thiseof = Lof(0) - length
  While (Not Eof(0)) And readpos <= thiseof
    FileSeek(0, readpos)
    ReadData(0, *testdata, length)
    If CompareMemory(*testdata, *searchdata, length)
      WriteData(0, *replacedata, length)
    EndIf
    readpos + 1
  Wend
  CloseFile(0)
EndIf
Well, unlike the other one, this code did change the file, but for some odd reason not where I expected. Instead of replacing:

[F606017415538D]
with
[8026007415538D]

...it ended up replacing:

[4E0CE884CB0000]
with
[8026007415538D]

I have no idea how that happened. Also, it still takes a long time; I assume this is because it's reading one byte at a time. Is there any way to perhaps read the entire chunk of the file in at once and then get the search string position(s) faster for replacing?
Procrastinators unite... tomorrow!
cas
Enthusiast
Posts: 597
Joined: Mon Nov 03, 2008 9:56 pm

Re: Search/replacing data in a binary file?

Post by cas »

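Looks like we missed a FileSeek(0, readpos) before WriteData(0, *replacedata, length): ReadData() advances the file pointer by length bytes, so the write landed right after the match instead of on it. Try adding this line and post the results. And yes, you are right: this approach is only suitable for small files; for larger ones it would be much faster to read the file into memory.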
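A minimal sketch of the effect, assuming myfile.bin exists and holds at least 7 bytes (ReadFile() is used here only so that the demo cannot modify anything): ReadData() moves the file pointer, and Loc() makes that visible.

```purebasic
; Sketch only: "myfile.bin" is the example file name used in this thread.
Define *buf = AllocateMemory(7)

If *buf And ReadFile(0, "myfile.bin")
  FileSeek(0, 0)
  Debug Loc(0)                 ; position before the read
  Debug ReadData(0, *buf, 7)   ; number of bytes actually read
  Debug Loc(0)                 ; position after the read: advanced by that many bytes
  CloseFile(0)
EndIf

If *buf
  FreeMemory(*buf)
EndIf
```

A WriteData() issued at that point starts at the advanced position, which is why the 7 bytes following the match were overwritten.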

Post by wallgod »

cas wrote: Looks like we missed FileSeek(0, readpos) before WriteData(0, *replacedata, length). Try adding this line and post the results. Yes, you are right: it is for small files, and for larger ones it would be much faster to read the file into memory.
Yep, that did the trick. Thank you!

Code: Select all

ToFind$       = "F606017415538D"
ToReplace$    = "8026007415538D"

length        = Len(ToFind$)/2

*searchdata   = AllocateMemory(length)
*replacedata  = AllocateMemory(length)
*testdata     = AllocateMemory(length)

For k=0 To length-1
  ; Mid() is 1-based and counts characters, so this also works in Unicode builds
  PokeB(*searchdata+k ,Val("$"+Mid(ToFind$   ,k*2+1,2)))
  PokeB(*replacedata+k,Val("$"+Mid(ToReplace$,k*2+1,2)))
Next

If OpenFile(0, "myfile.bin")
  readpos = 0
  thiseof = Lof(0) - length
  While (Not Eof(0)) And readpos <= thiseof
    FileSeek(0, readpos)
    ReadData(0, *testdata, length)
    If CompareMemory(*testdata, *searchdata, length)
      FileSeek(0, readpos)
      WriteData(0, *replacedata, length)
    EndIf
    readpos + 1
  Wend
  CloseFile(0)
EndIf
How do we read the entire file into memory at once to speed things up? It's my understanding that, because the data can contain Chr(0), it's not practical to use a string, so how would this work?

Post by cas »

wallgod wrote: so how would this work?
With AllocateMemory() and a fixed-size buffer (to support large files: 100 MB, 1 GB and more). In a loop, use ReadData() to fill the buffer and CompareMemory() to look for the data, then FileSeek() to each matching location and WriteData() there.
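For files that comfortably fit in memory, the same idea can be sketched without chunking. This is an illustration only, not the buffered version: PatchWholeFile() and its parameter names are made up, and it assumes both patterns are already raw bytes of the same length. Unlike a string, a memory block has no problem with Chr(0).

```purebasic
EnableExplicit

; Hypothetical helper: returns the number of replacements, or -1 on error.
Procedure.i PatchWholeFile(File$, *find, *repl, length.i)
  Protected file.i, size.i, *buf, pos.i, count.i

  If length <= 0 Or FileSize(File$) < length   ; missing file (-1), directory (-2) or too small
    ProcedureReturn -1
  EndIf

  file = OpenFile(#PB_Any, File$)
  If file = 0
    ProcedureReturn -1
  EndIf

  size = Lof(file)
  *buf = AllocateMemory(size)
  If *buf = 0
    CloseFile(file)
    ProcedureReturn -1
  EndIf

  If ReadData(file, *buf, size) = size
    While pos <= size - length
      If CompareMemory(*buf + pos, *find, length)
        CopyMemory(*repl, *buf + pos, length)   ; patch in memory (source, destination, size)
        pos + length                            ; continue after the replaced bytes
        count + 1
      Else
        pos + 1
      EndIf
    Wend
    If count > 0
      FileSeek(file, 0)                         ; ReadData() left the pointer at the end of the file
      WriteData(file, *buf, size)
    EndIf
  Else
    count = -1
  EndIf

  FreeMemory(*buf)
  CloseFile(file)
  ProcedureReturn count
EndProcedure
```

A call would look like PatchWholeFile("file.dat", *searchdata, *replacedata, length), with the two memory blocks filled from the hex strings as in the loop earlier in this thread.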

Post by wallgod »

cas wrote:
wallgod wrote: so how would this work?
With AllocateMemory() and a fixed-size buffer (to support large files: 100 MB, 1 GB and more). In a loop, use ReadData() to fill the buffer and CompareMemory() to look for the data, then FileSeek() to each matching location and WriteData() there.
If I'm reading in the file in chunks and looking at those chunks for the find string, then isn't it possible for the find string to be overlooked due to a split that separates it into two pieces?

Post by cas »

Here is the current work from this thread, wrapped in a thread-safe procedure:

Code: Select all

#REPLACEONE = 0
#REPLACEALL = 1

Procedure HexReplaceInFile(File$, ToFind$, ToReplace$, StartPosition=1, Mode=#REPLACEALL)
  
  ;RETURN:  <0  if error {-3=file open error;-2=internal memory allocation failed;-1=wrong parameters}
  ;         =0  if ToFind$ not found
  ;         >0  position of last replacement
  
  Protected length = Len(ToFind$)/2
  Protected k, result
  
  If length=0 Or StartPosition=<0 Or (Len(ToFind$)<>Len(ToReplace$)) Or ((Len(ToFind$)%2)<>0)
    ProcedureReturn -1
  EndIf
  
  Protected *searchdata  = AllocateMemory(length*3)
  Protected *replacedata = *searchdata+length
  Protected *testdata    = *replacedata+length
  
  If *searchdata = 0
    ProcedureReturn -2
  EndIf
  
  For k=0 To length-1
    ; Mid() is 1-based and counts characters, so this also works in Unicode builds
    PokeB(*searchdata+k ,Val("$"+Mid(ToFind$   ,k*2+1,2)))
    PokeB(*replacedata+k,Val("$"+Mid(ToReplace$,k*2+1,2)))
  Next
  
  Protected file = OpenFile(#PB_Any, File$)
  If file
    Protected readpos = StartPosition-1
    Protected thiseof = Lof(file) - length
    While (Not Eof(file)) And readpos <= thiseof
      FileSeek(file, readpos)
      ReadData(file, *testdata, length)
      If CompareMemory(*testdata, *searchdata, length)
        FileSeek(file, readpos)
        WriteData(file, *replacedata, length)
        result=readpos+1
        If Mode = #REPLACEONE
          Break
        EndIf
      EndIf
      readpos + 1
    Wend
    CloseFile(file)
  Else
    result=-3
  EndIf
  
  FreeMemory(*searchdata)
  
  ProcedureReturn result
EndProcedure

Debug HexReplaceInFile("file.dat", "F606017415538D", "8026007415538D")

Post by cas »

wallgod wrote: If I'm reading in the file in chunks and looking at those chunks for the find string, then isn't it possible for the find string to be overlooked due to a split that separates it into two pieces?
Not if you only test offsets 0 to (LenOfChunk-LenOfDataToFind) within each chunk and start the next chunk at (LenOfChunk-LenOfDataToFind). Consecutive chunks then overlap, so a match that is split across a chunk boundary is seen whole in the next chunk.
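To see that no position is skipped, here is a small sketch with made-up sizes (a 10-byte file, 4-byte chunks, a 2-byte pattern). It reads no file and only prints which file offsets would be tested. It advances each chunk by one byte more than the formula above, so that no offset is tested twice.

```purebasic
; Sketch with hypothetical sizes; nothing is read from disk.
Define fileLen.i = 10, chunkLen.i = 4, patLen.i = 2
Define readpos.i = 0, got.i, k.i

While readpos <= fileLen - patLen
  got = chunkLen                       ; bytes a ReadData() call would return here
  If readpos + got > fileLen
    got = fileLen - readpos            ; the last chunk may be shorter
  EndIf

  For k = 0 To got - patLen            ; only offsets where the whole pattern fits in the chunk
    Debug "test file offset " + Str(readpos + k)
  Next

  readpos + (got - (patLen - 1))       ; overlap the next chunk by patLen-1 bytes
Wend
```

With these numbers the chunks start at offsets 0, 3 and 6, and every offset from 0 to 8 is tested exactly once.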

Post by wallgod »

cas wrote:
wallgod wrote: If I'm reading in the file in chunks and looking at those chunks for the find string, then isn't it possible for the find string to be overlooked due to a split that separates it into two pieces?
Not if you only test offsets 0 to (LenOfChunk-LenOfDataToFind) within each chunk and start the next chunk at (LenOfChunk-LenOfDataToFind). Consecutive chunks then overlap, so a match that is split across a chunk boundary is seen whole in the next chunk.
That makes sense. I tested the threadsafe version, and it works btw.

Post by wallgod »

What about reading the file into an array, where each chr(0) is kept as an empty row in the array?

Another idea would be to substitute each chr(0) with a string that can be either specified or generated (non-redundant), and then when writing back to the file (in memory?) it would swap the non-redundant string with a chr(0), to create the original file.

Post by cas »

This code could contain bugs. I ran some basic tests and it looks fine. Please test it extensively before any serious use, and post any bugs here so we can fix them. Hex search is the default, but you can pass a string and set the #STRINGMODE flag to search for a string in the file.

I tested it with a 1 MB text file and a 3-character search string; it made around 75000 replacements in the file.
Result without buffered read (buffer length == length of data to find): 4165 ms
Result with a 4096-byte buffer: 390 ms :D
About 10 times faster :) It could be optimized further for larger strings in case-insensitive searches, with some modifications, but I leave that to you :).
Here is the code. I added some unnecessary 'features' (you can remove them if you don't like them) and left some comments:

Code: Select all

EnableExplicit
DisableDebugger

#REPLACEONE     = 1 ;make only one replacement, default is to search whole file and make multiple replacements (without this flag)
#STRINGMODE     = 2 ;find string, default is hex (without this flag)
#CASESENSITIVE  = 4 ;case sensitive string search and replace, default is case-insensitive  (without this flag)
#FILLDATA       = 8 ;if DataToReplace$ is smaller than DataToFind$ then append data to make it same length as DataToFind$, function returns error (-1) without this flag if DataToFind$ and DataToReplace$ are not same length

#STRING_CASESENSITIVE=#STRINGMODE|#CASESENSITIVE|#FILLDATA

Procedure ReplaceDataInFile(File$, DataToFind$, DataToReplace$, StartPosition=1, Flags=#FILLDATA, BufferLen=4096,*Count=0)
  
  ;RETURN:  <0  if error {-4=file not found;-3=file open error;-2=not enough free memory on system;-1=wrong parameters}
  ;         =0  if DataToFind$ not found
  ;         >0  position of last replacement
  
  Protected _ltf=Len(DataToFind$)
  Protected _ltr=Len(DataToReplace$)
  
  Protected length = _ltf
  Protected k, result
  
  If Not Flags & #STRINGMODE  ; if hex search...
    If (length%2)<>0          ; ...then len must be {2,4,6,8,...}
      ProcedureReturn -1    ; we don't have valid len for hex search (1,3,5,7,... is not valid)
    EndIf
    length/2                  ; 2 hex characters means 1 byte
  EndIf
  
  If Flags & #FILLDATA
    If _ltf>_ltr                        ;if DataToReplace$ is shorter than DataToFind$...
      If Flags & #STRINGMODE
        DataToReplace$+Space(_ltf-_ltr) ;...then pad it with space characters (Chr(32), i.e. $20) in string mode...
      Else
        For k=_ltr To _ltf-1
          DataToReplace$+"0"            ;...or with "0" hex digits in hex mode (use "F" here if you prefer)
        Next
      EndIf
      _ltr=Len(DataToReplace$)
    EndIf
  EndIf
  
  If length=0 Or StartPosition=<0 Or _ltf<>_ltr Or BufferLen=<0 Or BufferLen<length Or Flags<0
    ProcedureReturn -1
  EndIf
  
  Protected *searchdata         = AllocateMemory((length*2)+BufferLen)
  Protected *replacedata        = *searchdata+length
  Protected *testdata.Character = *replacedata+length
  
  If *searchdata=0
    ProcedureReturn -2
  EndIf
  
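  If Flags & #STRINGMODE   ; if string search...
    If Not Flags & #CASESENSITIVE
      DataToFind$=UCase(DataToFind$)
    EndIf
    ; store one byte per character; the trailing null lands in the next region and is overwritten later
    PokeS(*searchdata ,DataToFind$   ,length,#PB_Ascii)
    PokeS(*replacedata,DataToReplace$,length,#PB_Ascii)
  Else                    ; if hex search...
    For k=0 To length-1
      ; Mid() is 1-based and counts characters, so this also works in Unicode builds
      PokeB(*searchdata +k,Val("$"+Mid(DataToFind$   ,k*2+1,2)))
      PokeB(*replacedata+k,Val("$"+Mid(DataToReplace$,k*2+1,2)))
    Next
  EndIf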
  
  If FileSize(File$)>=0 ;OpenFile() creates file if it doesn't exist --> we don't want that so we check it before if it exists
    Protected file = OpenFile(#PB_Any, File$) ;if someone deletes file before this line is executed then we create empty file :( but probability of this is so small that we simply ignore it
    If file
      Protected readpos = StartPosition-1
      Protected thiseof = Lof(file) - length
      Protected replacecount = 0
      While (Not Eof(file)) And readpos <= thiseof
        FileSeek(file, readpos)
        
        Protected DataRead=ReadData(file, *testdata, BufferLen)
        If Flags & #STRINGMODE
          If Not Flags & #CASESENSITIVE
            For k=0 To DataRead-1   ;we can't convert the buffer to uppercase in one go because it could contain Chr(0) in the middle, hence this loop
              PokeA(*testdata+k, Asc(UCase(Chr(PeekA(*testdata+k)))))  ;byte-wise access, so the result does not depend on the compiler's character size
            Next
          EndIf
        EndIf
        
        ;Debug "Buffered: "+PeekS(*testdata,DataRead)
        
        For k=0 To DataRead-length
          ;Debug "> Test: "+PeekS(*testdata+k,length) + " == " + PeekS(*searchdata,length)
          If CompareMemory(*testdata+k, *searchdata, length)
            FileSeek(file, readpos+k)
            ;Debug "Replacing data..."
            WriteData(file, *replacedata, length)
            result=readpos+k+1  ;save position
            replacecount+1
            If ((DataRead-length)-k)=<length
              readpos+(2*length+k-DataRead-1)
            EndIf
            k+(length-1)        ;skip comparing replaced characters in this loop
            If Flags & #REPLACEONE
              Break 2
            EndIf
          EndIf
        Next
        
        readpos + (DataRead - (length-1))
        
      Wend
      CloseFile(file)
    Else
      result=-3
    EndIf
  Else
    result=-4
  EndIf
  
  FreeMemory(*searchdata)
  
  If *Count<>0
    PokeI(*Count,replacecount)
  EndIf
  
  ProcedureReturn result
EndProcedure

;*************************************************************
;*************************************************************
;*************************************************************

;here is one way of many how to use it:
Define counter.i
Select ReplaceDataInFile("c:\file.txt", "Find This Text", "New text",1,#STRING_CASESENSITIVE,4*1024,@counter)
  Case -4,-3,-2,-1
    MessageRequester("Error","Procedure failed!",#MB_ICONERROR)
  Case 0
    MessageRequester("","No replacements made.")
  Default
    MessageRequester("Success","Number of replacements made: "+Str(counter.i))
EndSelect

;or simpler:
ReplaceDataInFile("c:\file.dat", "Find This Text", "New text",1,#STRING_CASESENSITIVE,4*1024,@counter)
EnableDebugger
If counter>0
  Debug "Success, replacements made: "+Str(counter)
EndIf

;or:
If ReplaceDataInFile("c:\file.dat", "Find This Text", "New text",1,#STRING_CASESENSITIVE,4*1024,@counter)>0
  Debug "Success, replacements made: "+Str(counter)
EndIf

;more examples for hex data:
ReplaceDataInFile("c:\file.dat", "7062", "5042")
ReplaceDataInFile("c:\file.dat", "FF", "00")
ReplaceDataInFile("c:\file.dat", "5042", "7062",1,#REPLACEONE)
ReplaceDataInFile("c:\file.dat", "5042", "70",1,#REPLACEONE|#FILLDATA)
ReplaceDataInFile("c:\file.dat", "5042", "70",1,#REPLACEONE|#FILLDATA,4*1024,@counter)

;string, case-insensitive:
ReplaceDataInFile("c:\file.dat", "find", "re",1,#STRINGMODE|#FILLDATA)
I hope that you find it useful.

Edit: fixed #REPLACEONE bug found by wallgod
Last edited by cas on Mon Oct 11, 2010 10:24 pm, edited 1 time in total.

Post by cas »

If you want to use this function on files larger than 2 GB with a 32-bit compiler (where .i is a signed 32-bit integer; on a 64-bit compiler it is already 64 bits wide), change these variables to quad:
counter.q, replacecount.q (not necessary, see the last line about PokeQ()...)
StartPosition.q
readpos.q
thiseof.q
and instead of PokeI() you can use PokeQ() (to be safe if more than 2147483647 replacements are made in the file :D ).

Post by wallgod »

Wow, thanks cas. You put a lot of work into that. I'll play with it today and report back.

Post by wallgod »

Works really well — thank you so much! It gets the job done, and fast too.

UPDATE:

I fixed a minor bug for the #REPLACEONE flag; it should Break 2, instead of just Break.

Cheers

Post by cas »

Nice find :) Thanks for posting it here; I will edit my post.

Post by wallgod »

I noticed a couple of #STRINGMODE quirks. It's probably my fault, but I thought it might be worth mentioning.

1) If _ltf is not equal in length to _ltr, it causes an invalid parameter result.
2) Because of [1], I removed "_ltf<>_ltr" from:
If length=0 Or StartPosition=<0 Or _ltf<>_ltr Or BufferLen=<0 Or BufferLen<length Or Flags<0
in order to force it to work. It replaces now, but will also limit the length of the replacement string to the length of the search string. So for example, if I wanted to replace "Span" with "Spain", the result would be "Spai". :P
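The truncation follows from overwriting in place: WriteData() cannot make the file longer or shorter in the middle. One way around it, sketched here under assumptions (ReplaceToNewFile() and its parameters are hypothetical, the patterns are raw bytes, and the whole input fits in memory), is to copy the data to a second file and write the replacement, whatever its length, in place of each match.

```purebasic
EnableExplicit

; Hypothetical helper: returns the number of replacements, or -1 on error.
Procedure.i ReplaceToNewFile(Source$, Target$, *find, findLen.i, *repl, replLen.i)
  Protected src.i, dst.i, size.i, *buf, pos.i, start.i, count.i

  size = FileSize(Source$)
  If findLen <= 0 Or replLen < 0 Or size <= 0   ; also rejects a missing or empty input file
    ProcedureReturn -1
  EndIf

  *buf = AllocateMemory(size)
  If *buf = 0
    ProcedureReturn -1
  EndIf

  src = ReadFile(#PB_Any, Source$)
  If src = 0
    FreeMemory(*buf)
    ProcedureReturn -1
  EndIf
  If ReadData(src, *buf, size) <> size
    CloseFile(src)
    FreeMemory(*buf)
    ProcedureReturn -1
  EndIf
  CloseFile(src)

  dst = CreateFile(#PB_Any, Target$)
  If dst = 0
    FreeMemory(*buf)
    ProcedureReturn -1
  EndIf

  While pos <= size - findLen
    If CompareMemory(*buf + pos, *find, findLen)
      If pos > start
        WriteData(dst, *buf + start, pos - start)   ; unchanged bytes before the match
      EndIf
      If replLen > 0
        WriteData(dst, *repl, replLen)              ; replacement of any length
      EndIf
      pos + findLen
      start = pos
      count + 1
    Else
      pos + 1
    EndIf
  Wend
  If size > start
    WriteData(dst, *buf + start, size - start)      ; tail after the last match
  EndIf

  CloseFile(dst)
  FreeMemory(*buf)
  ProcedureReturn count
EndProcedure
```

Because the input is read completely before the output is created, the sketch does not depend on the two file names being different, but writing to a separate file keeps the original intact if something goes wrong.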