FastReplaceString

whertz
Enthusiast
Posts: 124
Joined: Sat Jun 25, 2005 2:16 pm
Location: United Kingdom

Post by whertz »

This is based on Froggerprogger's ReplaceFileData code from the CodeArchive. I was experimenting with it and changed it to work on a memory buffer instead. I needed this because I have a large text file (about 13 MB) in RAM and need to replace strings in it, but calling ReplaceString on it takes a long time.

This procedure is faster than ReplaceString on large data, especially with the debugger off and case-insensitive search disabled. The test code creates two 10 MB strings and benchmarks ReplaceString against my FastReplaceString.
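The speed-up comes from building the result in one buffer that is sized before any copying starts. The same count-then-copy idea can be sketched with the built-in FindString(): count the matches, allocate the output once at its exact size, then fill it. TwoPassReplace() is a hypothetical helper for illustration, not whertz's code; it is case-sensitive only and uses string functions instead of the byte loop in the post.

```purebasic
; Hypothetical sketch of the count-then-copy idea, using FindString() instead of a byte loop.
; Case-sensitive only; matches are found left to right without overlapping.
Procedure.s TwoPassReplace(text.s, find.s, repl.s)
  Protected count, newLen, pos, start, offset, piece, *buf, result.s
  If find = ""
    ProcedureReturn text                       ; nothing to search for
  EndIf
  ; Pass 1: count matches exactly as pass 2 will find them
  pos = FindString(text, find, 1)
  While pos > 0
    count + 1
    pos = FindString(text, find, pos + Len(find))
  Wend
  newLen = Len(text) + count * (Len(repl) - Len(find))
  If newLen = 0
    ProcedureReturn ""                         ; the result is empty
  EndIf
  ; One allocation at the final size (+1 character for the null terminator)
  *buf = AllocateMemory((newLen + 1) * SizeOf(Character))
  If *buf = 0
    ProcedureReturn text                       ; allocation failed: return the input unchanged
  EndIf
  ; Pass 2: copy the text between matches, and the replacement at each match
  start = 1
  pos = FindString(text, find, start)
  While pos > 0
    piece = pos - start                        ; characters before this match
    If piece > 0
      PokeS(*buf + offset * SizeOf(Character), Mid(text, start, piece))
      offset + piece
    EndIf
    If repl <> ""
      PokeS(*buf + offset * SizeOf(Character), repl)
      offset + Len(repl)
    EndIf
    start = pos + Len(find)
    pos = FindString(text, find, start)
  Wend
  If start <= Len(text)
    PokeS(*buf + offset * SizeOf(Character), Right(text, Len(text) - start + 1))  ; tail after the last match
  EndIf
  result = PeekS(*buf)
  FreeMemory(*buf)
  ProcedureReturn result
EndProcedure
```

For example, TwoPassReplace("a-b-c", "-", "--") gives "a--b--c". Pass 1 counts matches with the same left-to-right rule as pass 2, so the buffer size always agrees with what is written.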

FastReplaceString takes the same arguments as ReplaceString, except that the mode argument is mandatory: pass 0 for a case-sensitive search or 1 for a case-insensitive one.
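In the posted code, mode 1 calls LCase(Chr(...)) for every byte compared, which likely explains why case-insensitive search is much slower. One possible alternative, not part of the original code, is to fold all 256 byte values once into a lookup table and then compare table entries. FoldTable, InitFoldTable() and MatchNoCase() are hypothetical names; like the original, this assumes one byte per character (an ASCII executable).

```purebasic
; Hypothetical alternative to calling LCase(Chr(...)) for every byte:
; fold each of the 256 byte values once, then compare table entries.
Global Dim FoldTable.l(255)

Procedure InitFoldTable()
  Protected c
  For c = 0 To 255
    FoldTable(c) = Asc(LCase(Chr(c)))   ; same folding rule as mode 1 in the post
  Next
EndProcedure

; Returns #True if the first length bytes at *a and *b are equal, ignoring case
Procedure.l MatchNoCase(*a, *b, length)
  Protected k
  For k = 0 To length - 1
    ; PeekB() is signed, so mask to 0..255 before indexing the table
    If FoldTable(PeekB(*a + k) & $FF) <> FoldTable(PeekB(*b + k) & $FF)
      ProcedureReturn #False
    EndIf
  Next
  ProcedureReturn #True
EndProcedure

InitFoldTable()   ; must run once before MatchNoCase() is used
```

The table trades 256 LCase() calls at startup for a plain array lookup per byte during the search.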

My results (PB 4.20 beta 6, 1.6 GHz Centrino laptop with 512 MB RAM):

Debugger on:
ReplaceString: 38686 ms
FastReplaceString: 9414 ms

Debugger off:
ReplaceString: 38346 ms
FastReplaceString: 540 ms

EDIT: Replaced the macro with a procedure, because the macro didn't return the result$.

Code:


; Compares length bytes at *a and *b.
; mode 0 = case-sensitive, mode 1 = case-insensitive (LCase per character, as before).
Procedure.l MemMatch(*a, *b, length.l, mode.l)
  Protected k.l, c1.l, c2.l
  For k = 0 To length - 1
    c1 = PeekB(*a + k) & $FF              ; PeekB is signed: mask to 0..255 for Chr()
    c2 = PeekB(*b + k) & $FF
    If mode = 1
      c1 = Asc(LCase(Chr(c1)))
      c2 = Asc(LCase(Chr(c2)))
    EndIf
    If c1 <> c2
      ProcedureReturn #False
    EndIf
  Next
  ProcedureReturn #True
EndProcedure

; Counts non-overlapping matches of *searchdata in the first memsize bytes of *src.
Procedure.l FindMemString(*src, *searchdata, searchdataLen.l, memsize.l, mode.l)
  Protected i.l, numResults.l, matched.l
  If searchdataLen <= 0
    ProcedureReturn 0                     ; an empty search string matches nothing
  EndIf
  While i <= memsize - searchdataLen      ; never compare past the end of the buffer
    matched = #False
    If mode = 1 Or PeekB(*src + i) = PeekB(*searchdata)   ; cheap first-byte test
      matched = MemMatch(*src + i, *searchdata, searchdataLen, mode)
    EndIf
    If matched
      numResults + 1
      i + searchdataLen                   ; continue after the match
    Else
      i + 1                               ; a failed partial match only advances one byte
    EndIf
  Wend
  ProcedureReturn numResults
EndProcedure

; Allocates a new buffer holding the result, stores its address in the pointer
; variable at *dest (the caller must FreeMemory() it) and returns the number of replacements.
Procedure.l ReplaceMemData(*src, *dest, *searchdata, searchdataLen.l, *replacedata, replacedataLen.l, memsize.l, mode.l)
  Protected i.l, n.l, numResults.l, matched.l, newsize.l, *destmem
  ; Pass 1: count matches so the output is allocated once, at its final size
  numResults = FindMemString(*src, *searchdata, searchdataLen, memsize, mode)
  newsize = memsize + numResults * (replacedataLen - searchdataLen)
  If newsize < 1 : newsize = 1 : EndIf    ; AllocateMemory() needs at least 1 byte
  *destmem = AllocateMemory(newsize)      ; zero-filled
  PokeL(*dest, *destmem)                  ; PB 4.20 x86; use PokeI() on x64
  If *destmem = 0
    ProcedureReturn 0                     ; allocation failed
  EndIf
  ; Pass 2: copy bytes, substituting the replacement at each match.
  ; n is the running offset between source and destination positions.
  While i < memsize
    matched = #False
    If searchdataLen > 0 And i <= memsize - searchdataLen
      If mode = 1 Or PeekB(*src + i) = PeekB(*searchdata)
        matched = MemMatch(*src + i, *searchdata, searchdataLen, mode)
      EndIf
    EndIf
    If matched
      If replacedataLen > 0
        CopyMemory(*replacedata, *destmem + i + n, replacedataLen)
      EndIf
      n + (replacedataLen - searchdataLen)
      i + searchdataLen
    Else
      PokeB(*destmem + i + n, PeekB(*src + i))
      i + 1
    EndIf
  Wend
  ProcedureReturn numResults
EndProcedure

; Same arguments as ReplaceString(), but mode is mandatory: 0 = case-sensitive, 1 = case-insensitive.
; Assumes an ASCII (non-Unicode) executable, where Len() equals the byte count.
Procedure.s FastReplaceString(string.s, stringtofind.s, stringtoreplace.s, mode.l)
  Protected *destmem, result.s
  ReplaceMemData(@string, @*destmem, @stringtofind, Len(stringtofind), @stringtoreplace, Len(stringtoreplace), Len(string), mode)
  If *destmem = 0
    ProcedureReturn string                ; allocation failed: return the input unchanged
  EndIf
  result = PeekS(*destmem, MemorySize(*destmem))
  FreeMemory(*destmem)
  ProcedureReturn result
EndProcedure


; TEST CODE

searchstring.s="Original Text"
replacestring.s="Replaced Text String"

;create 10mb memory
*sourcemem=AllocateMemory(10485760)
For n=0 To MemorySize(*sourcemem)-1
  PokeB(*sourcemem+n,32)
Next n
;place search string 10000 times in memory
For n=1 To 10000
  CopyMemory(@searchstring,*sourcemem+Random(MemorySize(*sourcemem)-Len(searchstring)),Len(searchstring))
Next n

;use the routine to search and replace a memory buffer
;replacememdata(*sourcemem,@*destmem,@searchstring,Len(searchstring),@replacestring,Len(replacestring),MemorySize(*sourcemem),0)

;create a 10mb test string
test_string1.s=Space(MemorySize(*sourcemem))
CopyMemory(*sourcemem,@test_string1,MemorySize(*sourcemem))
FreeMemory(*sourcemem)
test_string2.s=test_string1

;original replacestring
t=ElapsedMilliseconds()
test_string1=ReplaceString(test_string1,searchstring,replacestring)
result1=ElapsedMilliseconds()-t

;new replacestring
t=ElapsedMilliseconds()
test_string2=FastReplaceString(test_string2,searchstring,replacestring,0)
result2=ElapsedMilliseconds()-t
MessageRequester("Results","Original replacestring: "+Str(result1)+" milliseconds"+Chr(13)+Chr(10)+"New replacestring: "+Str(result2)+" milliseconds")

;write out test data
;file.s="c:\testdata"
;If CreateFile(0,file)
;  WriteData(0,@test_string2,Len(test_string2))
;  CloseFile(0)
;EndIf


Last edited by whertz on Sun May 18, 2008 8:06 pm, edited 1 time in total.

Post by whertz »

Forgot to mention: it can be used on allocated memory as well as on strings. An example is commented out in the test code.
Fred
Administrator
Posts: 18162
Joined: Fri May 17, 2002 4:39 pm
Location: France

Post by Fred »

I was quite surprised when I saw these results. I investigated why ReplaceString() was so slow, and it was because of many reallocs occurring on larger buffers (I thought Realloc() would be faster, anyway). So I modified the routine to suppress them; it should be faster than FastReplaceString() now. You can test it here: www.purebasic.com/beta/StringExtension
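A rough illustration of why growing a buffer step by step hurts compared with sizing it once: the same output is built by repeated string appends and by copying into a single buffer allocated up front. This is not the library's code; string concatenation stands in for the internal reallocation Fred describes, and the timings depend on PB's string manager, so they will vary between versions and machines.

```purebasic
; Rough illustration (not the library's code): the same output built by
; repeated appends versus copied into one buffer that is allocated once.
#Pieces = 5000
piece.s = "Replaced Text String"

; Variant 1: grow the string piece by piece
t = ElapsedMilliseconds()
grown.s = ""
For i = 1 To #Pieces
  grown = grown + piece
Next
timeGrown = ElapsedMilliseconds() - t

; Variant 2: compute the final size first, allocate once, then copy into place
t = ElapsedMilliseconds()
prealloc.s = ""
*buf = AllocateMemory((#Pieces * Len(piece) + 1) * SizeOf(Character))  ; +1 for the null terminator
If *buf
  For i = 0 To #Pieces - 1
    PokeS(*buf + i * Len(piece) * SizeOf(Character), piece)
  Next
  prealloc = PeekS(*buf)
  FreeMemory(*buf)
EndIf
timePrealloc = ElapsedMilliseconds() - t

Debug "Repeated growth: " + Str(timeGrown) + " ms"
Debug "Single allocation: " + Str(timePrealloc) + " ms"
```

Both variants produce identical strings; only the way memory is obtained differs. Run it with the debugger on to see the Debug output.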
AND51
Addict
Posts: 1040
Joined: Sun Oct 15, 2006 8:56 pm
Location: Germany

Post by AND51 »

Debugger off / original String-Lib:
  • ReplaceString: 113,984 ms
  • FastReplace: 625 ms
  • Ratio: FastReplace is about 182x faster
3.4 GHz Dual Core, 2 GB RAM
PB 4.30

Code:

onErrorGoto(?Fred)

Post by AND51 »

Fred, how do I install your lib? If I put it into \PureBasic\PureLibraries, the compiler doesn't start.
:twisted: Even after restoring the old file, the compiler still doesn't start; it seems to be 'destroyed'...
I'm looking forward to reinstalling PureBasic........
ts-soft
Always Here
Posts: 5756
Joined: Thu Jun 24, 2004 2:44 pm
Location: Berlin - Germany

Post by ts-soft »

Can't test the beta lib. PB will not start :?
PureBasic 5.73 | SpiderBasic 2.30 | Windows 10 Pro (x64) | Linux Mint 20.1 (x64)
Old bugs good, new bugs bad! Updates are evil: might fix old bugs and introduce no new ones.

Post by whertz »

Fred wrote: I was quite surprised when I saw these results. I investigated why ReplaceString() was so slow, and it was because of many reallocs occurring on larger buffers (I thought Realloc() would be faster, anyway). So I modified the routine to suppress them; it should be faster than FastReplaceString() now. You can test it here: www.purebasic.com/beta/StringExtension
Thanks Fred, more speed is always better. Will this make it into 4.20 final?

Post by Fred »

I re-uploaded the lib, sorry.

Post by ts-soft »

Works fine :!:

Without debugger:

Code:

---------------------------
Results
---------------------------
Original replacestring: 94 milliseconds

New replacestring: 437 milliseconds
---------------------------
OK   
---------------------------
PB's ReplaceString() is about 4.6x faster

/edit
AMD AM2 6000+ (2x 3030 MHz)
Last edited by ts-soft on Sun May 18, 2008 9:54 pm, edited 1 time in total.

Post by AND51 »

Debugger off / NEW BETA String-Lib:
  • ReplaceString: 141 ms
  • FastReplace: 609 ms
  • Ratio: ReplaceString is about 4.3x faster
3.4 GHz Dual Core, 2 GB RAM

Amazing, Fred! :shock:
Congratulations + Applause + Thanks!!! :D


BTW, thanks for fixing the case where FindString$ is empty: it now returns immediately, which is the expected behaviour IMHO. IIRC this was not handled correctly in older versions.
Thank you again!