InMem() findstring function for binary data

Share your advanced PureBasic knowledge/code with the community.
User avatar
pdwyer
Addict
Addict
Posts: 2813
Joined: Tue May 08, 2007 1:27 pm
Location: Chiba, Japan

InMem() findstring function for binary data

Post by pdwyer »

Thanks to some help from several people I got my first little memory manipulation function done. I'm used to using strings in basic to manipulate binary data but PB doesn't really work like that so I'm learning a lot by reinventing this wheel.

The main proc is:

Declare InMem(StartPos.l, *MainMem, MainLen.l, *FindMem, FindLen.l)

It works like the FindString function or the InStr function of other basics but you can seach chr(0) and other naughty little bytes. Since it works with pointers it needs the lengths passed too unfortunately

This just shows a binary file being opened, and a text search done on it even though there are nulls all the way though. I'm sure ASM guru's could make it faster but it seems okay speed wise as it's all just pointers.

Time permitting I want to do Mid() Trim(), stringfield() and Replace()

(I'm easily distracted though and these thing mysteriously turn into vapour :wink: )

One thing I can think of is turn the LONGs into QUADS so that it can read files larger than 2gb but it'd slow it down a little I guess

Code: Select all


Declare ProcMain() 
Declare InMem(StartPos.l, *MainMem, MainLen.l, *FindMem, FindLen.l) 

Structure MemoryArray 
    Byte.b[0] 
EndStructure 
     
;=========================================================================

ProcMain() 

;=========================================================================

Procedure.l ProcMain() 
 
    Infile.s = "C:\WINDOWS\notepad.exe" ;choose a binary file
    hFile.l = ReadFile(#PB_Any,Infile)
    FileLen.l = Lof(hfile)

    *MainMemPtr = AllocateMemory(FileLen)   ;Allocate Some Memory 
    *FindMemPtr = AllocateMemory(2)   ;Allocate Some Memory
  
    ; Find arbitrary bytes
    ;PokeB(*FindMemPtr, 0)
    ;PokeB(*FindMemPtr + 1, Asc("l"))
    
    ;Find a string
    FindStr.s = "ext"
    FindLen.l = Len(FindStr)
  
    *FindMemPtr = @FindStr
  
    ReadData(hfile, *MainMemPtr, FileLen)
        
    Debug "Found at " + Str(inMem(1, *MainMemPtr, FileLen, *FindMemPtr, FindLen))
    Debug "Found at " + Str(inMem(500, *MainMemPtr, FileLen, *FindMemPtr, FindLen))
    Debug "Found at " + Str(inMem(29000, *MainMemPtr, FileLen, *FindMemPtr, FindLen))
    
EndProcedure 

;=========================================================================

Procedure.l InMem(StartPos.l, *MainMem, MainLen.l, *FindMem, FindLen.l) 

    *MainByteArray.MemoryArray = *MainMem
    *FindByteArray.MemoryArray = *FindMem 

    If StartPos < 1 : StartPos = 1 : EndIf
    
    FoundPos.l = 0
    
    For MainArrayLoop.l = StartPos - 1 To MainLen -1

        For FindArrayLoop.l = 0 To FindLen -1
     
            If MainArrayLoop + FindArrayLoop = MainLen
                ;End reached
                Break
            EndIf

            If *MainByteArray\byte[MainArrayLoop + FindArrayLoop] = *FindByteArray\byte[FindArrayLoop] 
                ; Keep Going
            Else
                Break
            EndIf
        
            If FindArrayLoop = FindLen -1 ;we reached the end of the search
                FoundPos.l = MainArrayLoop + 1
                Break 2
            EndIf

        Next
    Next

    ProcedureReturn FoundPos

EndProcedure 

;=========================================================================


Paul Dwyer

“In nature, it’s not the strongest nor the most intelligent who survives. It’s the most adaptable to change” - Charles Darwin
“If you can't explain it to a six-year old you really don't understand it yourself.” - Albert Einstein
gnozal
PureBasic Expert
PureBasic Expert
Posts: 4229
Joined: Sat Apr 26, 2003 8:27 am
Location: Strasbourg / France
Contact:

Post by gnozal »

I don't know if it's faster but you could also use the CompareMemory() and CompareMemoryString() functions to compare two memory areas / strings.
For free libraries and tools, visit my web site (also home of jaPBe V3 and PureFORM).
User avatar
pdwyer
Addict
Addict
Posts: 2813
Joined: Tue May 08, 2007 1:27 pm
Location: Chiba, Japan

Post by pdwyer »

Good idea, I might give that a go, I can then pat myself on the back if mine is faster :P

In the mean time, here's the next function, turned out to be pretty simple

MidMem(*MainMem, StartPos.l, GetLen.l)

Code: Select all

Declare ProcMain() 
Declare MidMem(*MainMem, StartPos.l, GetLen.l) 
    
;=========================================================================

ProcMain() 

;=========================================================================

Procedure.l ProcMain() 
 
    Infile.s = "C:\WINDOWS\notepad.exe" ;choose a binary file
    hFile.l = ReadFile(#PB_Any,Infile)
    FileLen.l = Lof(hfile)

    *MainMemPtr = AllocateMemory(FileLen)   ;Allocate Some Memory 

    ReadData(hfile, *MainMemPtr, FileLen)

    *MidText = MidMem(*MainMemPtr,78,38)

    Debug PeekS(*MidText)
    
    
EndProcedure 

;=========================================================================

Procedure.l MidMem(*MainMem, StartPos.l, GetLen.l)  

    *RetMem = AllocateMemory(GetLen)
    CopyMemory(*MainMem + StartPos, *RetMem, GetLen)

    ProcedureReturn *RetMem

EndProcedure 

;=========================================================================

Paul Dwyer

“In nature, it’s not the strongest nor the most intelligent who survives. It’s the most adaptable to change” - Charles Darwin
“If you can't explain it to a six-year old you really don't understand it yourself.” - Albert Einstein
User avatar
pdwyer
Addict
Addict
Posts: 2813
Joined: Tue May 08, 2007 1:27 pm
Location: Chiba, Japan

Post by pdwyer »

Thanks for the idea Gnozal, Here's the updated code with it.

Haven't done the speed tests yet. It gives identical results which makes it a good sanity check for the proc

Code: Select all


Procedure.l InMem(StartPos.l, *MainMem, MainLen.l, *FindMem, FindLen.l) 

    If StartPos < 1 : StartPos = 1 : EndIf
    
    FoundPos.l = 0

    For MainArrayLoop.l = StartPos - 1 To MainLen -1

        If MainArrayLoop + FindLen = MainLen
            ;End reached
            Break
        EndIf
            
        If CompareMemory(*MainMem + MainArrayLoop, *FindMem, FindLen) = 1
            FoundPos = MainArrayLoop + 1
            Break
        EndIf    

    Next

    ProcedureReturn FoundPos

EndProcedure 
Paul Dwyer

“In nature, it’s not the strongest nor the most intelligent who survives. It’s the most adaptable to change” - Charles Darwin
“If you can't explain it to a six-year old you really don't understand it yourself.” - Albert Einstein
User avatar
netmaestro
PureBasic Bullfrog
PureBasic Bullfrog
Posts: 8451
Joined: Wed Jul 06, 2005 5:42 am
Location: Fort Nelson, BC, Canada

Post by netmaestro »

If you're able to assume that *mainmem and *findmem are pointers to memory blocks created via AllocateMemory, you could eliminate the length parameters, replacing them inside the procedure with MemorySize(), which would make the procedure that much easier to use.
BERESHEIT
Dare
Addict
Addict
Posts: 1965
Joined: Mon May 29, 2006 1:01 am
Location: Outback

Post by Dare »

Quick attempt (not necessarily quick routine):

Code: Select all

EnableExplicit

Procedure.l binaryFind(*target.l, *searchFor.l, targetLen.l, patternLen.l, startOffset.l = 0)

  Protected *ptr.l                           ; Float on target, start of search
  Protected rValue.l = -1                    ; Value returned, -1 if not found
                                             ;                 else offset

  *ptr = *target + startOffset - 1
  While *ptr + patternLen <= *target + targetLen
    If CompareMemory(*ptr, *searchFor, patternLen)
      rValue = *ptr - *target
      Break
    EndIf
    *ptr + 1
  Wend
  ProcedureReturn rValue
EndProcedure

Define a.s, b.s

a = "123"+Chr(1)+"45612"

b = "3"+Chr(1)
Debug binaryFind(@a, @b, Len(a), Len(b))

b = "6"
Debug binaryFind(@a, @b, Len(a), Len(b))

b = "12"
Debug binaryFind(@a, @b, Len(a), Len(b))

b = "12"
Debug binaryFind(@a, @b, Len(a), Len(b), 4)

b = "21"
Debug binaryFind(@a, @b, Len(a), Len(b), 4)
Edit:

Begger it! No fair sneaking those posts in whilst I was coding this. :)

Edit again.

Spot the deliberate overflow in mine. :?
Plus no checking for validity of offset (or anything else for that matter) :)

PS and final edit:

If we want it optimised we hope Trond takes an interest. He could probably speed it up without using asm. :)
Dare2 cut down to size
User avatar
pdwyer
Addict
Addict
Posts: 2813
Joined: Tue May 08, 2007 1:27 pm
Location: Chiba, Japan

Post by pdwyer »

@NetMaestro,

How likely is that do you think? I'm new to PB so I'm not really sure. I guess it means that memory allocated in an API call or external DLL and passed to you wouldn't work.

It would be good to get rid of the length params though :)

One other thing I want is "InMemR" Which starts it's search from the reverse end of the memory block. In Powerbasic the InStr() had a nifty feature where if the startPos was negative it would mean search from the end backwards. I wonder if PowerBasic has a software patent on that idea? :wink:

Time for bed, I'll think about these again tomorrow, Thanks for your help
Paul Dwyer

“In nature, it’s not the strongest nor the most intelligent who survives. It’s the most adaptable to change” - Charles Darwin
“If you can't explain it to a six-year old you really don't understand it yourself.” - Albert Einstein
User avatar
netmaestro
PureBasic Bullfrog
PureBasic Bullfrog
Posts: 8451
Joined: Wed Jul 06, 2005 5:42 am
Location: Fort Nelson, BC, Canada

Post by netmaestro »

Memory allocated in a dll with the pointer returned to your calling program is accessible from the caller and vice-versa, if you allocate memory in the main prog and pass its pointer to a dll, the dll can read it without problem. So that part shouldn't cause concern anyway.
BERESHEIT
User avatar
pdwyer
Addict
Addict
Posts: 2813
Joined: Tue May 08, 2007 1:27 pm
Location: Chiba, Japan

Post by pdwyer »

Okay, I did the speed perf tests with some interesting results

The test was a 3mb html file being searched for a 36 byte long string that was 3/4 of the way through the file. A loop ran this 5000 times. The exe was compiled and not run from the IDE

The Compare Memory suggestion was slightly quicker to what I first wrote

107.84 Seconds (Compare Method)
117.54 Seconds (My loop in loop method)

I bench marked that against just a 5000x loop using FindString() which was a tad over twice as fast:

48.36 Seconds (Findstring Method)

Showing you that unless you really want to work with known binary data then the native PB method is much better.

BUT

Since I'm new to PB I did the code in a Powerbasic exe too, with 5000 loops of using Instr() with the same files finding the same text (and getting the same return answer) but the time...

12.87 Seconds (4x faster than the PureB code and works with Binary :( )

I ran it several time, I even wrote another PureB app with just the findstring loop so it looked like the PowerB app but the result was the same.

In PowerB the compiler chooses vars as register vars for speeding things up, you can also "Register" a variable so that it sits on a CPU register if you know it's going to be well used. PowerB will chose for you if you chose nothing.

Can PureB do that?

How can I increase the performance of the PureB app without going to ASM, other compiler options??

The only winning point was the PureB exe was 8kb and the Powerb exe was 14kb, but twice the size in this case was worth 4x the speed...

Suggestions? :?
Paul Dwyer

“In nature, it’s not the strongest nor the most intelligent who survives. It’s the most adaptable to change” - Charles Darwin
“If you can't explain it to a six-year old you really don't understand it yourself.” - Albert Einstein
User avatar
pdwyer
Addict
Addict
Posts: 2813
Joined: Tue May 08, 2007 1:27 pm
Location: Chiba, Japan

Post by pdwyer »

I took this another step (for those not bored yet :shock: )

And made the powerb code a DLL to be called by the pureb code and it was sligtly faster than the findstring but hardly

Code: Select all


RetResult = CallFunction(#LIBRARY, "INSTRING" ,@StartPos, *MainMemPtr, @FindStr)  
 = 47 seconds

RetResult = CallFunctionFast(FuncPtr, @StartPos, *MainMemPtr, @FindStr)
 = 46 seconds

For the same data as the previous test.

I learned a LOT along the way about passing strings in DLLs to other compilers so it was a great learning exercies :D

But,

It seems that I'm missing something about performance when it comes to PureB still :?


I suppose I could try getting the Powerb dll to do the loops too and not just be a function to call to get rid of the DLL overheard a bit...
Paul Dwyer

“In nature, it’s not the strongest nor the most intelligent who survives. It’s the most adaptable to change” - Charles Darwin
“If you can't explain it to a six-year old you really don't understand it yourself.” - Albert Einstein
User avatar
Deeem2031
Enthusiast
Enthusiast
Posts: 216
Joined: Sat Sep 20, 2003 3:57 pm
Location: Germany
Contact:

Post by Deeem2031 »

Hm, if you want i have a "bit" faster InMem procedure ;) (It's especially faster when FindLen is big)

(As you can see at the end my "InMem2" returns the pointer, not the position in the memory, but the main function is the same)

Code: Select all

Structure MemoryArray 
    Byte.b[0] 
EndStructure 

Procedure.l InMem(StartPos.l, *MainMem, MainLen.l, *FindMem, FindLen.l) 

    *MainByteArray.MemoryArray = *MainMem 
    *FindByteArray.MemoryArray = *FindMem 

    If StartPos < 1 : StartPos = 1 : EndIf 
    
    FoundPos.l = 0 
    
    For MainArrayLoop.l = StartPos - 1 To MainLen -1 

        For FindArrayLoop.l = 0 To FindLen -1 
      
            If MainArrayLoop + FindArrayLoop = MainLen 
                ;End reached 
                Break 
            EndIf 

            If *MainByteArray\byte[MainArrayLoop + FindArrayLoop] = *FindByteArray\byte[FindArrayLoop] 
                ; Keep Going 
            Else 
                Break 
            EndIf 
        
            If FindArrayLoop = FindLen -1 ;we reached the end of the search 
                FoundPos.l = MainArrayLoop + 1 
                Break 2 
            EndIf 

        Next 
    Next 

    ProcedureReturn FoundPos 

EndProcedure

Procedure.l InMem2(*MainMem.Byte, MainLen.l, *FindMem.Byte, FindLen.l) 
  Protected Dim bp.l(255)
  Protected *MainEnd = *MainMem + MainLen - 1
  
  For i = 0 To 255
    bp(i) = FindLen
  Next
  
  For i = FindLen-1 To 0 Step -1
    bp(PeekB(*FindMem+i)&$FF) = FindLen-i-1
  Next
  
  For *pos.Byte = *MainMem + FindLen-1 To *MainEnd
    While bp(*pos\b&$FF)
      *pos + bp(*pos\b&$FF)
      If *pos > *MainEnd
        Break 2
      EndIf
    Wend
    i = 0
    For *fpos.Byte = *FindMem + FindLen - 1 To *FindMem Step -1
      If PeekB(*pos+i) <> *fpos\b
        Break
      ElseIf *fpos = *FindMem
        ProcedureReturn *pos + i
      EndIf
      i - 1
    Next
  Next
  
  ProcedureReturn #Null
  
EndProcedure


len = 1024*100
*mem = AllocateMemory(len)
For i = 0 To len-1
  PokeB(*mem+i,Random(25))
Next

len2 = 3
*mem2 = AllocateMemory(len2)
For i = 0 To len2-1
  PokeB(*mem2+i,Random(25))
Next

#r = 1000

time1 = ElapsedMilliseconds()
For i = 0 To #r
  InMem(0,*mem,len,*mem2,len2)
Next
time1 = ElapsedMilliseconds() - time1

time2 = ElapsedMilliseconds()
For i = 0 To #r
  InMem2(*mem,len,*mem2,len2)
Next
time2 = ElapsedMilliseconds() - time2

r1 = InMem(0,*mem,len,*mem2,len2)
r2 = InMem2(*mem,len,*mem2,len2)
If r2: r2 - *mem+1:EndIf
MessageRequester("",Str(time1)+"ms ("+Hex(r1)+")"+#CRLF$+Str(time2)+"ms ("+Hex(r2)+")")
irc://irc.freenode.org/#purebasic
User avatar
pdwyer
Addict
Addict
Posts: 2813
Joined: Tue May 08, 2007 1:27 pm
Location: Chiba, Japan

Post by pdwyer »

I figured out the where the difference is with powerb and pureb performance is.

Its the strings they use. Powerb uses the bstr (Ole Strings) and pureb uses the ascii Z (null terminated) strings.

When I use a pureb type string in powerb I get similar results, when I pass them to ole strings I get a huge performance increase.

I'll try looking at the API ole string functions in pureb, it may give me the extra kick I'm looking for
Paul Dwyer

“In nature, it’s not the strongest nor the most intelligent who survives. It’s the most adaptable to change” - Charles Darwin
“If you can't explain it to a six-year old you really don't understand it yourself.” - Albert Einstein
Post Reply