Data base access : associative arrays

Share your advanced PureBasic knowledge/code with the community.
Ollivier
Enthusiast
Enthusiast
Posts: 281
Joined: Mon Jul 23, 2007 8:30 pm
Location: FR

Data base access : associative arrays

Post by Ollivier »

Hello, I update the title and add a little definition.

Discovering the term "Hash Map" or "Associative array" in these different threads:
Multi-pattern searching
Hash-Map vs LL
Associative arrays
...I try to find a code which calculates the fingerprint of a string. Actually Crc32 is a very good method but I think we could accelerate this operation.

So, I suggest a code I update these days.

>> In an associative array, when we want to declare, check or remove an element, we give the string to a fingerprint procedure (usually Crc32). This procedure loses time to calculate the fingerprint. If we are able to accelerate this calculation, we accelerate too the general data base access.

This code "digests" a message (= a null terminal string) cutting it, double word after double word (= a long variable). Each double word is "xored" (logical operation) with a global double word which is used as result of the procedure.

The last operation consists to reduce the double word (4 bytes) into a 24bits sized number (3 bytes). I extract the first byte (bits 24 to 31), "xor" it with the 3 others bytes and remove it.

Reading definition of Crc overview, my method is "archaic". But testing its speed and the little number of collisions I get, I think it's a possible choice.

So, I hope I am clear and I am sorry for the langage errors.
I'll add an little example.

Code: Select all

Procedure.L MD24(St.S)

   Protected Length.I
   Protected Result.L
   Protected Stamp.I
   Protected Dest.I

   Length = Len(St)
   St + "    " ; Allocate 4 bytes
   *Adr = @St
   PokeL(*Adr + Length, 0) ; reset these 4 bytes

   Dest = *Adr + Length - 1
   
   For I = *Adr To Dest Step 4
      Result ! PeekL(I)
   Next I

; last operation   
   Stamp = Result & $FF000000
   Stamp >> 8
   Result ! Stamp
   Stamp >> 8
   Result ! Stamp
   Stamp >> 8
   Result ! Stamp
   Result & $FFFFFF
   ProcedureReturn Result

EndProcedure

Structure CamX
   *Ptr[1 << 24]
EndStructure

   CamX.CamX
Last edited by Ollivier on Tue Mar 31, 2009 4:03 pm, edited 7 times in total.
User avatar
netmaestro
PureBasic Bullfrog
PureBasic Bullfrog
Posts: 8451
Joined: Wed Jul 06, 2005 5:42 am
Location: Fort Nelson, BC, Canada

Post by netmaestro »

I give up - what does it do?
BERESHEIT
Ollivier
Enthusiast
Enthusiast
Posts: 281
Joined: Mon Jul 23, 2007 8:30 pm
Location: FR

Post by Ollivier »

I update the first post, adding links in the forum I read to discover hashmap.
Last edited by Ollivier on Tue Mar 31, 2009 3:44 pm, edited 1 time in total.
Ollivier
Enthusiast
Enthusiast
Posts: 281
Joined: Mon Jul 23, 2007 8:30 pm
Location: FR

Post by Ollivier »

This code doesn't fail.
It is 4x faster than the native Crc32 statement with the debugger,
and 7x faster without the debugger.

But there is a problem with the repetition of the characters (certainly lots of collisions when there is lots of strings which are mainly composed of same characters - AAAAAAAAA or !!!!!!!!!!!!!!!!!!! or $$$$$$$$$$$$, etc...).
In others ways, it may be good.

Code: Select all

Macro Digest24(Adr,Length,Result)

; EAX, EBX, ESI, EDI and EBP are like 5 variables in PB
; Their format is 32 bits

; AL, BL are others variables
; but their format is 8 bits
; With more precision, 
;     AL = EAX & $FF
; And BL = EBX & $FF

!  XOR     EAX, EAX          ; EAX = 0
!  XOR     EBX, EBX          ; EBX = 0

!  MOV     ESI, [v_#Adr]     ; ESI = Adr
!  MOV     EDI, ESI          ; EDI = ESI (Copy)
!  ADD     ESI, [v_#Length]  ; ESI + Length
!  SUB     ESI, 4            ; ESI - 4
!  STD                       ; Indicates LODS statements
                             ; will DECREASE ESI
                             ; (<>  CLD <=> INCREASE ESI)
!  JMP     l_digestentry
                           
DigestLoop:                  ; Start loop 
!  LODSD                     ; EAX = PeekL(ESI): ESI - 4
!  XOR     EBX, EAX          ; EBX ! EAX
DigestEntry:
!  CMP     ESI, EDI          ; ESI > EDI ?
!  JG      l_digestloop      ; Yes, loop again
                             ; ESI < EDI ?
!  JL      l_digestcorrect   ; Do the correction
                             ; ESI = EDI ?
                             ; Do the last loop
!  LODSD                     ; EAX = PeekL(ESI): ESI - 4
!  XOR     EBX, EAX          ; EBX ! EAX
!  JMP     l_digestdone      ; Quit
                           
DigestCorrect:               ; Correction   
!  LODSB                     ; AL = PeekB(ESI): ESI - 1
!  XOR     BL,  AL           ; BL ! AL
!  MOV     EBP, ESI          ; EBP = ESI (copy)
!  SUB     EBP, EDI          ; EBP - EDI
!  CMP     EBP, -2           ; EBP = -2 ?
!  JZ      l_digestlastcorrect ; Yes, do the last correction
!  ROR     EBX, 8            ; Rotate EBX
!  LODSB                     ; AL = PeekB(ESI): ESI - 1
!  XOR     BL,  AL           ; BL ! AL

DigestLastCorrect:
!  ROR     EBX, 8            ; Rotate EBX
!  LODSB                     ; AL = PeekB(ESI): ESI - 1
!  XOR     BL,  AL           ; BL ! AL

DigestDone:                  ; Reduce from 32 to 24 bits

!  MOV     AL,  BL           ; Pick the first byte

!  ROR     EBX, 8            ; Take the second byte...
!  XOR     BL,  AL           ; And stamp it
!  ROR     EBX, 8            ; Take the third byte...
!  XOR     BL,  AL           ; And stamp it
!  ROR     EBX, 8            ; Take the fourth byte...
!  XOR     BL,  AL           ; And stamp it again
!  ROR     EBX, 16           ; Leave all
!  AND     EBX, 0x00FFFFFF   ; And free the first
!  MOV     [v_#Result], EBX  ; Result = EBX

!  CLD                       ; Clear Direction flag
EndMacro
SFSxOI
Addict
Addict
Posts: 2970
Joined: Sat Dec 31, 2005 5:24 pm
Location: Where ya would never look.....

Post by SFSxOI »

But....Maybe I missed something, but I just don't understand.....

1. What does it do?
2. What is it for?
3. Where would it be used?
4. How would it be used?
Ollivier
Enthusiast
Enthusiast
Posts: 281
Joined: Mon Jul 23, 2007 8:30 pm
Location: FR

Post by Ollivier »

Excuse me.

I add links in the first posts.

Ollivier
Post Reply