Page 1 of 1

Data base access : associative arrays

Posted: Mon Mar 30, 2009 1:00 am
by Ollivier
Hello, I update the title and add a little definition.

Discovering the term "Hash Map" or "Associative array" in these different threads:
Multi-pattern searching
Hash-Map vs LL
Associative arrays
...I try to find a code which calculates the fingerprint of a string. Actually Crc32 is a very good method but I think we could accelerate this operation.

So, I suggest a code I update these days.

>> In an associative array, when we want to declare, check or remove an element, we give the string to a fingerprint procedure (usually Crc32). This procedure loses time to calculate the fingerprint. If we are able to accelerate this calculation, we accelerate too the general data base access.

This code "digests" a message (= a null terminal string) cutting it, double word after double word (= a long variable). Each double word is "xored" (logical operation) with a global double word which is used as result of the procedure.

The last operation consists to reduce the double word (4 bytes) into a 24bits sized number (3 bytes). I extract the first byte (bits 24 to 31), "xor" it with the 3 others bytes and remove it.

Reading definition of Crc overview, my method is "archaic". But testing its speed and the little number of collisions I get, I think it's a possible choice.

So, I hope I am clear and I am sorry for the langage errors.
I'll add an little example.

Code: Select all

Procedure.L MD24(St.S)

   Protected Length.I
   Protected Result.L
   Protected Stamp.I
   Protected Dest.I

   Length = Len(St)
   St + "    " ; Allocate 4 bytes
   *Adr = @St
   PokeL(*Adr + Length, 0) ; reset these 4 bytes

   Dest = *Adr + Length - 1
   
   For I = *Adr To Dest Step 4
      Result ! PeekL(I)
   Next I

; last operation   
   Stamp = Result & $FF000000
   Stamp >> 8
   Result ! Stamp
   Stamp >> 8
   Result ! Stamp
   Stamp >> 8
   Result ! Stamp
   Result & $FFFFFF
   ProcedureReturn Result

EndProcedure

Structure CamX
   *Ptr[1 << 24]
EndStructure

   CamX.CamX

Posted: Mon Mar 30, 2009 1:25 am
by netmaestro
I give up - what does it do?

Posted: Mon Mar 30, 2009 4:24 am
by Ollivier
I update the first post, adding links in the forum I read to discover hashmap.

Posted: Mon Mar 30, 2009 6:34 am
by Ollivier
This code doesn't fail.
It is 4x faster than the native Crc32 statement with the debugger,
and 7x faster without the debugger.

But there is a problem with the repetition of the characters (certainly lots of collisions when there is lots of strings which are mainly composed of same characters - AAAAAAAAA or !!!!!!!!!!!!!!!!!!! or $$$$$$$$$$$$, etc...).
In others ways, it may be good.

Code: Select all

Macro Digest24(Adr,Length,Result)

; EAX, EBX, ESI, EDI and EBP are like 5 variables in PB
; Their format is 32 bits

; AL, BL are others variables
; but their format is 8 bits
; With more precision, 
;     AL = EAX & $FF
; And BL = EBX & $FF

!  XOR     EAX, EAX          ; EAX = 0
!  XOR     EBX, EBX          ; EBX = 0

!  MOV     ESI, [v_#Adr]     ; ESI = Adr
!  MOV     EDI, ESI          ; EDI = ESI (Copy)
!  ADD     ESI, [v_#Length]  ; ESI + Length
!  SUB     ESI, 4            ; ESI - 4
!  STD                       ; Indicates LODS statements
                             ; will DECREASE ESI
                             ; (<>  CLD <=> INCREASE ESI)
!  JMP     l_digestentry
                           
DigestLoop:                  ; Start loop 
!  LODSD                     ; EAX = PeekL(ESI): ESI - 4
!  XOR     EBX, EAX          ; EBX ! EAX
DigestEntry:
!  CMP     ESI, EDI          ; ESI > EDI ?
!  JG      l_digestloop      ; Yes, loop again
                             ; ESI < EDI ?
!  JL      l_digestcorrect   ; Do the correction
                             ; ESI = EDI ?
                             ; Do the last loop
!  LODSD                     ; EAX = PeekL(ESI): ESI - 4
!  XOR     EBX, EAX          ; EBX ! EAX
!  JMP     l_digestdone      ; Quit
                           
DigestCorrect:               ; Correction   
!  LODSB                     ; AL = PeekB(ESI): ESI - 1
!  XOR     BL,  AL           ; BL ! AL
!  MOV     EBP, ESI          ; EBP = ESI (copy)
!  SUB     EBP, EDI          ; EBP - EDI
!  CMP     EBP, -2           ; EBP = -2 ?
!  JZ      l_digestlastcorrect ; Yes, do the last correction
!  ROR     EBX, 8            ; Rotate EBX
!  LODSB                     ; AL = PeekB(ESI): ESI - 1
!  XOR     BL,  AL           ; BL ! AL

DigestLastCorrect:
!  ROR     EBX, 8            ; Rotate EBX
!  LODSB                     ; AL = PeekB(ESI): ESI - 1
!  XOR     BL,  AL           ; BL ! AL

DigestDone:                  ; Reduce from 32 to 24 bits

!  MOV     AL,  BL           ; Pick the first byte

!  ROR     EBX, 8            ; Take the second byte...
!  XOR     BL,  AL           ; And stamp it
!  ROR     EBX, 8            ; Take the third byte...
!  XOR     BL,  AL           ; And stamp it
!  ROR     EBX, 8            ; Take the fourth byte...
!  XOR     BL,  AL           ; And stamp it again
!  ROR     EBX, 16           ; Leave all
!  AND     EBX, 0x00FFFFFF   ; And free the first
!  MOV     [v_#Result], EBX  ; Result = EBX

!  CLD                       ; Clear Direction flag
EndMacro

Posted: Mon Mar 30, 2009 2:46 pm
by SFSxOI
But....Maybe I missed something, but I just don't understand.....

1. What does it do?
2. What is it for?
3. Where would it be used?
4. How would it be used?

Posted: Mon Mar 30, 2009 10:50 pm
by Ollivier
Excuse me.

I add links in the first posts.

Ollivier