Here's a x64 asm experiment.Everything wrote:It's supposed to be a x64 version in a first place
It's harder to debug so I hope I didn't make any mistakes.
It gives at least an impression about the speed difference with a non asm version.
Levenshtein distance or OSA distance is also possible with low memory requirements by the way.
Code: Select all
Structure ByteArray
b.b[0]
EndStructure
Structure LCB
Aoffset.i
Boffset.i
Size.i
EndStructure
Procedure.i LongestCommonByteSequence(*A.ByteArray, ASize.i, *B.ByteArray, BSize.i, *Out.LCB)
If Not *A Or Not *B Or Not ASize Or Not BSize : ProcedureReturn -1 : EndIf
Protected *M = AllocateMemory(BSize*SizeOf(Integer))
Protected *AEnd = *A + ASize
Protected *BEnd = *B + BSize
!mov r8, [p.p_A] ; r8 = *A
!mov r10, [p.p_Out] ; r10 = *Out
!xor rax, rax
!mov [r10], rax ; *Out\Aoffset = 0
!mov [r10 + 8], rax ; *Out\BOffset = 0
!mov [r10 + 16], rax ; *Out\Size = 0
!.l0:
!mov r9, [p.p_B] ; r9 = *B
!mov r11, [p.p_M] ; r11 = *M
!movzx ecx, byte [r8] ; ecx = character to compare
!xor rdx, rdx ; rdx = Previous diagonal
!.l1:
!xor rax, rax
!cmp cl, [r9] ; compare characters
!jne .l2
!lea rax, [rdx + 1]
!cmp rax, [r10 + 16] ; check if we found bigger score
!jbe .l2
!mov [r10 + 16], rax ; set new score
!mov rdx, r9
!sub rdx, [p.p_B]
!mov [r10 + 8], rdx ; set Boffset
!mov rdx, r8
!sub rdx, [p.p_A]
!mov [r10], rdx ; set Aoffset
!.l2:
!mov rdx, [r11]
!mov [r11], rax
!add r11, 8
!add r9, 1
!cmp r9, [p.p_BEnd]
!jne .l1
!add r8, 1
!cmp r8, [p.p_AEnd]
!jne .l0
FreeMemory(*M)
ProcedureReturn *Out\Size ; LCB in bytes
EndProcedure