similar string comparison

Just starting out? Need help? Post your questions and find answers here.
ALAN-MHz
User
User
Posts: 68
Joined: Fri Jul 29, 2005 11:47 am

similar string comparison

Post by ALAN-MHz »

dear all, i need help, i have an array of about 3000 phrases, i need to find phrases similarities and show in debug both phrases and % of similarities, i don't have any idea of i can code this, anyone can help me ?

example

this is a book
this is book

those two phrases i need to show with % of similarities
IdeasVacuum
Always Here
Always Here
Posts: 6426
Joined: Fri Oct 23, 2009 2:33 am
Location: Wales, UK
Contact:

Re: similar string comparison

Post by IdeasVacuum »

using a checksum?
IdeasVacuum
If it sounds simple, you have not grasped the complexity.
User avatar
TI-994A
Addict
Addict
Posts: 2740
Joined: Sat Feb 19, 2011 3:47 am
Location: Singapore
Contact:

Re: similar string comparison

Post by TI-994A »

ALAN-MHz wrote:...i have an array of about 3000 phrases, i need to find phrases similarities and show in debug both phrases and % of similarities
example
this is a book
this is book
Perhaps the Levenshtein Distance might serve your needs. It doesn't express the similarities in *percentages, but rather in ascending values:

Code: Select all

;============================================
;
;  a simple Levenshtein string comparator
;  the ReceiveHTTPFile() function to
;  download a sample text file from DropBox
;
;  requires PureBasic v5.40 minimum for 
;  ReceiveHTTPFile() to work with DropBox
;
;============================================

#maxStrLen = 30

Structure levSort
  score.i
  phrases.s
EndStructure

Procedure LevenshteinDistance(s1.s, s2.s) 
  Protected s1Len = Len(s1) 
  Protected s2Len = Len(s2) 
  Dim levArray(s1Len, s2Len)
  
  If s1Len > s2Len
    percentile = s1Len
  Else
    percentile = s2Len
  EndIf
  
  If s1Len And s2Len 
    If s1Len > #maxStrLen 
      s1Len = #maxStrLen 
    EndIf 
    If s2Len > #maxStrLen 
      s2Len = #maxStrLen 
    EndIf 
    For i = 0 To s1Len 
      levArray(i, 0) = i 
    Next 
    For j = 0 To s2Len 
      levArray(0, j) = j 
    Next 
    For i = 1 To s1Len 
      For j = 1 To s2Len 
        If Mid(s1, i, 1) = Mid(s2, j, 1) 
          cost = #False 
        Else 
          cost = #True 
        EndIf 
        a = levArray(i - 1, j) + 1
        b = levArray(i, j - 1) + 1
        c = levArray(i - 1, j - 1) + cost
        If b < a 
          a = b 
        EndIf 
        If c < a 
          a = c 
        EndIf 
        levArray(i, j) = a
      Next 
    Next 
    result = (percentile - levArray(s1Len, s2Len)) / (percentile * 0.01)
  Else 
    result = 9999 
  EndIf 
  ProcedureReturn result 
EndProcedure

InitNetwork()
fileName$ = GetTemporaryDirectory() + "similarStrings.txt"
dropBoxLink$ = "https://www.dropbox.com/s/hrdxobhye8eks8d/similarStrings.txt?dl=1"

If FileSize(fileName$) = -1
  ReceiveHTTPFile(dropBoxLink$, fileName$)
EndIf

If ReadFile(0, fileName$)
  
  Dim s$(90)
  Dim c$(90)
  Dim results.levSort(8099)
  
  While Not Eof(0)
    s$(i) = ReadString(0)
    i + 1
  Wend
  
  CloseFile(0)
  
  CopyArray(s$(), c$())
  
  For i = 0 To 89
    For ii = 0 To 89
      result = LevenshteinDistance(s$(i), c$(ii))
      results(sno)\score = result
      results(sno)\phrases = s$(i) + " >> " + c$(ii) + " == " + Str(result) + "%"
      sno + 1
    Next ii
  Next i
  
  SortStructuredArray(results(), #PB_Sort_Descending, OffsetOf(levSort\score), #PB_Integer)
  
  wFlags = #PB_Window_SystemMenu | #PB_Window_ScreenCentered 
  OpenWindow(#PB_Any, #PB_Ignore, #PB_Ignore, 600, 600, "Levenshtein Comparator", wFlags)
  listDisplay = ListViewGadget(#PB_Any, 10, 10, 580, 580)
  
  resultsFile.s = SaveFileRequester("Save results to:", "", "Text (*.txt)|*.txt", 0)
  If resultsFile <> "" 
    If GetExtensionPart(resultsFile) = ""
      resultsFile + ".txt"
    EndIf
    CreateFile(0, resultsFile)
  EndIf
  
  For i = 0 To ArraySize(results())
    AddGadgetItem(listDisplay, -1, Str(i + 1) + ". " + results(i)\phrases)
    If IsFile(0)
      WriteStringN(0, Str(i + 1) + ". " + results(i)\phrases)
    EndIf
  Next i
  
  If IsFile(0)
    CloseFile(0)
  EndIf
  
  While WaitWindowEvent() ! #PB_Event_CloseWindow : Wend 
  
EndIf
The code uses a sample file of ninety similar strings (downloaded from DropBox), and compares them recursively. The results, sorted according to score, are then displayed, and optionally written to file.

Hope it might help in some way. :wink:
EDITS wrote:4th September 2016: integrated Lunasole's solution to return percentage-based results
18th February 2019: updated download links
Last edited by TI-994A on Mon Feb 18, 2019 8:18 am, edited 3 times in total.
Texas Instruments TI-99/4A Home Computer: the first home computer with a 16bit processor, crammed into an 8bit architecture. Great hardware - Poor design - Wonderful BASIC engine. And it could talk too! Please visit my YouTube Channel :D
User avatar
Lunasole
Addict
Addict
Posts: 1091
Joined: Mon Oct 26, 2015 2:55 am
Location: UA
Contact:

Re: similar string comparison

Post by Lunasole »

@TI-994A, thank for interesting example.

Btw, it can be easily turned to return %

Code: Select all

;============================================
;
;  a simple Levenshtein string comparator
;  algorithm code by DoubleDutch
;
;============================================

#maxStrLen = 30

Procedure.f LevenshteinDistance(s1.s, s2.s) 
	Protected s1Len = Len(s1) 
	Protected s2Len = Len(s2) 
	Protected.f result, MaxPc
		If s1Len > s2Len ; or switch here ">" to "<" ^^
			MaxPc = s1Len
		Else
			MaxPc = s2Len
		EndIf
	Dim levArray(s1Len, s2Len)

	
	If s1Len And s2Len 
		If s1Len > #maxStrLen 
			s1Len = #maxStrLen 
		EndIf 
		If s2Len > #maxStrLen 
			s2Len = #maxStrLen 
		EndIf 
		For i = 0 To s1Len 
			levArray(i, 0) = i 
		Next 
		For j = 0 To s2Len 
			levArray(0, j) = j 
		Next 
		For i = 1 To s1Len 
			For j = 1 To s2Len 
				If Mid(s1, i, 1) = Mid(s2, j, 1) 
					cost = #False 
				Else 
					cost = #True 
				EndIf 
				a = levArray(i - 1, j) + 1
				b = levArray(i, j - 1) + 1
				c = levArray(i - 1, j - 1) + cost
				If b < a 
					a = b 
				EndIf 
				If c < a 
					a = c 
				EndIf 
				levArray(i, j) = a
			Next 
		Next 
		result = levArray(s1Len, s2Len)
		result = (MaxPc - result) / (MaxPc * 0.01)
	Else 
		result = 0
	EndIf 
	ProcedureReturn result 
EndProcedure


; result is in %
Debug LevenshteinDistance("this is a book", "this is book")
Debug LevenshteinDistance("this is phrase 1", "this is 2 phrase")
Debug LevenshteinDistance("this is phrase 1", "this is phrase 2")
Debug LevenshteinDistance("this is phrase 1", "this is phrase 1")
"W̷i̷s̷h̷i̷n̷g o̷n a s̷t̷a̷r"
User avatar
TI-994A
Addict
Addict
Posts: 2740
Joined: Sat Feb 19, 2011 3:47 am
Location: Singapore
Contact:

Re: similar string comparison

Post by TI-994A »

Lunasole wrote:Btw, it can be easily turned to return %...
Nice Lunasole! Seems to work as expected; didn't look at it that way.

I've integrated it into my example above. :D
Texas Instruments TI-99/4A Home Computer: the first home computer with a 16bit processor, crammed into an 8bit architecture. Great hardware - Poor design - Wonderful BASIC engine. And it could talk too! Please visit my YouTube Channel :D
davido
Addict
Addict
Posts: 1890
Joined: Fri Nov 09, 2012 11:04 pm
Location: Uttoxeter, UK

Re: similar string comparison

Post by davido »

@TI-994A
@Lunasole

Very nice examples.
I was wondering how it might be done. Thanks :D
DE AA EB
ALAN-MHz
User
User
Posts: 68
Joined: Fri Jul 29, 2005 11:47 am

Re: similar string comparison

Post by ALAN-MHz »

thanks to all for effort!
User avatar
Michael Vogel
Addict
Addict
Posts: 2807
Joined: Thu Feb 09, 2006 11:27 pm
Contact:

Re: similar string comparison

Post by Michael Vogel »

I'm using the (slightly) tuned code below because I want to check for duplicate files on my harddisk - but the routine is quite slow. Defining the matrix as a global array just saves microseconds, but I would need a real speed boost? W...?

Code: Select all

Procedure Distance1(s1.s,s2.s)

	Protected a,b,c
	Protected i,j,m,n
	Protected m1,m2
	Protected z

	m=Len(s1)
	n=Len(s2)

	#MaxWordLength = 100

	If m And n
		If m>#MaxWordLength
			m=#maxWordLength
		EndIf
		If n>#MaxWordLength
			n=#maxWordLength
		EndIf

		Dim DistanceMatrix(m,n)
		For i=0 To m
			DistanceMatrix(i,0)=i
		Next
		For j=0 To n
			DistanceMatrix(0,j)=j
		Next

		m1=@s1-1
		m2=@s2-1

		For i=1 To m
			For j=1 To n
				If PeekA(m1+i)=PeekA(m2+j)
					;If Mid(s1,i,1)=Mid(s2,j,1)
					z=#False
				Else
					z=#True
				EndIf
				a=DistanceMatrix(i-1,j)+1;  	deletion
				b=DistanceMatrix(i,j-1)+1;  	insertion
				c=DistanceMatrix(i-1,j-1)+z;	substitution
				If b<a
					a=b
				EndIf
				If c<a
					a=c
				EndIf
				DistanceMatrix(i,j)=a
			Next
		Next

		z=DistanceMatrix(m,n);				absolute
		If m>n
			n=m
		EndIf
		z=(n-z)*100/n;						relative

	ElseIf m=n
		z=#Null

	Else
		z=100;								or #MaxWordLength for absolute

	EndIf

	ProcedureReturn z

EndProcedure
User avatar
Lunasole
Addict
Addict
Posts: 1091
Joined: Mon Oct 26, 2015 2:55 am
Location: UA
Contact:

Re: similar string comparison

Post by Lunasole »

@Michael Vogel, the algorithm is slow enough itself. Also as for me this one badly goes with searching duplicates., but probably you can make it noticeably faster if do it multi-threaded (also rewriting in ASM surely will bring some speed).
"W̷i̷s̷h̷i̷n̷g o̷n a s̷t̷a̷r"
wilbert
PureBasic Expert
PureBasic Expert
Posts: 3942
Joined: Sun Aug 08, 2004 5:21 am
Location: Netherlands

Re: similar string comparison

Post by wilbert »

It helps a bit to use pointers instead of peek.

Code: Select all

Procedure Distance1a(s1.s,s2.s)
  
  Protected a,b,c
  Protected i,j,m,n
  Protected *m1.Ascii,*m2.Ascii,m1a.a
  Protected z
  
  m=Len(s1)
  n=Len(s2)
  
  #MaxWordLength = 100
  
  If m And n
    If m>#MaxWordLength
      m=#maxWordLength
    EndIf
    If n>#MaxWordLength
      n=#maxWordLength
    EndIf
    
    Protected Dim DistanceMatrix(m,n)
    For i=0 To m
      DistanceMatrix(i,0)=i
    Next
    For j=0 To n
      DistanceMatrix(0,j)=j
    Next
    
    *m1=@s1
    For i=1 To m
      m1a = *m1\a
      *m2=@s2
      For j=1 To n
        If m1a=*m2\a
          ;If Mid(s1,i,1)=Mid(s2,j,1)
          z=#False
        Else
          z=#True
        EndIf
        a=DistanceMatrix(i-1,j)+1;     deletion
        b=DistanceMatrix(i,j-1)+1;     insertion
        c=DistanceMatrix(i-1,j-1)+z;   substitution
        If b<a
          a=b
        EndIf
        If c<a
          a=c
        EndIf
        DistanceMatrix(i,j)=a
        *m2+SizeOf(Character)
      Next
      *m1+SizeOf(Character)
    Next
    
    z=DistanceMatrix(m,n);            absolute
    If m>n
      n=m
    EndIf
    z=(n-z)*100/n;                  relative
    
  ElseIf m=n
    z=#Null
    
  Else
    z=100;                        or #MaxWordLength for absolute
    
  EndIf
  
  ProcedureReturn z
  
EndProcedure
Windows (x64)
Raspberry Pi OS (Arm64)
User avatar
Andre
PureBasic Team
PureBasic Team
Posts: 2139
Joined: Fri Apr 25, 2003 6:14 pm
Location: Germany (Saxony, Deutscheinsiedel)
Contact:

Re: similar string comparison

Post by Andre »

Looks interesting. :D
Seems to have the potential for using it in a searchfunction too, when similar searchresults should be noticed.

Or do you know other algorithms, which are used for such cases?
Bye,
...André
(PureBasicTeam::Docs & Support - PureArea.net | Order:: PureBasic | PureVisionXP)
User avatar
J. Baker
Addict
Addict
Posts: 2181
Joined: Sun Apr 27, 2003 8:12 am
Location: USA
Contact:

Re: similar string comparison

Post by J. Baker »

A little compare procedure. Not sure of the speed as compared to wilbert's or any of the others above. ;)

EDIT: Updated code.

Code: Select all

;PureBasic 5.43

Procedure.s PhraseMatch(Phrase.s, ComparePhrase.s)
  
  If Phrase <> ComparePhrase
  
    CleanPhrase.s = RemoveString(Phrase, ",")
    CleanPhrase.s = RemoveString(CleanPhrase, ".")
    CleanPhrase.s = RemoveString(CleanPhrase, "?")
    CleanPhrase.s = RemoveString(CleanPhrase, "!")
    
    Word.s = ""
    WordMatch.i = 0
    PhraseLength.i = CountString(CleanPhrase, " ") + 1
    StringCheck.i = 0
    MatchingWords.s = ""
    
    For SplitPhrase = 1 To PhraseLength
      Word = StringField(CleanPhrase, SplitPhrase, " ")
      If SplitPhrase = PhraseLength
        StringCheck = FindString(ComparePhrase, Word)
      Else
        StringCheck = FindString(ComparePhrase, Word + " ")
      EndIf
      If StringCheck
        MatchingWords = MatchingWords + Word + Chr(44) + Chr(32)
        ComparePhrase = RemoveString(ComparePhrase, Word)
        WordMatch + 1
      EndIf
    Next
    
    Percentage.f = 100 / PhraseLength
    ProcedureReturn Str(Percentage * WordMatch) + Chr(37) + " - "  + RTrim(RTrim(MatchingWords, " "), ",")
  
  Else
    
    ProcedureReturn "100" + Chr(37) + " - " + Phrase
    
  EndIf
  
EndProcedure

PhraseA.s = "Sally sells sea shells at the sea shore."
PhraseB.s = "Sally paints sea shells and sells them."

Debug PhraseMatch(PhraseA, PhraseB)
www.posemotion.com

PureBasic Tools for OS X: PureMonitor, plist Tool, Data Maker & App Chef


Even the vine knows it surroundings but the man with eyes does not.
User avatar
Lord
Addict
Addict
Posts: 907
Joined: Tue May 26, 2009 2:11 pm

Re: similar string comparison

Post by Lord »

Hi Wilbert!
wilbert wrote:It helps a bit to use pointers instead of peek.

Code: Select all

Procedure Distance1a(s1.s,s2.s)
  
  Protected a,b,c
  Protected i,j,m,n
  Protected *m1.Ascii,*m2.Ascii,m1a.a
  Protected z
  
  m=Len(s1)
  n=Len(s2)
  
  #MaxWordLength = 100
  
  If m And n
    If m>#MaxWordLength
      m=#maxWordLength
    EndIf
    If n>#MaxWordLength
      n=#maxWordLength
    EndIf
    
    Protected Dim DistanceMatrix(m,n)
    For i=0 To m
      DistanceMatrix(i,0)=i
    Next
    For j=0 To n
      DistanceMatrix(0,j)=j
    Next
    
    *m1=@s1
    For i=1 To m
      m1a = *m1\a
      *m2=@s2
      For j=1 To n
        If m1a=*m2\a
          ;If Mid(s1,i,1)=Mid(s2,j,1)
          z=#False
        Else
          z=#True
        EndIf
        a=DistanceMatrix(i-1,j)+1;     deletion
        b=DistanceMatrix(i,j-1)+1;     insertion
        c=DistanceMatrix(i-1,j-1)+z;   substitution
        If b<a
          a=b
        EndIf
        If c<a
          a=c
        EndIf
        DistanceMatrix(i,j)=a
        *m2+SizeOf(Character)
      Next
      *m1+SizeOf(Character)
    Next
    
    z=DistanceMatrix(m,n);            absolute
    If m>n
      n=m
    EndIf
    z=(n-z)*100/n;                  relative
    
  ElseIf m=n
    z=#Null
    
  Else
    z=100;                        or #MaxWordLength for absolute
    
  EndIf
  
  ProcedureReturn z
  
EndProcedure
Sorry, but your code yields 100% in this example:
Debug Distance1a("Test", "")
So each empty string gives 100%. :shock:
Image
User avatar
Michael Vogel
Addict
Addict
Posts: 2807
Joined: Thu Feb 09, 2006 11:27 pm
Contact:

Re: similar string comparison

Post by Michael Vogel »

Lord wrote:Sorry, but your code yields 100% in this example:
Debug Distance1a("Test", "")
So each empty string gives 100%. :shock:
My fault,
you must change m=n to m-n or swap the z=0 with the z=100 line at the end of the code.
Sooraa
User
User
Posts: 48
Joined: Thu Mar 12, 2015 2:07 pm
Location: Germany

Re: similar string comparison

Post by Sooraa »

Andre, may be for a searchfunction it's interesting to have code with small footprint and fast execution for big data apps:

Here our performance study for Damerau-Levenshtein in ASM:

Code: Select all

CompilerIf #PB_Compiler_Processor <> #PB_Processor_x86 And #PB_Compiler_Processor <> #PB_Processor_x64 : CompilerError "This works on x86 or x64 processors" : CompilerEndIf
CompilerSelect #PB_Compiler_Processor : CompilerCase #PB_Processor_x64 : CompilerEndSelect : OpenConsole() : EnableExplicit
Define s1.s, s2.s, s1l.l, s2l.l, t, tableL.l, time1.l, time2.l, *s1p, *s2p, *s1l, *s2l, levi.i
s1 = "meilenstein23456"                                           ; max length = 255
s2 = "levenshtein23456levenshtein23456levenshtein23456levenshtein23456levenshtein23456levenshtein23456levenshtein23456levenshtein23456levenshtein23456levenshtein23456levenshtein23456levenshtein23456levenshtein23456levenshtein23456levenshtein23456levenshtein2345" ; s1 ist r9 und muss als ZeilenIncrement der kürzere String sein. geht aber auch bis Länge 255
;s2 = "levenshtien23456levenshtien23456levenshtien23456levenshtien23456levenshtien23456levenshtien23456levenshtien23456levenshtien23456levenshtien23456levenshtien23456levenshtien23456levenshtien23456levenshtien23456levenshtien23456levenshtien23456levenshtien2345"; s2 könnte zwar "endlos" werden (ecx), aber stept über ecx auch über cl hinweg, also max len = 255
s1l = Len(s1) : s2l = Len(s2) : tableL = s1l * s2l : *s1p = @s1 : *s2p = @s2 : *s1l = @s1l : *s2l = @s2l
t = AllocateMemory(tableL) : levi = AllocateMemory(8) ;: Delay(15000) 
; The cost of Damerau-Levenshtein distance is O(mn). Cost is not reducable using SSE instructions but through utilization of x86es 3-4 ALUs per core, instruction fusion, 
; cache awareness and other CPU-specifics in superscalar processors. Hyperthreading is of no advantage since there are no more ALUs and should be switched off.)
PrintN("lenght string s1: " + Str(s1l) + #CRLF$ + "lenght string s2: " + Str(s2l)) : PrintN("DamLev table size: " + Str(tableL)) : Time1 = ElapsedMilliseconds()

!       align 8 
!       db  0fh, 1fh, 00h ;, 0fh, 1fh, 00h, 0fh, 1fh, 00h, 90h, 90h      ; align der Kernroutine bringt seeehhhr viel! mit Damerau jo 359 ms
!       push rbx rbp rsi rdi 
!       mov rbp, [v_t]                                           
!       xor rcx, rcx                                               ; das ist der Hammer: danach erinnert sich die ALU, dass dar High-Part leer ist!!!
!       xor rax, rax
!       xor r14, r14                                                ; reset for unten iteration counter
!       mov ecx, [v_s1l]
!MOV r11d, [v_s2l]
!MOV r12d, [v_s1l]
!     @@:                                                         ; Spaltennummern (horizontal) vorbesetzen
!       mov [rbp + rcx], cl                                       ; das ist richtig, da die Zeile mit 1-byte Zahlen vorbesetzt wird, geht geht dann jedoch nur bis zu einer s1-Größe von 256
!       sub cl, 1
!     jns @b
!       mov eax, [v_s1l]
!       add eax, 1                                                ; add 1 for mulipl.
!       mov r9d, eax                                              ; [v_s1l]
!       mov ecx, [v_s2l]                                          ; s2l into ecx   next line: ! MOVsx r9, word[v_s2l]
!       imul eax, ecx                                             ; al wird größer als 255, also muss ein größeres Register her = eax
!     ne:
!       mov [rbp + rax], cl                                       ; Zeilennummern (senkrecht) vorbesetzten
!       sub eax, [v_s1l]
!       sub eax, 1
!       sub ecx, 1                                                ; hier sub genommen anstatt dec wegen 2 zu 3 byte instruction und flag stabilty
!     jnz ne                                                      ; lea ist noch so eine align instruction und 3 byte nop
!     start:
!       MOV rsi, [p_s1p]                                          ; adresse von String1
!       MOV rdi, [p_s2p]                                          ; adresse von String2
!       xor rcx, rcx                                              ; ist wichtig
!       xor rax, rax
; --------------------------------------------------------------- outer loop für Levi und Damerau -------------------------------------------
;!                                                                db  0fh, 1fh, 00h ; 90h, 90h     ; rp1 besser aligned (z.B 8) als rp2 ist schneller
!  rp1:                                                   ; Macro fusion wird in 64 bit mode supportet Nehalem hat 3 simple + 1 complex instr. decoder       
!       xor r10, r10                                              ; r10 macht rdx frei. ---> rdx, r11, r12, r13, (r14) und r15 sind noch frei !
!       lea r8, [rbp + rcx]                                       ; r8 ist tableoffset. rcx ist position step der linken vertikalen Zahlenreihe
; --------------------------------------------------------------- "leviCost aus den 2 aktuellen Charposis in den Strings ermitteln" -----------------------------------------
;                                                                 db 0fh, 1fh, 00h, 0fh, 1fh, 00h, 90h,90h   ; rp1 ist 8 und rp2 16-aligned
!  rp2:                                                           ; rp2 ist eine lange und starke dependency chain, 90 Bytes lang, nicht mehr zu optimieren              
!       mov al, [rsi + rcx]                                       ; Char aus s1
!       cmp al, [rdi + r10]                                       ; Check s1 gegen Char aus s2	; cmp does not break dependency on Intel core!                                                                
                                                                  ; ! setnz al kann bei Damerau hier weg, macht dependency  
                                                                  ; al <- cost aus dem Vergleich der beiden Character aus s1 und s2 mit -i und -j 
!     jmp DamLevCost                                     ; jmp ist viel schneller ! ; jmp and jz are same instruction lenght ; jmp = no Damerau Function  ; jz = if LeviCost = Null, jump over Damerau
!       MOV bl, [rsi + rcx]                                       ; Char aus s1
!       cmp bl, [rdi + r10 - 1]                                   ; Check s1 gegen Char aus s2 - 1
!     jnz DamLevCost                                              ; CMP is able to macrofuse mit jcc
!       mov bl, [rsi + rcx - 1]                                   ; Char aus s2 - 1
!       cmp bl, [rdi + r10]                                       ; Check s2 gegen Char aus s1 - 1
!                                   db 0fh, 1fh, 00h, 90h, 90h    ; 16 aligned, 64 align bringt nichts
!     DamLevCost:
!       setnz al                                                  ; zero flag aus S1 und s2 oder s2 - 1 und s1 - 1
; --------------------------------------------------------------- 1. cmp in Table ------------------------
!       add al, [r8]                                     ; oben links. add rax ist extrem langsam
!       CMP AL, [r8 + 1]                                 ; oben
!     JBE no                                                      ; cmp & jbe are macro fused rule 19! 
!       mov al, [r8 + 1]                                          ; mov byte r10b, [r9 + 1 * rbx] ; geht auch!
!       ADD AL, 1  
!     no:
; --------------------------------------------------------------- 2. cmp in Table ------------------------
!       CMP AL, [r8 + r9]                                ; links r9 ist [v_s1l] incremented by 1
!     JBE @f
!       MOV AL, [r8 + r9]
!       add AL, 1
!     @@: 
!       mov byte[r8 + r9 + 1], al                        ; rechts unten write deswegen brauchen wir r8
; --------------------------------------------------------------- Ende DamLev per Cell, jump to next Cell -----------------------------------------------------------------------
!       add r10d, 1                                               ; r10 ist nächste Stelle innerhalb der Zeile nur zum cost match j++     
!       add r8d, r9d                                              ; step zur nächsten Zeile, the offset of the table
!       cmp r10d, r11d ;[v_s2l]                                   ;  j < n  ; gegenüber [v_s2l] bringt das 10 Sek.  von 11 Sek.
!     jnz rp2                                                     ; branch to inner loop
!       add ecx, 1                                                ; nächste Spalte i++
!       cmp ecx, r12d ;[v_s1l]                                    ; value of v_s1l ;[v_s1l] bringt noch mal 0,3 Sek von 10 Sek.
!     jnz rp1                                                     ; diese loops zu unrollen bringt nichts
; =============================================================== Ende Outer Loop =============================================================================
!       add r14, 1                                                ; add, cmp und jnz mit r14 werden im echten code nicht gebraucht
!       cmp r14, 1000000                                         ; just iteration counter for throughput measurement. r15 ging hier nicht !!!!!!
!     jnz start 
!       mov r14, [v_levi]                                         ; adresse von levi --> r15 ; *levi oder p_levi
!       mov [r14], rax                                            ; schreibe rax an adresse von v_levi, weil ich extra keine variable definiert habe, sondern einen MemoryBereich
!       pop rdi rsi rbp rbx

Time2 = ElapsedMilliseconds() - Time1
PrintN("DamLev-distance: " + Str(PeekL(levi)))
PrintN("Duration in ms at 1 Mio. iterations in r14: " + Str(Time2))
; Means approx. 135.000 Levs per second with distance 240 on a 3,5 Ghz i5-CPU. If distance is realisticly lower (f.e. 4) 1.250.000 Levs / s per core.)
; Means approx. 135.000 Levs per second with distance 240 on a 4,0 Ghz Sandy-CPU. If distance is realisticly lower (f.e. 4) 2.272.000 Levs / s per core.)
Input()
End

Post Reply