Removed
-
- User
- Posts: 36
- Joined: Fri Feb 23, 2007 8:00 pm
- Location: New York
Very cool. works like a charm.
Robert and robert gives 1, Robert and roberT gives 2
Here's the code in formatted fashion for those who find it hard to read the unformatted kind
Robert and robert gives 1, Robert and roberT gives 2

Here's the code in formatted fashion for those who find it hard to read the unformatted kind

Code: Select all
Procedure.l Minimum(a.l,b.l,c.l)
;*******************************
;*** Get minimum of three values
;*******************************
mi.l=0
mi = a
If b < mi
mi = b
EndIf
If c < mi
mi = c
EndIf
ProcedureReturn mi
EndProcedure
Procedure.l Levenshtein(s$,t$)
;********************************
;*** Compute Levenshtein Distance
;********************************
m.l = 0; length of t
n.l = 0;length of s
i.l=0 ; iterates through s
j.l=0 ; iterates through t
s_i.s ="" ; ith character of s
t_j.s = ""; jth character of t
cost.l=0 ; cost
; Step 1
n = Len(s$)
m = Len(t$)
Dim d(n,m); matrix
If n = 0
LD = m
ProcedureReturn ld
EndIf
If m = 0
LD = n
ProcedureReturn ld
EndIf
; Step 2
For i = 0 To n
d(i, 0) = i
Next
For j = 0 To m
d(0, j) = j
Next
;Step 3
For i = 1 To n
s_i = Mid(s$, i, 1)
; Step 4
For j = 1 To m
t_j = Mid(t$, j, 1)
; Step 5
If s_i = t_j
cost = 0
Else
cost = 1
EndIf
; Step 6
d(i, j) = Minimum(d(i - 1, j) + 1, d(i, j - 1) + 1, d(i - 1, j - 1) + cost)
Next
Next
; Step 7
LD = d(n, m)
ProcedureReturn ld
EndProcedure
Debug Levenshtein("Robert","roberT")
I Stepped On A Cornflake!!! Now I'm A Cereal Killer!
-
- User
- Posts: 36
- Joined: Fri Feb 23, 2007 8:00 pm
- Location: New York
-
- User
- Posts: 36
- Joined: Fri Feb 23, 2007 8:00 pm
- Location: New York
Re: How long is a string
Dr_Wildrick wrote:I have an ex-wife and two teenage daughters who makes sure I am aware of that fact on an a regular basis.)

Thanks for info.

Dare2 cut down to size
-
- Enthusiast
- Posts: 767
- Joined: Sat Jan 24, 2004 6:56 pm
Aahhhh, this takes me back to 1991, when I published an article in the C/C++ Users Journal (May issue, Vol.9/Nr.5) called "Inexact Alphanumeric Comparisons". The code I presented in that article used the Levenstein distance as well.
Perhaps it's time to dust off a more recent iteration of that old C-code and port it to PureBasic...
Perhaps it's time to dust off a more recent iteration of that old C-code and port it to PureBasic...
- DoubleDutch
- Addict
- Posts: 3220
- Joined: Thu Aug 07, 2003 7:01 pm
- Location: United Kingdom
- Contact:
This is the distance routine I wrote for Elementary Reports's spell checker...
I got the algorithm for the routine from Wikipedia:
http://en.wikipedia.org/wiki/Levenshtein_distance
Please note: The 9999 value is there because it works well with the spell checker, it should really be the greater length of the two strings (which is exactly how many chages it would take for a null string compared with a non-null string.
Also note: DistanceMatrix[MaxWordLength,MaxWordLength] should be set up as a global array where MaxWordLength is the max word length!
Code: Select all
Procedure Distance(s$,t$)
m=Len(s$)
n=Len(t$)
If m And n
If m>#MaxWordLength
m=#maxWordLength
EndIf
If n>#MaxWordLength
n=#maxWordLength
EndIf
For i=0 To m
DistanceMatrix(i,0)=i
Next
For j=0 To n
DistanceMatrix(0,j)=j
Next
For i=1 To m
For j=1 To n
If Mid(s$,i,1)=Mid(t$,j,1)
cost=#False
Else
cost=#True
EndIf
a=DistanceMatrix(i-1,j)+1 ; deletion
b=DistanceMatrix(i,j-1)+1 ; insertion
c=DistanceMatrix(i-1,j-1)+cost ; substitution
If b<a
a=b
EndIf
If c<a
a=c
EndIf
DistanceMatrix(i,j)=a
Next
Next
result=DistanceMatrix(m,n)
Else
result=9999
EndIf
ProcedureReturn result
EndProcedure
http://en.wikipedia.org/wiki/Levenshtein_distance
Please note: The 9999 value is there because it works well with the spell checker, it should really be the greater length of the two strings (which is exactly how many chages it would take for a null string compared with a non-null string.
Also note: DistanceMatrix[MaxWordLength,MaxWordLength] should be set up as a global array where MaxWordLength is the max word length!

https://deluxepixel.com <- My Business website
https://reportcomplete.com <- School end of term reports system
https://reportcomplete.com <- School end of term reports system
Playing with the code here I was expected to be disapointed for some reason but I was pleasently surprised 
I modified the original code to be faster using memory buffers, and it lets it handle pointers to file bytes. (and made the vars easier to recognise
)
Even so, doing a lot of comparisons could be time consuming. If you have a lot of students with a large report it's going to take a while to number crunch. Anyway, here's what I have, thanks for introducing me to the algorithm (I'll look at doubledutch's one next

I modified the original code to be faster using memory buffers, and it lets it handle pointers to file bytes. (and made the vars easier to recognise

Even so, doing a lot of comparisons could be time consuming. If you have a lot of students with a large report it's going to take a while to number crunch. Anyway, here's what I have, thanks for introducing me to the algorithm (I'll look at doubledutch's one next

Code: Select all
Procedure.s MidMem(*MainMem, StartPos.l, GetLen.l)
If *MainMem = 0 Or GetLen = 0
ProcedureReturn ""
EndIf
*RetMem = AllocateMemory(GetLen)
CopyMemory(*MainMem + StartPos -1, *RetMem, GetLen)
ReturnString.s = PeekS(*RetMem,GetLen)
FreeMemory(*RetMem)
ProcedureReturn ReturnString
EndProcedure
Procedure.l Minimum(a.l,b.l,c.l)
;*******************************
;*** Get minimum of three values
;*******************************
mi.l=0
mi = a
If b < mi
mi = b
EndIf
If c < mi
mi = c
EndIf
ProcedureReturn mi
EndProcedure
Procedure.l Levenshtein(pStr1.l, StrLen1.l, pStr2.l, StrLen2.l)
;********************************
;*** Compute Levenshtein Distance
;********************************
S1_i.s = "" ; ith character of s
S2_j.s = "" ; jth character of t
cost.l=0 ; cost
; Step 1
Dim d.l(StrLen1,StrLen2); matrix
If StrLen1 = 0
ProcedureReturn StrLen2
EndIf
If StrLen2 = 0
ProcedureReturn StrLen1
EndIf
; Step 2
For i = 0 To StrLen1
d(i, 0) = i
Next
For j = 0 To StrLen2
d(0, j) = j
Next
;Step 3
For i = 1 To StrLen1
S1_i = MidMem(pStr1, i, 1)
; Step 4
For j = 1 To StrLen2
S2_j = MidMem(pStr2, j, 1)
; Step 5
If S1_i = S2_j
cost = 0
Else
cost = 1
EndIf
; Step 6
d(i, j) = Minimum(d(i - 1, j) + 1, d(i, j - 1) + 1, d(i - 1, j - 1) + cost)
Next
Next
; Step 7
ProcedureReturn d(StrLen1, StrLen2)
EndProcedure
DisableDebugger
Str1.s = "Heello"
Str2.s = "hellddo"
Ans1 = Levenshtein(@Str1,Len(Str1),@Str2,Len(Str2))
File1.s = OpenFileRequester("File 1","","",0)
File2.s = OpenFileRequester("File 2","","",0)
If Len(File1) > 1 And Len(file2) > 1
OpenFile(1,File1)
FileLen1 = Lof(1)
pFile1Data = AllocateMemory(FileLen1)
ReadData(1, pFile1Data, FileLen1)
CloseFile(1)
OpenFile(2,File2)
FileLen2 = Lof(2)
pFile2Data = AllocateMemory(FileLen2)
ReadData(2, pFile1Data, FileLen2)
CloseFile(2)
Ans2 = Levenshtein(pFile1Data,FileLen1 ,pFile2Data,FileLen2)
EndIf
EnableDebugger
Debug ans1
Debug ans2
Paul Dwyer
“In nature, it’s not the strongest nor the most intelligent who survives. It’s the most adaptable to change” - Charles Darwin
“If you can't explain it to a six-year old you really don't understand it yourself.” - Albert Einstein
“In nature, it’s not the strongest nor the most intelligent who survives. It’s the most adaptable to change” - Charles Darwin
“If you can't explain it to a six-year old you really don't understand it yourself.” - Albert Einstein
Doubledutch, you are missing
Dim DistanceMatrix(m,n)
and the #MaxWordLength
I added them and they worked, did you use a global array? Any reason?
I think these are the same, I like yours without the function call, I think that's unnecessary overheard considering the number of times it can get called (millions for a file)
Dim DistanceMatrix(m,n)
and the #MaxWordLength
I added them and they worked, did you use a global array? Any reason?
I think these are the same, I like yours without the function call, I think that's unnecessary overheard considering the number of times it can get called (millions for a file)
Paul Dwyer
“In nature, it’s not the strongest nor the most intelligent who survives. It’s the most adaptable to change” - Charles Darwin
“If you can't explain it to a six-year old you really don't understand it yourself.” - Albert Einstein
“In nature, it’s not the strongest nor the most intelligent who survives. It’s the most adaptable to change” - Charles Darwin
“If you can't explain it to a six-year old you really don't understand it yourself.” - Albert Einstein
- DoubleDutch
- Addict
- Posts: 3220
- Joined: Thu Aug 07, 2003 7:01 pm
- Location: United Kingdom
- Contact:
Please see...Doubledutch, you are missing
Dim DistanceMatrix(m,n)
and the #MaxWordLength
doubledutch wrote:Also note: DistanceMatrix[MaxWordLength,MaxWordLength] should be set up as a global array where MaxWordLength is the max word length!

Edit: Your right, both routines are near identical. I just wanted to give an example of where the routine would have some use (like in a spell checker) and to show how you didn't need the expensive procdure call from the original algorithm.
https://deluxepixel.com <- My Business website
https://reportcomplete.com <- School end of term reports system
https://reportcomplete.com <- School end of term reports system

Oops, did it again.
(kind of off topic)
DistanceMatrix has no need of being global, why would you make it global? Or is it just because you were doing a test? Is there some perf benefit in PB I'm not aware of?
I put it in as:
Code: Select all
#MaxWordLength = 100
Procedure Distance(s$,t$)
m=Len(s$)
n=Len(t$)
Dim DistanceMatrix(m,n)
If m And n
If m>#MaxWordLength
m=#maxWordLength
EndIf
If n>#MaxWordLength
n=#maxWordLength
EndIf
For i=0 To m
DistanceMatrix(i,0)=i
Next
For j=0 To n
DistanceMatrix(0,j)=j
Next
For i=1 To m
For j=1 To n
If Mid(s$,i,1)=Mid(t$,j,1)
cost=#False
Else
cost=#True
EndIf
a=DistanceMatrix(i-1,j)+1 ; deletion
b=DistanceMatrix(i,j-1)+1 ; insertion
c=DistanceMatrix(i-1,j-1)+cost ; substitution
If b<a
a=b
EndIf
If c<a
a=c
EndIf
DistanceMatrix(i,j)=a
Next
Next
result=DistanceMatrix(m,n)
Else
result=9999
EndIf
ProcedureReturn result
EndProcedure
Debug distance("Paul"," Paul")
Paul Dwyer
“In nature, it’s not the strongest nor the most intelligent who survives. It’s the most adaptable to change” - Charles Darwin
“If you can't explain it to a six-year old you really don't understand it yourself.” - Albert Einstein
“In nature, it’s not the strongest nor the most intelligent who survives. It’s the most adaptable to change” - Charles Darwin
“If you can't explain it to a six-year old you really don't understand it yourself.” - Albert Einstein
- DoubleDutch
- Addict
- Posts: 3220
- Joined: Thu Aug 07, 2003 7:01 pm
- Location: United Kingdom
- Contact:
It's because the memory allocation is small and it only needs to be allocated once. I don't know internally how PureBasic does it, but I presume that it uses a Windows routine - so possibly (if other programs are also allocating memory) if I did loads of small memory allocations then the allocations *may* be interleaved (with those of other programs) and this *may* result in memory fragmentation and a general slow down of the system.
This is just a guess though, so I may be wrong?
This is just a guess though, so I may be wrong?
https://deluxepixel.com <- My Business website
https://reportcomplete.com <- School end of term reports system
https://reportcomplete.com <- School end of term reports system
-
- User
- Posts: 36
- Joined: Fri Feb 23, 2007 8:00 pm
- Location: New York
- DoubleDutch
- Addict
- Posts: 3220
- Joined: Thu Aug 07, 2003 7:01 pm
- Location: United Kingdom
- Contact:
Maybe you should post it as a new tip and a new feature suggestion?
https://deluxepixel.com <- My Business website
https://reportcomplete.com <- School end of term reports system
https://reportcomplete.com <- School end of term reports system