Removed

Share your advanced PureBasic knowledge/code with the community.
Dr_Wildrick
User
User
Posts: 36
Joined: Fri Feb 23, 2007 8:00 pm
Location: New York

Removed

Post by Dr_Wildrick »

Removed
Last edited by Dr_Wildrick on Sun Feb 24, 2008 6:02 am, edited 2 times in total.
User avatar
GeoTrail
Addict
Addict
Posts: 2794
Joined: Fri Feb 13, 2004 12:45 am
Location: Bergen, Norway
Contact:

Post by GeoTrail »

Very cool. works like a charm.

Robert and robert gives 1, Robert and roberT gives 2 :)

Here's the code in formatted fashion for those who find it hard to read the unformatted kind ;)

Code: Select all

Procedure.l Minimum(a.l,b.l,c.l) 
  ;******************************* 
  ;*** Get minimum of three values 
  ;******************************* 
  mi.l=0 
  mi = a 
  If b < mi 
  mi = b 
  EndIf 
  If c < mi 
  mi = c 
  EndIf 
  ProcedureReturn mi 
EndProcedure 

Procedure.l Levenshtein(s$,t$) 
  ;******************************** 
  ;*** Compute Levenshtein Distance 
  ;******************************** 
  m.l = 0; length of t 
  n.l = 0;length of s 
  i.l=0 ; iterates through s 
  j.l=0 ; iterates through t 
  s_i.s ="" ; ith character of s 
  t_j.s = ""; jth character of t 
  cost.l=0 ; cost 
  
  ; Step 1 
  n = Len(s$) 
  m = Len(t$) 
  Dim d(n,m); matrix 
  
  If n = 0 
    LD = m 
    ProcedureReturn ld 
  EndIf 
  
  If m = 0 
    LD = n 
    ProcedureReturn ld 
  EndIf 
  
  ; Step 2 
  For i = 0 To n 
    d(i, 0) = i 
  Next 
  
  For j = 0 To m 
    d(0, j) = j 
  Next 
  
  ;Step 3 
  For i = 1 To n 
    s_i = Mid(s$, i, 1) 
    
    ; Step 4 
    For j = 1 To m 
      t_j = Mid(t$, j, 1) 
      
      ; Step 5 
      If s_i = t_j 
        cost = 0 
      Else 
        cost = 1 
      EndIf 
      
      ; Step 6 
      d(i, j) = Minimum(d(i - 1, j) + 1, d(i, j - 1) + 1, d(i - 1, j - 1) + cost) 
    Next 
  
  Next 
  
  ; Step 7 
  LD = d(n, m) 
  ProcedureReturn ld 
EndProcedure

Debug Levenshtein("Robert","roberT")
I Stepped On A Cornflake!!! Now I'm A Cereal Killer!
Dr_Wildrick
User
User
Posts: 36
Joined: Fri Feb 23, 2007 8:00 pm
Location: New York

Removed

Post by Dr_Wildrick »

Removed
Last edited by Dr_Wildrick on Sun Feb 24, 2008 6:03 am, edited 1 time in total.
Dare
Addict
Addict
Posts: 1965
Joined: Mon May 29, 2006 1:01 am
Location: Outback

Post by Dare »

Nice.

What would be good a way to assess similarities? Average size / cost?
And when would similarities raise a flag for, say, plagerism?

(I guess this is a "how long is a piece of string" question).
Dare2 cut down to size
Dr_Wildrick
User
User
Posts: 36
Joined: Fri Feb 23, 2007 8:00 pm
Location: New York

Removed

Post by Dr_Wildrick »

Removed
Last edited by Dr_Wildrick on Sun Feb 24, 2008 6:03 am, edited 1 time in total.
Dare
Addict
Addict
Posts: 1965
Joined: Mon May 29, 2006 1:01 am
Location: Outback

Re: How long is a string

Post by Dare »

Dr_Wildrick wrote:I have an ex-wife and two teenage daughters who makes sure I am aware of that fact on an a regular basis.)
:D

Thanks for info. :)
Dare2 cut down to size
dell_jockey
Enthusiast
Enthusiast
Posts: 767
Joined: Sat Jan 24, 2004 6:56 pm

Post by dell_jockey »

Aahhhh, this takes me back to 1991, when I published an article in the C/C++ Users Journal (May issue, Vol.9/Nr.5) called "Inexact Alphanumeric Comparisons". The code I presented in that article used the Levenstein distance as well.

Perhaps it's time to dust off a more recent iteration of that old C-code and port it to PureBasic...
cheers,
dell_jockey
________
http://blog.forex-trading-ideas.com
User avatar
DoubleDutch
Addict
Addict
Posts: 3220
Joined: Thu Aug 07, 2003 7:01 pm
Location: United Kingdom
Contact:

Post by DoubleDutch »

This is the distance routine I wrote for Elementary Reports's spell checker...

Code: Select all

Procedure Distance(s$,t$)
	m=Len(s$)
	n=Len(t$)
	If m And n
		If m>#MaxWordLength
			m=#maxWordLength
		EndIf
		If n>#MaxWordLength
			n=#maxWordLength
		EndIf
		For i=0 To m
			DistanceMatrix(i,0)=i
		Next
		For j=0 To n
			DistanceMatrix(0,j)=j
		Next
		For i=1 To m
			For j=1 To n
				If Mid(s$,i,1)=Mid(t$,j,1)
					cost=#False
				Else
					cost=#True
				EndIf
				a=DistanceMatrix(i-1,j)+1	; deletion
				b=DistanceMatrix(i,j-1)+1	; insertion
				c=DistanceMatrix(i-1,j-1)+cost ; substitution
				If b<a
					a=b
				EndIf
				If c<a
					a=c
				EndIf
				DistanceMatrix(i,j)=a
			Next
		Next
		result=DistanceMatrix(m,n)
	Else
		result=9999
	EndIf
	ProcedureReturn result
EndProcedure
I got the algorithm for the routine from Wikipedia:
http://en.wikipedia.org/wiki/Levenshtein_distance

Please note: The 9999 value is there because it works well with the spell checker, it should really be the greater length of the two strings (which is exactly how many chages it would take for a null string compared with a non-null string.

Also note: DistanceMatrix[MaxWordLength,MaxWordLength] should be set up as a global array where MaxWordLength is the max word length! ;)
https://deluxepixel.com <- My Business website
https://reportcomplete.com <- School end of term reports system
User avatar
pdwyer
Addict
Addict
Posts: 2813
Joined: Tue May 08, 2007 1:27 pm
Location: Chiba, Japan

Post by pdwyer »

Playing with the code here I was expected to be disapointed for some reason but I was pleasently surprised :)

I modified the original code to be faster using memory buffers, and it lets it handle pointers to file bytes. (and made the vars easier to recognise ;) )

Even so, doing a lot of comparisons could be time consuming. If you have a lot of students with a large report it's going to take a while to number crunch. Anyway, here's what I have, thanks for introducing me to the algorithm (I'll look at doubledutch's one next :)

Code: Select all


Procedure.s MidMem(*MainMem, StartPos.l, GetLen.l)  

    If *MainMem = 0 Or GetLen = 0
        ProcedureReturn ""
    EndIf
    
    *RetMem = AllocateMemory(GetLen)
    CopyMemory(*MainMem + StartPos -1, *RetMem, GetLen)
    ReturnString.s = PeekS(*RetMem,GetLen)
    FreeMemory(*RetMem)
    
    ProcedureReturn ReturnString

EndProcedure


Procedure.l Minimum(a.l,b.l,c.l) 
    ;******************************* 
    ;*** Get minimum of three values 
    ;******************************* 
    mi.l=0 
    mi = a 
    
    If b < mi 
        mi = b 
    EndIf 
    
    If c < mi 
        mi = c 
    EndIf 
    
    ProcedureReturn mi 
EndProcedure 

Procedure.l Levenshtein(pStr1.l, StrLen1.l, pStr2.l, StrLen2.l) 
    ;******************************** 
    ;*** Compute Levenshtein Distance 
    ;******************************** 
 
    S1_i.s = "" ; ith character of s 
    S2_j.s = "" ; jth character of t 
    cost.l=0 ; cost 
    
    ; Step 1 
    Dim d.l(StrLen1,StrLen2); matrix 
    
    If StrLen1 = 0 
        ProcedureReturn StrLen2
    EndIf 
    
    If StrLen2 = 0 
        ProcedureReturn StrLen1 
    EndIf 
    
    ; Step 2 
    For i = 0 To StrLen1 
        d(i, 0) = i 
    Next 
    
    For j = 0 To StrLen2 
        d(0, j) = j 
    Next 
  
  ;Step 3 
    For i = 1 To StrLen1 
        S1_i = MidMem(pStr1, i, 1) 
        
        ; Step 4 
        For j = 1 To StrLen2 
            S2_j = MidMem(pStr2, j, 1) 
             
            ; Step 5 
            If S1_i = S2_j 
                cost = 0 
            Else 
                cost = 1 
            EndIf 
            
            ; Step 6 
            d(i, j) = Minimum(d(i - 1, j) + 1, d(i, j - 1) + 1, d(i - 1, j - 1) + cost) 
        Next 
    
    Next 
  
    ; Step 7  
    ProcedureReturn d(StrLen1, StrLen2)

EndProcedure 

DisableDebugger


Str1.s = "Heello"
Str2.s = "hellddo"

Ans1 =  Levenshtein(@Str1,Len(Str1),@Str2,Len(Str2))

File1.s = OpenFileRequester("File 1","","",0)
File2.s = OpenFileRequester("File 2","","",0)

If Len(File1) > 1 And Len(file2) > 1 

    OpenFile(1,File1)
        FileLen1 = Lof(1)
        pFile1Data = AllocateMemory(FileLen1)
        ReadData(1, pFile1Data, FileLen1)
    CloseFile(1)
    
    OpenFile(2,File2)
        FileLen2 = Lof(2)
        pFile2Data = AllocateMemory(FileLen2)
        ReadData(2, pFile1Data, FileLen2)
    CloseFile(2)

    Ans2 =  Levenshtein(pFile1Data,FileLen1 ,pFile2Data,FileLen2)

EndIf
 
EnableDebugger

Debug ans1
Debug ans2
 

Paul Dwyer

“In nature, it’s not the strongest nor the most intelligent who survives. It’s the most adaptable to change” - Charles Darwin
“If you can't explain it to a six-year old you really don't understand it yourself.” - Albert Einstein
User avatar
pdwyer
Addict
Addict
Posts: 2813
Joined: Tue May 08, 2007 1:27 pm
Location: Chiba, Japan

Post by pdwyer »

Doubledutch, you are missing

Dim DistanceMatrix(m,n)
and the #MaxWordLength

I added them and they worked, did you use a global array? Any reason?

I think these are the same, I like yours without the function call, I think that's unnecessary overheard considering the number of times it can get called (millions for a file)
Paul Dwyer

“In nature, it’s not the strongest nor the most intelligent who survives. It’s the most adaptable to change” - Charles Darwin
“If you can't explain it to a six-year old you really don't understand it yourself.” - Albert Einstein
User avatar
DoubleDutch
Addict
Addict
Posts: 3220
Joined: Thu Aug 07, 2003 7:01 pm
Location: United Kingdom
Contact:

Post by DoubleDutch »

Doubledutch, you are missing

Dim DistanceMatrix(m,n)
and the #MaxWordLength
Please see...
doubledutch wrote:Also note: DistanceMatrix[MaxWordLength,MaxWordLength] should be set up as a global array where MaxWordLength is the max word length!
:)

Edit: Your right, both routines are near identical. I just wanted to give an example of where the routine would have some use (like in a spell checker) and to show how you didn't need the expensive procdure call from the original algorithm.
https://deluxepixel.com <- My Business website
https://reportcomplete.com <- School end of term reports system
User avatar
pdwyer
Addict
Addict
Posts: 2813
Joined: Tue May 08, 2007 1:27 pm
Location: Chiba, Japan

Post by pdwyer »

:lol:

Oops, did it again.

(kind of off topic)
DistanceMatrix has no need of being global, why would you make it global? Or is it just because you were doing a test? Is there some perf benefit in PB I'm not aware of?

I put it in as:

Code: Select all

#MaxWordLength = 100
 
Procedure Distance(s$,t$) 
   m=Len(s$) 
   n=Len(t$) 
   
   Dim DistanceMatrix(m,n)
   
   If m And n 
      If m>#MaxWordLength 
         m=#maxWordLength 
      EndIf 
      If n>#MaxWordLength 
         n=#maxWordLength 
      EndIf 
      For i=0 To m 
         DistanceMatrix(i,0)=i 
      Next 
      For j=0 To n 
         DistanceMatrix(0,j)=j 
      Next 
      For i=1 To m 
         For j=1 To n 
            If Mid(s$,i,1)=Mid(t$,j,1) 
               cost=#False 
            Else 
               cost=#True 
            EndIf 
            a=DistanceMatrix(i-1,j)+1   ; deletion 
            b=DistanceMatrix(i,j-1)+1   ; insertion 
            c=DistanceMatrix(i-1,j-1)+cost ; substitution 
            If b<a 
               a=b 
            EndIf 
            If c<a 
               a=c 
            EndIf 
            DistanceMatrix(i,j)=a 
         Next 
      Next 
      result=DistanceMatrix(m,n) 
   Else 
      result=9999 
   EndIf 
   ProcedureReturn result 
EndProcedure

Debug distance("Paul"," Paul") 
I think I might make a pointer version of yours to handle files as nulls, I thing the extra function call is unnecessary, but get rid of the "MaxWordLength" limitation
Paul Dwyer

“In nature, it’s not the strongest nor the most intelligent who survives. It’s the most adaptable to change” - Charles Darwin
“If you can't explain it to a six-year old you really don't understand it yourself.” - Albert Einstein
User avatar
DoubleDutch
Addict
Addict
Posts: 3220
Joined: Thu Aug 07, 2003 7:01 pm
Location: United Kingdom
Contact:

Post by DoubleDutch »

It's because the memory allocation is small and it only needs to be allocated once. I don't know internally how PureBasic does it, but I presume that it uses a Windows routine - so possibly (if other programs are also allocating memory) if I did loads of small memory allocations then the allocations *may* be interleaved (with those of other programs) and this *may* result in memory fragmentation and a general slow down of the system.

This is just a guess though, so I may be wrong?
https://deluxepixel.com <- My Business website
https://reportcomplete.com <- School end of term reports system
Dr_Wildrick
User
User
Posts: 36
Joined: Fri Feb 23, 2007 8:00 pm
Location: New York

Removed

Post by Dr_Wildrick »

Removed
Last edited by Dr_Wildrick on Sun Feb 24, 2008 6:04 am, edited 1 time in total.
User avatar
DoubleDutch
Addict
Addict
Posts: 3220
Joined: Thu Aug 07, 2003 7:01 pm
Location: United Kingdom
Contact:

Post by DoubleDutch »

Maybe you should post it as a new tip and a new feature suggestion?
https://deluxepixel.com <- My Business website
https://reportcomplete.com <- School end of term reports system
Post Reply