type phonetic standardisation. Jassing recently posted one version at
. Expanding his example to also remove parentheses and apostrophes…
Code:
; NYSIIS.PBI
; Around 1970 New York State carried out a large study
; to come up with a better name-matching technique.
; The result was the report "Name Search Techniques" by Robert L. Taft.
; While I was employed at Social Services in California, I used this algorithm
; for matching names in databases (i.e. someone says "Smith", but was it "Smythe"?).
; We also temporarily normalised names such as "Chuck" to "Charles": although someone
; may actually be named "Chuck", others may think it's short for "Charles". This
; algorithm does not do that, but it is something to consider when trying to match names.
; It should, however, match misspellings such as "tomas" and "thomas".
; The algorithm is the "New York State Identification & Intelligence System", or NYSIIS for short,
; and is covered on pages 88-90.
; We had a lot of Portuguese clients so we had some additional tweaks, but tweaks are beyond the original scope.
; I have included only 2 tweaks to demonstrate the ease of changing it; experiment...
;
; The document is 126 pages long and includes in-depth information as well as detailed statistical analysis.
; If you want a copy (of my copy), get in touch with me on gmail (josh.assing) and I'll make copies for costs+postage.
; OR better (Thanks PBJim) https://en.wikipedia.org/wiki/New_York_State_Identification_and_Intelligence_System
;
; My original C code was ASCII only; not sure how relevant that is. (NYSIIS didn't take non-ASCII characters into account;
; this doesn't either ...)
EnableExplicit
; Not specified... this was 'the old days'; "ú" was entered as "u"; for NYSIIS() we need to convert these.
; YOU SHOULD convert accented characters to unaccented ones (i.e. "ú" to "u").
; XIncludeFile "changeAccentedMOD.pbi"
Macro replaceStr(str,find,new,position):ReplaceString(str,find,new,#PB_String_CaseSensitive|#PB_String_InPlace,position,1):EndMacro
Macro removeChar(str,find) : ReplaceString(str,find,"",#PB_String_NoCase) : EndMacro
#includeJoshExtension=#False
Procedure.s NYSIIS( original.s, bForIndexing=#False )
Protected nysiis.s, p,l, nIndexLength
nIndexLength = Len(original)
; this is not specifically specified, but implied.
original = UCase(Trim(original)) ; only deal with upper case.
CompilerIf #includeJoshExtension
; not specified, optional...
CompilerIf Defined(changeAccented,#PB_Module)
changeAccented::changeAll(@original)
CompilerEndIf
; this is mine
; ReplaceString(original,"Y","I",#PB_String_NoCase|#PB_String_InPlace,2)
CompilerEndIf
; this is not explicitly specified, but it makes sense
original = removeChar(original," ")
original = removeChar(original,".")
original = removeChar(original,",")
original = removeChar(original,"'")
original = removeChar(original,"(")
original = removeChar(original,")")
; Step 1.
If Left(original,3)="MAC" : replaceStr(original,"MAC","MCC",1)
ElseIf Left(original,3)="SCH" : replaceStr(original,"SCH","SSS",1)
ElseIf Left(original,2)="KN" : replaceStr(original,"KN", "NN", 1)
ElseIf Left(original,2)="PH" : replaceStr(original,"PH", "FF", 1)
ElseIf Left(original,2)="PF" : replaceStr(original,"PF", "FF", 1)
ElseIf Left(original,1)="K" : replaceStr(original,"K" ,"C", 1)
EndIf
; Step 2.
original=ReverseString(original) ; not technically part of it; but makes the next bit easier.
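Select Left(original,2)
Case "EE","EI" ; suffix "EE" or "IE" -> "Y"
original = "Y"+Mid(original,3) ; no space padding, so the key does not pick up a trailing blank
Case "TD","TR","DR","TN","DN" ; suffix "DT","RT","RD","NT","ND" -> "D"
original = "D"+Mid(original,3)
EndSelect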
CompilerIf #includeJoshExtension
; mine,again
If Left(original,1)="E"
; original=Mid(original,2)
EndIf
CompilerEndIf
original=ReverseString(original)
; step 3
nysiis = Left(original,1)
; step 4
p=2 : l = Len(original)
; step 5 (parts are not officially labeled as such, but are broken down to paragraphs)
; (this loop can (and should) be optimised, I just wanted it to closely match the description,
; but this should help you understand what's going on & follow the document)
While p <= l
Select Mid(original,p,1)
Case "A","E","I","O","U" ; part A
If Mid(original,p,2)="EV"
replaceStr(original,"EV","AF",p)
Else
replaceStr(original,Mid(original,p,1),"A",p)
EndIf
Case "Q","Z","M" ; part B
replaceStr(original,"Q","G",p)
replaceStr(original,"Z","S",p)
replaceStr(original,"M","N",p) ; was "Z"->"N", which left "M" untouched
Case "K" ; part C
If Mid(original,p+1,1)="N"
replaceStr(original,"K","N",p)
Else
replaceStr(original,"K","C",p)
EndIf
Case "H" ; part E (we'll get to part D)
; Mid() needs a length of 1 here, otherwise the rest of the string is compared
If Not FindString("AEIOU",Mid(original,p-1,1),1,#PB_String_CaseSensitive) Or
Not FindString("AEIOU",Mid(original,p+1,1),1,#PB_String_CaseSensitive)
replaceStr(original,"H",Mid(original,p-1,1),p)
EndIf
Case "W" ; part F
If FindString("AEIOU",Mid(original,p-1,1),1,#PB_String_CaseSensitive)
replaceStr(original,"W",Mid(original,p-1,1),p) ; replace the "W" (not "H") with the preceding vowel
EndIf
CompilerIf #includeJoshExtension
Case "Y" ; mine
replaceStr(original,"Y","I",p)
CompilerEndIf
Default ; part D
If Mid(original,p,3)="SCH" : replaceStr(original,"SCH","SSS",p)
ElseIf Mid(original,p,2)="PH" : replaceStr(original,"PH","FF",p)
EndIf
; part G
; no match, do nothing
EndSelect
; step 6
If Mid(original,p,1) <> Right(nysiis,1)
nysiis+Mid(original,p,1)
EndIf
p+1
Wend
; step 7
If Right(nysiis,1)="S" And ( #includeJoshExtension =#False Or Len(nysiis)>4) ; after the 'And' is mine.
nysiis=Left(nysiis,Len(nysiis)-1)
EndIf
; step 8
If Right(nysiis,2)="AY"
nysiis=Left(nysiis,Len(nysiis)-2)+"Y"
ElseIf Right(nysiis,1)="A" ; step 9
nysiis=Left(nysiis,Len(nysiis)-1)
EndIf
; done!!
If bForIndexing ; indexes should keep original field length.
nysiis=LSet(nysiis,nIndexLength," ")
EndIf
ProcedureReturn nysiis
EndProcedure
Debug nysiis("Sittin' on the dock of the bay")
Debug NYSIIS("Sittin' on the Dock of the bay")
Debug nysiis("Sitting on the dock of the bay")
Debug NYSIIS("(Sittin' on) The dock of the bay")
Debug NYSIIS(" The dock of the bay")
As Azjio suggests, you can then use something like the Levenshtein distance to determine proximity to a specific target.
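For anyone who wants to try that, here is a minimal sketch of the classic Levenshtein edit distance in PureBasic, using two rolling rows. The procedure name LevenshteinDistance() is my own invention for illustration; it is not part of Jassing's code or of any PureBasic library, so treat it as a starting point rather than a finished implementation.

```purebasic
EnableExplicit

; Hypothetical helper (not from the NYSIIS listing): number of single-character
; insertions, deletions and substitutions needed to turn a into b.
Procedure.i LevenshteinDistance(a.s, b.s)
  Protected la = Len(a), lb = Len(b)
  Protected i, j, cost, best

  ; Trivial cases: distance to an empty string is the other string's length.
  If la = 0 : ProcedureReturn lb : EndIf
  If lb = 0 : ProcedureReturn la : EndIf

  Protected Dim prev.i(lb) ; distances for the previous row (i-1)
  Protected Dim curr.i(lb) ; distances for the current row (i)

  For j = 0 To lb
    prev(j) = j
  Next

  For i = 1 To la
    curr(0) = i
    For j = 1 To lb
      If Mid(a, i, 1) = Mid(b, j, 1)
        cost = 0
      Else
        cost = 1
      EndIf
      best = prev(j) + 1              ; deletion
      If curr(j - 1) + 1 < best       ; insertion
        best = curr(j - 1) + 1
      EndIf
      If prev(j - 1) + cost < best    ; substitution (or match)
        best = prev(j - 1) + cost
      EndIf
      curr(j) = best
    Next
    CopyArray(curr(), prev())         ; current row becomes the previous row
  Next

  ProcedureReturn prev(lb)
EndProcedure

Debug LevenshteinDistance("kitten", "sitting")
```

One way to combine the two would be to run both strings through NYSIIS() first and then compare the resulting keys, e.g. LevenshteinDistance(NYSIIS(a$), NYSIIS(b$)), accepting a match when the distance is below some small threshold that you would have to tune for your own data.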