Many thanks to Amílcar Matos Pérez, Vera and Dude for taking an interest in the algorithm.
I have worked on the algorithm to find the best solution, and my program is given below.
My request to everyone is to check whether the program can be written better and whether it contains any logical errors.
The purpose of the program is to take an original text and compare it with a proofread version of that text containing all the corrections made to it. I have named these the "Original" and "Corrected" text in the program.
My requirement is to find the typing error percentage of the original text, so the program has to:
a.) Identify which words were removed from the original text and so constitute an error,
c.) Treat words typed in the original that exist in the corrected text, but not in the same position, as OK, and
d.) Count a word that is repeated in the original text without any counterpart in the corrected text as an error.
Once the erroneous words have been counted, the program calculates the error percentage of the original typed text, i.e. the number of erroneous words multiplied by 100 and divided by the number of words in the original text.
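The counting rules above can also be read as a comparison of word counts that ignores position. The following is only a rough sketch of that reading, not the program posted below: the names `WordTally`, `TallyWords` and `ErrorPercent` are made up for the illustration, words are assumed to be separated by single spaces and compared case-sensitively, and empty words are skipped. Like the program, it also counts words that appear only in the corrected text as errors.

```purebasic
; Sketch only: WordTally, TallyWords() and ErrorPercent() are illustrative names,
; they are not part of ProofTest.pb.
; Assumptions: single-space separators, case-sensitive comparison, empty words skipped.
EnableExplicit

Structure WordTally
  originalWords.i       ; number of words found in the original text
  unmatchedOriginal.i   ; original words without a counterpart in the corrected text
  unmatchedCorrected.i  ; corrected words without a counterpart in the original text
EndStructure

Procedure TallyWords(Original.s, Corrected.s, *tally.WordTally)
  Protected NewMap Available.i() ; unmatched copies left for each corrected word
  Protected i.i, n.i, token.s, matched.i

  *tally\originalWords = 0
  *tally\unmatchedOriginal = 0
  *tally\unmatchedCorrected = 0

  ; Count every word of the corrected text.
  n = CountString(Corrected, " ") + 1
  For i = 1 To n
    token = StringField(Corrected, i, " ")
    If token <> ""
      Available(token) = Available(token) + 1
    EndIf
  Next

  ; Each original word consumes one copy of the same corrected word, wherever it stands.
  ; A word with no copy left (removed or repeated too often) is an error.
  n = CountString(Original, " ") + 1
  For i = 1 To n
    token = StringField(Original, i, " ")
    If token <> ""
      *tally\originalWords = *tally\originalWords + 1
      matched = #False
      If FindMapElement(Available(), token)
        If Available() > 0
          Available() = Available() - 1
          matched = #True
        EndIf
      EndIf
      If matched = #False
        *tally\unmatchedOriginal = *tally\unmatchedOriginal + 1
      EndIf
    EndIf
  Next

  ; Whatever is left over exists only in the corrected text.
  ForEach Available()
    *tally\unmatchedCorrected = *tally\unmatchedCorrected + Available()
  Next
EndProcedure

Procedure.d ErrorPercent(*tally.WordTally)
  Protected errors.i, words.i
  errors = *tally\unmatchedOriginal + *tally\unmatchedCorrected
  words = *tally\originalWords
  If words = 0
    ProcedureReturn 0.0
  EndIf
  ProcedureReturn errors * 100.0 / words
EndProcedure

; Usage with the two sample sentences from the program (output goes to the debugger window)
Define tally.WordTally
TallyWords("The quick fox jumps over the fox jumps over the fox jumps over the lazzy dog.", "The quick brown fox jumps over the lazy dog.", @tally)
Debug "Words in original: " + Str(tally\originalWords)
Debug "Errors: " + Str(tally\unmatchedOriginal + tally\unmatchedCorrected)
Debug "Error percentage: " + StrD(ErrorPercent(@tally), 2)
```

A map keeps the comparison to one pass over each text, but it does not tell perfect matches from imperfect ones, which the program below reports separately.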
Code:
;{- Program header
;==Code Header Comment==============================
; Name/title: ProofTest.pb
; Executable name: ProofTest.exe
; Version: 2.00
; Author: Blurryan initial version.
; Collaborator: Amílcar Matos Pérez (San Juan, Puerto Rico)
; Release date/hour: 03/Sep/2015
; Operating system: Windows 10
; Compiler version: PureBasic 5.31 (x64)
; Explanation: To compare two text lines and identify changes present to calculate errors and percentages.
; ==================================================
;.......10........20........30........40........50........60........70........80........90.......100.......110.......120.......130.......140
;}
Structure Rev ; Structure made for trapping the words array along with their position in the text
item.s ; Word in a string
orgno.i ; place in the string
EndStructure
; variables initialised
Global.s ORIG, CORR
Global.i perf, imperf, rept, rem, addi, err
Global.i i, ii, j, jj, m = 0, Pj, Flag = 0
Global.i Quit = 0, LenORIG, LenCORR, LenORIGStat, LenCORRStat, Correxistence, CountO = 0, CountC = 0
Global.f Errorpercent
; arrays - O for Original text, C for Corrected text, P and Q are to trap whether the matches are Perfect/ Imperfect / Repeat / Not existent
Global Dim O.s(100) : Global Dim C.s(100) : Global Dim P.s(100) : Global Dim Q.s(100)
Global Dim OX.Rev(100) : Global Dim OY.Rev(100) ; Structured array for working
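If LoadFont(0, "Consolas", 8) : Font1 = FontID(0) : Else : Font1 = #PB_Default : EndIf ; main font used; SetGadgetFont() expects a FontID, falls back to the default font if loading fails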
; Declarations of the procedures
Declare ConverttoOXOY() ; Converts the texts given to the structure formats
Declare InitiateOutput() ; Initial printing of headers
Declare ProcessPerfectMatches() ; processes for perfect matches i.e. matching of text AND position in text
Declare ProcessImperfectMatches() ; processes for imperfect matches i.e. matching of text but not of exact position in text
Declare ProcessRepeats() ; processes for repeats i.e. whether a word has been repeated in the texts - not significant as this is an error
Declare ProcessNotInCorr() ; processes for words that are in Original text but NOT in the Corrected text
Declare ProcessNotInOrig() ; processes for words that are in Corrected text but NOT in the Original text
Declare EndProcess() ; end process and final printing of the summary
Declare Initialize() ; initialises the variables
; Main window is opened along with respective gadgets
If OpenWindow(0, 300, 10, 1000, 620, "Words test", #PB_Window_SystemMenu)
TextGadget(1, 10, 10, 90, 20, "Original text:") : SetGadgetFont(1, Font1)
StringGadget(2, 110, 10, 880, 20, "The quick fox jumps over the fox jumps over the fox jumps over the lazzy dog.") : SetGadgetFont(2, Font1)
TextGadget(3, 10, 40, 90, 20, "Corrected text:") : SetGadgetFont(3, Font1)
StringGadget(4, 110, 40, 880, 20, "The quick brown fox jumps over the lazy dog.") : SetGadgetFont(4, Font1)
EditorGadget(5, 10, 70, 980, 500) : SetGadgetFont(5, Font1)
ButtonGadget(8, 20, 590, 100, 20, "Clear") : SetGadgetFont(8, Font1) ; This clears all fields in the window and then one can input afresh
ButtonGadget(9, 150, 590, 100, 20, "Initiate") : SetGadgetFont(9, Font1) ; clears the editor but keeps the original and corrected text which can be modified for a subsequent run
ButtonGadget(10, 740, 590, 100, 20, "GO") : SetGadgetFont(10, Font1) ; to process the texts and send output to editor
ButtonGadget(11, 880, 590, 100, 20, "Quit") : SetGadgetFont(11, Font1) ; quits program
Repeat
Select WaitWindowEvent()
Case #PB_Event_CloseWindow
Quit = 1
Case #PB_Event_Gadget
Select EventGadget()
Case 8
SetGadgetText(2, "")
SetGadgetText(4, "")
ClearGadgetItems(5)
Initialize()
Case 9
Initialize()
Case 10
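Initialize() ; reset the counters, the match markers and the editor so that pressing GO again does not add to the previous run
ORIG = GetGadgetText(2) : LenORIG = Len(ORIG) : CORR = GetGadgetText(4) : LenCORR = Len(CORR)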
ConverttoOXOY()
InitiateOutput()
ProcessPerfectMatches()
ProcessImperfectMatches()
ProcessRepeats()
ProcessNotInCorr()
ProcessNotInOrig()
EndProcess()
Case 11
Quit = 1
EndSelect
EndSelect
Until Quit = 1
EndIf
Procedure ConverttoOXOY() ; Converts the texts given to the structure formats
; based on "word = characters within spaces", the word count is the number of spaces plus one
ii = CountString(ORIG, " ") + 1
jj = CountString(CORR, " ") + 1
; Redim the arrays to the word counts BEFORE filling them, so that texts of more than 100 words do not overflow the arrays
ReDim O(ii) : ReDim C(jj) : ReDim OX.Rev(ii): ReDim OY.Rev(jj): ReDim P(jj): ReDim Q(ii)
; create the structured arrays for original and corrected texts
For i = 1 To ii
O(i) = StringField(ORIG, i, " ")
OX(i)\item = O(i)
OX(i)\orgno = i
Next i
For j = 1 To jj
C(j) = StringField(CORR, j, " ")
OY(j)\item = C(j)
OY(j)\orgno = j
Next j
EndProcedure
Procedure InitiateOutput() ; Initial printing of headers
AddGadgetItem(5, -1, "Original text: " + ORIG + " Words: " + Str(ii)) ; the string is used directly, PeekS(@ORIG) is not needed
AddGadgetItem(5, -1, "Corrected text: " + CORR + " Words: " + Str(jj))
AddGadgetItem(5, -1, "")
AddGadgetItem(5, -1, "Original" + Chr(9) + "Original" + Chr(9) + "Corrected" + Chr(9) + "Corrected" + Chr(9) + "COMMENTS")
AddGadgetItem(5, -1, " Word " + Chr(9) + " Srl # " + Chr(9) + " Word " + Chr(9) + " Srl # ")
AddGadgetItem(5, -1, "--------" + Chr(9) + "--------" + Chr(9) + "---------" + Chr(9) + "---------")
AddGadgetItem(5, -1, "")
EndProcedure
Procedure ProcessPerfectMatches() ; processes for perfect matches i.e. matching of text AND position in text
For i = 1 To ii
For j = 1 To jj
If OX(i)\item = OY(j)\item And OX(i)\orgno = OY(j)\orgno
AddGadgetItem(5, -1, OX(i)\item + Chr(9) + Chr(9) + Chr(9) + Str(OX(i)\orgno) + Chr(9) + OY(j)\item + Chr(9) + Chr(9) + Chr(9) + Str(OY(j)\orgno) + Chr(9) + "PERFECT MATCH !!")
perf = perf + 1
P(j) = "P": Q(i) = "P" ; puts "P" in counter of P and Q array to identify that word is already considered
CountO = CountO + 1: CountC = CountC + 1 ; a counter count of words to ensure all words taken and to tally with ii and jj
EndIf
Next j
Next i
AddGadgetItem(5, -1, "---------------------------------------------------------")
AddGadgetItem(5, -1, "")
EndProcedure
Procedure ProcessImperfectMatches() ; processes for imperfect matches i.e. matching of text but not of exact position in text
Pj = 0
For i = 1 To ii
For j = 1 To jj
If OX(i)\item = OY(j)\item And OX(i)\orgno <> OY(j)\orgno
If P(j) = "P" Or P(j) = "I" Or Q(i) = "P" Or Q(i) = "I"
; do not print or take as IMPERFECT WORD
Else
; print and take as IMPERFECT WORD
AddGadgetItem(5, -1, OX(i)\item + Chr(9) + Chr(9) + Chr(9) + Str(OX(i)\orgno) + Chr(9) + OY(j)\item + Chr(9) + Chr(9) + Chr(9) + Str(OY(j)\orgno) + Chr(9) + "IMPERFECT MATCH !")
imperf = imperf + 1
If P(j) = "": P(j) = "I": EndIf: If Q(i) = "": Q(i) = "I": EndIf ; puts "I" in counter of P and Q array to identify that word is already considered
CountO = CountO + 1: CountC = CountC + 1 ; a counter count of words to ensure all words taken and to tally with ii and jj
EndIf
EndIf
Next j
Next i
AddGadgetItem(5, -1, "---------------------------------------------------------")
AddGadgetItem(5, -1, "")
EndProcedure
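Procedure ProcessRepeats() ; processes for repeats i.e. whether a word has been repeated in the texts - not significant as this is an error
; Note: ProcessImperfectMatches() has already marked every same-word pair in which both entries were still unmarked,
; so the test below finds nothing here and repeated words end up being reported by ProcessNotInCorr()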
Pj = 0
For i = 1 To ii
For j = 1 To jj
If OX(i)\item = OY(j)\item And OX(i)\orgno <> OY(j)\orgno
If P(j) = "" And Q(i) = ""
; print and take as REPEAT WORD
AddGadgetItem(5, -1, OX(i)\item + Chr(9) + Chr(9) + Chr(9) + Str(OX(i)\orgno) + Chr(9) + OY(j)\item + Chr(9) + Chr(9) + Chr(9) + Str(OY(j)\orgno) + Chr(9) + "REPEAT WORD")
rept = rept + 1
err = err + 1
P(j) = "R": Q(i) = "R"
Else
; do not print and take as REPEAT WORD
EndIf
EndIf
Next j
Next i
AddGadgetItem(5, -1, "---------------------------------------------------------")
AddGadgetItem(5, -1, "")
EndProcedure
Procedure ProcessNotInCorr() ; processes for words that are in Original text but NOT in the Corrected text
; This routine brings out the words that are in ORIGINAL Text but NOT in CORRECTED Text
For i = 1 To ii
If Q(i) = ""
AddGadgetItem(5, -1, OX(i)\item + Chr(9) + Chr(9) + Chr(9) + Str(OX(i)\orgno) + Chr(9) + Chr(9) + Chr(9) + Chr(9) + Chr(9) + "Not in Corrected Text")
rem = rem + 1
err = err + 1
CountO = CountO + 1
EndIf
Next i
AddGadgetItem(5, -1, "---------------------------------------------------------")
AddGadgetItem(5, -1, "")
EndProcedure
Procedure ProcessNotInOrig() ; processes for words that are in Corrected text but NOT in the Original text
; This routine brings out the words that are in CORRECTED Text but NOT in ORIGINAL Text
For j = 1 To jj
If P(j) = ""
AddGadgetItem(5, -1, Chr(9) + Chr(9) + Chr(9) + Chr(9) + OY(j)\item + Chr(9) + Chr(9) + Chr(9) + Str(OY(j)\orgno) + Chr(9) + "Not in Original Text")
addi = addi + 1
err = err + 1
CountC = CountC + 1
EndIf
Next j
AddGadgetItem(5, -1, "---------------------------------------------------------")
AddGadgetItem(5, -1, "")
EndProcedure
Procedure EndProcess() ; end process and final printing of the summary
AddGadgetItem(5, -1, "Words in Original: " + Chr(9) + Str(CountO) + Chr(9) + "Words in Corrected: " + Chr(9) + Str(CountC))
AddGadgetItem(5, -1, "")
AddGadgetItem(5, -1, Chr(9) + Chr(9) + "Perfect Matches : " + Str(perf) + " words")
AddGadgetItem(5, -1, Chr(9) + Chr(9) + "Imperfect Matches : " + Str(imperf) + " words")
AddGadgetItem(5, -1, Chr(9) + Chr(9) + "Repeat Matches : " + Str(rept) + " words")
AddGadgetItem(5, -1, Chr(9) + Chr(9) + "Removals : " + Str(rem) + " words")
AddGadgetItem(5, -1, Chr(9) + Chr(9) + "Additions : " + Str(addi) + " words")
AddGadgetItem(5, -1, "")
AddGadgetItem(5, -1, Chr(9) + Chr(9) + "Total Errors : " + Str(err) + " words")
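Errorpercent = err * 100.0 / ii ; floating point division; the percentage is relative to the number of words in the Original text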
AddGadgetItem(5, -1, Chr(9) + Chr(9) + "Error Percentage : " + StrF(Errorpercent, 2) + " %")
AddGadgetItem(5, -1, "")
AddGadgetItem(5, -1, Chr(9) + Chr(9) + "--- End ---")
EndProcedure
Procedure Initialize() ; initialises the variables on "Initiate" as well as on "Clear"
ORIG = "": CORR = ""
perf=0: imperf=0: rept=0: rem=0: addi=0: err=0: Pj=0: Flag = 0
LenORIG=0: LenCORR=0: LenORIGStat=0: LenCORRStat=0: Correxistence=0
Errorpercent=0: CountO = 0: CountC = 0
ReDim O(100) : ReDim C(100): ReDim P(100): ReDim Q(100)
For i = 1 To 100
P(i) = "": Q(i) = ""
Next i
ClearGadgetItems(5)
EndProcedure
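
Because a word is taken to be whatever lies between two spaces, a double space or a trailing space in either text produces an empty "word" that is counted like any other. If that is not wanted, the texts could be cleaned up before they are split. The helper below is only a sketch of that idea; `NormalizeSpaces` is a hypothetical name and is not part of the program above.

```purebasic
; Sketch only: NormalizeSpaces() is a hypothetical helper, not part of the program above.
; It turns tabs into spaces, trims both ends and collapses runs of spaces into one space.
Procedure.s NormalizeSpaces(Text.s)
  Text = Trim(ReplaceString(Text, Chr(9), " "))
  While FindString(Text, "  ", 1) ; repeat until no double space is left
    Text = ReplaceString(Text, "  ", " ")
  Wend
  ProcedureReturn Text
EndProcedure
```

It could be applied where the texts are read on "GO", for example `ORIG = NormalizeSpaces(GetGadgetText(2))`, with `LenORIG` and `LenCORR` taken from the cleaned strings.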