Re: Remove all duplicate lines from file
Posted: Sun Feb 21, 2021 4:09 pm
by JaxMusic
Why not drop it into a database table? You can easily create a temp table, insert the data, and retrieve the unique list, sorted if you wish:
Code:
SELECT DISTINCT name FROM tmpname ORDER BY name
With PureBasic's DB library, this should take far fewer lines of code and make it clearer what is being done. Of course, it never hurts to look at the tools PB provides, but databases are made to solve problems like these.
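For what it's worth, here is a minimal sketch of that approach, assuming PureBasic's built-in SQLite plugin, an in-memory database standing in for the temp table, and a hypothetical input file "data.txt". (As Paul notes below, SELECT DISTINCT keeps one copy of each duplicated name; GROUP BY name HAVING COUNT(*) = 1 would drop duplicated names entirely, as the original post asked.)
Code:
UseSQLiteDatabase()

If OpenDatabase(0, ":memory:", "", "")
  DatabaseUpdate(0, "CREATE TABLE tmpname (name TEXT)")
  ; load every line of the file into the temp table
  If ReadFile(1, "data.txt")
    While Not Eof(1)
      SetDatabaseString(0, 0, ReadString(1))
      DatabaseUpdate(0, "INSERT INTO tmpname (name) VALUES (?)")
    Wend
    CloseFile(1)
  EndIf
  ; retrieve the unique, sorted list
  If DatabaseQuery(0, "SELECT DISTINCT name FROM tmpname ORDER BY name")
    While NextDatabaseRow(0)
      Debug GetDatabaseString(0, 0)
    Wend
    FinishDatabaseQuery(0)
  EndIf
  CloseDatabase(0)
EndIf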
Re: Remove all duplicate lines from file
Posted: Sun Feb 21, 2021 7:29 pm
by Paul
Remember.... the original post wanted the results to show:
Tom
Barbara
Tim
Antonia
Many examples posted here have "Frederic" and "Antonio" in the results, which would be incorrect.
According to the original post, if a name is duplicated then it must not appear in the list at all.
Here's an example using Lists (assuming the data is in a file called data.txt)
Code:
NewList dat.s()

hFile = ReadFile(#PB_Any, "data.txt")
If hFile
  ; read every line into the list
  While Eof(hFile) = 0
    AddElement(dat())
    dat() = ReadString(hFile)
  Wend
  CloseFile(hFile)

  ResetList(dat())
  While NextElement(dat())
    *old = @dat()   ; remember the current element
    cur$ = dat()
    ; count how many elements share the current value
    found = 0
    ForEach dat()
      If dat() = cur$
        found + 1
      EndIf
    Next
    ; if the value occurs more than once, delete every occurrence
    If found > 1
      ForEach dat()
        If dat() = cur$
          DeleteElement(dat())
        EndIf
      Next
    EndIf
    ChangeCurrentElement(dat(), *old)
  Wend
EndIf

ForEach dat()
  Debug dat()
Next
Re: Remove all duplicate lines from file
Posted: Mon Feb 22, 2021 1:14 am
by Demivec
@Paul: Your code solution has an error in its implementation.
You record the address of the current list element in *old and its contents in cur$, then you search the entire list to count the elements whose contents match cur$. If there is more than one, you go through the entire list again and delete every element that matches cur$, including the original one whose address you saved in *old. You then change the current list element to the one pointed to by *old. If you had just finished checking an element that was duplicated, *old no longer points to a valid element, because you deleted it.
One possible correction is to keep track of the previous element instead of the current element.
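Something along these lines might work as a sketch (assuming dat() has already been filled as in Paul's snippet): remember the element before the current one, which is guaranteed to survive the deletions because earlier survivors are all unique, and restart from the head when the deleted element had no predecessor.
Code:
ResetList(dat())
While NextElement(dat())
  *old = @dat()    ; current element
  cur$ = dat()
  ; remember the element before the current one (0 at the head)
  *prev = 0
  If PreviousElement(dat())
    *prev = @dat()
  EndIf
  ChangeCurrentElement(dat(), *old)   ; restore position either way
  ; count how many elements share the current value
  found = 0
  ForEach dat()
    If dat() = cur$
      found + 1
    EndIf
  Next
  If found > 1
    ForEach dat()
      If dat() = cur$
        DeleteElement(dat())
      EndIf
    Next
    If *prev
      ChangeCurrentElement(dat(), *prev)  ; resume after the survivor
    Else
      ResetList(dat())                    ; the head itself was deleted
    EndIf
  Else
    ChangeCurrentElement(dat(), *old)     ; nothing deleted, stay put
  EndIf
Wend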
Re: Remove all duplicate lines from file
Posted: Mon Feb 22, 2021 4:37 am
by Paul
Demivec wrote:@Paul: Your code solution has an error in its implementation.
Ok, so provide a list of data which causes this to fail.

Re: Remove all duplicate lines from file
Posted: Mon Feb 22, 2021 11:43 pm
by Demivec
@Paul: Well, I have to admit I could not come up with a list of data that causes it to fail. It does not fail ... yet.
@Edit: removed documentation of flawed code
Re: Remove all duplicate lines from file
Posted: Tue Feb 23, 2021 2:37 am
by Paul
So key takeaway...
@Paul: Well I have to admit I could not come up with a list of data that causes it to fail. It does not fail...

Re: Remove all duplicate lines from file
Posted: Wed Feb 24, 2021 6:26 am
by Keya
Depending on how large the file was, I would store, say, n-bit hashes for each line. If it was a tiny file I'd use 8-bit hashes (although you don't really need them), and if it was a large file I'd use 24- or 32-bit hashes.
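As a rough sketch of that idea, here is a two-pass count keyed on PureBasic's 32-bit CRC32 fingerprint instead of the full string; "data.txt" is an assumed file name. Keep in mind a hash collision can make two different lines look like duplicates, which is why the hash width should be scaled to the file size.
Code:
UseCRC32Fingerprint()

NewMap seen.i()
If ReadFile(0, "data.txt")
  ; first pass: count occurrences of each line's 32-bit hash
  While Not Eof(0)
    seen(StringFingerprint(ReadString(0), #PB_Cipher_CRC32)) + 1
  Wend
  ; second pass: print only lines whose hash occurred once
  FileSeek(0, 0)
  While Not Eof(0)
    line$ = ReadString(0)
    If seen(StringFingerprint(line$, #PB_Cipher_CRC32)) = 1
      Debug line$
    EndIf
  Wend
  CloseFile(0)
EndIf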
Re: Remove all duplicate lines from file
Posted: Wed Feb 24, 2021 1:29 pm
by kenmo
Here's my contribution.
If the names are in a file, just do two passes over the file:
Code:
NewMap Count.i()

ReadFile(0, "names.txt")

; first pass: count how many times each name occurs
While Not Eof(0)
  Name.s = ReadString(0)
  Count(Name) + 1
Wend

; second pass: output only the names that occurred exactly once
FileSeek(0, 0)
While Not Eof(0)
  Name.s = ReadString(0)
  If Count(Name) = 1
    Debug Name
  EndIf
Wend
CloseFile(0)
Of course, this quick example doesn't handle file errors, blank lines, or case-insensitive matching. Easy to add.
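For instance, a hedged variant of the same two passes with those gaps filled in: check the ReadFile() result, skip blank lines, and fold case with LCase() so "Tom" and "TOM" count as duplicates.
Code:
NewMap Count.i()

If ReadFile(0, "names.txt")
  While Not Eof(0)
    Name.s = Trim(ReadString(0))
    If Name <> ""
      Count(LCase(Name)) + 1   ; case-insensitive key
    EndIf
  Wend
  FileSeek(0, 0)
  While Not Eof(0)
    Name.s = Trim(ReadString(0))
    If Name <> "" And Count(LCase(Name)) = 1
      Debug Name
    EndIf
  Wend
  CloseFile(0)
EndIf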