Using Map as removing duplicate lines(?)

Posted: Fri Jun 10, 2022 6:16 pm
by AZJIO
1. Are there any restrictions on key length when using a Map? Is it possible to use a Map in an algorithm that removes duplicate lines? As I understand it, the map works on the binary representation of the string, so there should be no restriction.

2. Is a Map an efficient way to find duplicate lines? My alternative idea was to build a structure that stores each string's length, sort by that length, and then look for duplicates only among strings of the same length.
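
For illustration only, the two approaches from the questions above can be sketched side by side. This is Python rather than PureBasic, with a `set` standing in for the Map, and the function names are hypothetical. The second function groups by length instead of sorting, so the original line order is kept. The sketch says nothing about any key-length limit in PureBasic's own Map.

```python
from collections import defaultdict

def dedupe_with_map(lines):
    """Keep the first occurrence of each line, using a hash set as the 'Map'."""
    seen = set()
    result = []
    for line in lines:
        if line not in seen:      # hashed lookup, works for keys of any length
            seen.add(line)
            result.append(line)
    return result

def dedupe_by_length(lines):
    """Idea from question 2: compare a line only with lines of equal length."""
    groups = defaultdict(list)    # length -> unique lines already kept
    result = []
    for line in lines:
        bucket = groups[len(line)]
        if line not in bucket:    # linear scan, limited to the same length
            bucket.append(line)
            result.append(line)
    return result

sample = ["abc", "xyz", "abc", "ab", "xyz", "ABC"]
print(dedupe_with_map(sample))    # prints ['abc', 'xyz', 'ab', 'ABC']
print(dedupe_by_length(sample))   # prints ['abc', 'xyz', 'ab', 'ABC']
```

Both give the same result. The length-grouped version slows down when many lines share the same length, because each group is still scanned linearly.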

Re: Using Map as removing duplicate lines(?)

Posted: Fri Jun 10, 2022 6:49 pm
by skywalk
Maps are not as efficient as a binary compare, since you have the hash creation step.
But I do use Maps for removing duplicates when speed is not critical and my string data is not large, like file paths and short lists of labels.

Re: Using Map as removing duplicate lines(?)

Posted: Fri Jun 10, 2022 7:01 pm
by AZJIO
The lack of case sensitivity leads to the idea that a binary comparison is taking place.
To add a case-sensitivity option, I want to try CompareMemoryString.

Re: Using Map as removing duplicate lines(?)

Posted: Fri Jun 10, 2022 7:16 pm
by Demivec
You can use the Map as a first step and then compare the strings at that same key for exact matches with CompareMemoryString() if there is any doubt.

IMHO this should be faster than doing a binary compare of each string to every other string, assuming there are more than a small number of strings. If there are only a small number of strings, it doesn't matter which method you use.
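
A minimal sketch of that two-step idea, in Python rather than PureBasic. It assumes `zlib.crc32` as the cheap first-step hash, and an exact byte comparison stands in for CompareMemoryString(). The function names are hypothetical. The pairwise version is included only to show what the hash step avoids.

```python
import zlib
from collections import defaultdict

def dedupe_hash_then_compare(lines):
    """Step 1: group by hash. Step 2: exact compare within the group only."""
    buckets = defaultdict(list)   # hash value -> byte strings already kept
    result = []
    for line in lines:
        data = line.encode("utf-8")
        key = zlib.crc32(data)    # cheap hash; different lines may collide
        candidates = buckets[key]
        # Exact byte comparison, but only against lines with the same hash.
        if not any(data == other for other in candidates):
            candidates.append(data)
            result.append(line)
    return result

def dedupe_pairwise(lines):
    """Baseline: compare every line with every line kept so far."""
    result = []
    for line in lines:
        if all(line != kept for kept in result):
            result.append(line)
    return result

sample = ["one", "two", "one", "Two", "three", "two"]
print(dedupe_hash_then_compare(sample))  # prints ['one', 'two', 'Two', 'three']
print(dedupe_pairwise(sample))           # prints ['one', 'two', 'Two', 'three']
```

The exact compare inside a bucket guards against hash collisions. With only a handful of lines the two functions do about the same amount of work.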

Re: Using Map as removing duplicate lines(?)

Posted: Fri Jun 10, 2022 7:31 pm
by AZJIO
Before trying, I decided to search for "CompareMemoryString duplicate", and there is already a result.
I have already made a program using Map.

Re: Using Map as removing duplicate lines(?)

Posted: Fri Jun 10, 2022 7:31 pm
by skywalk
What do you mean? Hash maps are case sensitive!
If you want case insensitive, do LCASE("mystring").

Re: Using Map as removing duplicate lines(?)

Posted: Fri Jun 10, 2022 7:39 pm
by AZJIO
In that case, I need to create the key in lowercase and store the original string as the key's value so as not to lose the real string. However, the stored value will be the original of the first line found; later duplicates may be spelled differently.
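
A small sketch of that lowercase-key idea, in Python rather than PureBasic, with a `dict` in place of the Map. The function name is hypothetical, and `str.lower()` is a simplification that may not match every Unicode case rule.

```python
def dedupe_ignore_case(lines):
    """Case-insensitive dedupe that keeps the first spelling of each line."""
    first_seen = {}               # lowercase key -> original text of first occurrence
    for line in lines:
        key = line.lower()
        if key not in first_seen:
            first_seen[key] = line
    # dicts preserve insertion order, so the original line order is kept.
    return list(first_seen.values())

print(dedupe_ignore_case(["Hello", "hello", "World", "HELLO", "world"]))
# prints ['Hello', 'World']
```

As noted, only the first spelling survives: "hello" and "HELLO" are dropped in favor of "Hello".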

Re: Using Map as removing duplicate lines(?)

Posted: Sat Jun 11, 2022 10:59 am
by idle
Not enough information to answer. Maybe you want a trie instead:
https://www.purebasic.fr/english/viewtopic.php?t=75783