Using Map as removing duplicate lines(?)


Post by AZJIO »

1. Is there any restriction on key length when using a Map? Could a Map be used in an algorithm that removes duplicate lines? As I understand it, the Map creates a binary representation of the key string, so there should be no restriction.

2. Is a Map an efficient way to find duplicate lines? An alternative I considered: build a structure that stores each string's length, sort by that length, and then look for duplicates only among strings of the same length.

Post by skywalk »

Maps are not as efficient as a binary compare, since you have the hash creation step.
But I do use Maps to remove duplicates where speed isn't critical and the string data isn't large, like file paths and short lists of labels.
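
For illustration only, a minimal sketch of that kind of Map-based duplicate removal might look like the following. The procedure name RemoveDuplicateLines and the sample paths are made up for this example; each line is used directly as the Map key and the stored value is just a dummy flag.

```purebasic
EnableExplicit

; Removes later duplicates from Lines(), keeping the first occurrence of each line.
; RemoveDuplicateLines is a hypothetical helper name, not a built-in command.
Procedure RemoveDuplicateLines(List Lines.s())
  Protected NewMap Seen.b()        ; only the keys matter; the value is a dummy flag

  ForEach Lines()
    If FindMapElement(Seen(), Lines())
      DeleteElement(Lines())       ; seen before: drop it, the previous element becomes current
    Else
      Seen(Lines()) = #True        ; first occurrence: remember it
    EndIf
  Next
EndProcedure

; Hypothetical sample data
NewList Paths.s()
AddElement(Paths()) : Paths() = "C:\Temp\a.txt"
AddElement(Paths()) : Paths() = "C:\Temp\b.txt"
AddElement(Paths()) : Paths() = "C:\Temp\a.txt"

RemoveDuplicateLines(Paths())

ForEach Paths()
  Debug Paths()                    ; lists the remaining lines in their original order
Next
```

Deleting from the list in place keeps the original line order, which a plain walk over the Map would not promise.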
The nice thing about standards is there are so many to choose from. ~ Andrew Tanenbaum

Post by AZJIO »

The lack of case sensitivity leads to the idea that a binary comparison is taking place.
To make a case-sensitive option, I want to try CompareMemoryString.

Post by Demivec »

You can use the Map as a first step and then compare the strings at that same key for exact matches with CompareMemoryString() if there is any doubt.

IMHO this should be faster than doing a binary compare of each string against every other string, assuming there are more than a small number of strings. If there are only a few strings, it doesn't matter which method you use.
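
One possible reading of this two-step idea, sketched under assumptions: the Map is keyed on a lowercased copy of each line and stores the first original spelling, and CompareMemoryString() then decides whether a new line that lands on the same key is an exact duplicate or only a case variant. The helper name IsExactMatch, the lowercased key and the sample strings are all assumptions made for this example.

```purebasic
EnableExplicit

; Exact (case-sensitive) comparison of two strings via CompareMemoryString().
; IsExactMatch is a hypothetical helper name.
Procedure.i IsExactMatch(a$, b$)
  If Len(a$) <> Len(b$)
    ProcedureReturn #False         ; different lengths can never match
  EndIf
  If Len(a$) = 0
    ProcedureReturn #True          ; both empty: skip the memory compare entirely
  EndIf
  ProcedureReturn Bool(CompareMemoryString(@a$, @b$, #PB_String_CaseSensitive) = #PB_String_Equal)
EndProcedure

; Step 1: the Map narrows the search down to one candidate per key.
NewMap FirstSeen.s()               ; key: lowercased line, value: first original line
FirstSeen(LCase("readme.txt")) = "readme.txt"

; Step 2: compare the stored string with the new one only when the key already exists.
Define candidate$ = "Readme.TXT"
If FindMapElement(FirstSeen(), LCase(candidate$))
  If IsExactMatch(FirstSeen(), candidate$)
    Debug "exact duplicate"
  Else
    Debug "same key, but differs in case"
  EndIf
EndIf
```

The point of the split is that each new string is compared against at most one stored string instead of against every other string.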

Post by AZJIO »

Before trying, I searched for "CompareMemoryString duplicate", and there is already a result.
I have already made a program using Map.

Post by skywalk »

What do you mean? Hash maps are case sensitive!
If you want case-insensitive matching, do LCASE("mystring") on the key.

Post by AZJIO »

In that case, I need to create the key in lowercase and store the original string as the key's value, so that the real string isn't lost. However, the stored value will be the original of the first line found; later duplicates may differ from it in case.
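
A rough sketch of that lowercase-key idea, assuming the lines are already in a list: the key is LCase() of the line and the value keeps the first original spelling. The procedure name CollectUniqueNoCase and the sample labels are invented for this example.

```purebasic
EnableExplicit

; Fills FirstSeen() with one entry per distinct line, ignoring case.
; Key: lowercased line. Value: original spelling of the first occurrence.
; CollectUniqueNoCase is a hypothetical helper name.
Procedure CollectUniqueNoCase(List Lines.s(), Map FirstSeen.s())
  Protected key$

  ForEach Lines()
    key$ = LCase(Lines())
    If FindMapElement(FirstSeen(), key$) = 0
      FirstSeen(key$) = Lines()    ; later spellings of the same key are ignored
    EndIf
  Next
EndProcedure

; Hypothetical sample data
NewList Labels.s()
AddElement(Labels()) : Labels() = "Start"
AddElement(Labels()) : Labels() = "START"
AddElement(Labels()) : Labels() = "Stop"

NewMap Unique.s()
CollectUniqueNoCase(Labels(), Unique())

ForEach Unique()
  ; Map elements are not guaranteed to come back in input order.
  Debug MapKey(Unique()) + " -> " + Unique()
Next
```

As noted, "START" is dropped here because "Start" was seen first; if the last spelling should win instead, the FindMapElement() check can simply be left out.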

Post by idle »

Not enough information to answer. Maybe you want a trie instead:
https://www.purebasic.fr/english/viewtopic.php?t=75783