MD5 for text [Resolved]

AZJIO
Addict
Posts: 2238
Joined: Sun May 14, 2017 1:48 am

Re: MD5 for text [Resolved]

Post by AZJIO »

A character takes 16 bits, so 6 characters take 96 bits. Now each of those bits has to be shifted and mixed into the 32-bit Long type. 96/32 = 3, which means that each bit in the Long will be overwritten 3 times.
Kwai chang caine
Always Here
Posts: 5517
Joined: Sun Nov 05, 2006 11:42 pm
Location: Lyon - France

Re: MD5 for text [Resolved]

Post by Kwai chang caine »

NicTheQuick wrote: Mon Nov 17, 2025 1:36 pm Be aware that hashes like MD5 and especially CRC32 (because of its length) can have a lot of collisions.
Incredible !!!! :shock:
I was always convinced that an MD5 hash couldn't produce the same result for two different sources, and that this was precisely why it was created: to detect any modification of a source "in passing."
But you've just proven that we can't trust Cipher to be truly certain of the "UNIQUE" aspect.
I'm very disappointed, I can't use it to create my UNIQUE reference :cry:

Code: Select all

UseCRC32Fingerprint()

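Procedure.s CRC32_ASCII(input.s)
	; Ascii() allocates a null-terminated ASCII copy of the string
	Protected *ascii = Ascii(input)
	; MemorySize() also counts the terminating null byte, which is why this value differs from StringFingerprint() below
	Protected hash.s = Fingerprint(*ascii, MemorySize(*ascii), #PB_Cipher_CRC32)
	FreeMemory(*ascii)	; the buffer returned by Ascii() has to be freed after use
	ProcedureReturn hash
EndProcedure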

Debug "CRC32_ASCII => " + CRC32_ASCII("gnu")
Debug "CRC32_ASCII => " + CRC32_ASCII("codding") + #CRLF$

Debug "StringFingerprint => " + StringFingerprint("gnu", #PB_Cipher_CRC32)
Debug "StringFingerprint => " + StringFingerprint("codding", #PB_Cipher_CRC32)
debugger wrote:
CRC32_ASCII of ''gnu'' => 97b47b3f
CRC32_ASCII of ''codding'' => 97b47b3f

StringFingerprint of ''gnu'' => 69c8c72d
StringFingerprint of ''codding'' => 69c8c72d
The happiness is a road...
Not a destination
NicTheQuick
Addict
Posts: 1536
Joined: Sun Jun 22, 2003 7:43 pm
Location: Germany, Saarbrücken

Re: MD5 for text [Resolved]

Post by NicTheQuick »

Kwai chang caine wrote: Wed Nov 19, 2025 8:09 pm
NicTheQuick wrote: Mon Nov 17, 2025 1:36 pm Be aware that hashes like MD5 and especially CRC32 (because of its length) can have a lot of collisions.
Incredible !!!! :shock:
I was always convinced that an MD5 hash couldn't produce the same result for two different sources, and that this was precisely why it was created: to detect any modification of a source "in passing."
But you've just proven that we can't trust Cipher to be truly certain of the "UNIQUE" aspect.
I'm very disappointed, I can't use it to create my UNIQUE reference :cry:
Well, just because you were convinced of it doesn't make it true. By definition, a hash must have collisions. With some hashes they are just harder to find than with others. With CRC32 and MD5 it's easy: you can find collisions on normal consumer hardware.

And mathematically, it's completely logical. If a hash function produces 128-bit hashes, then inputs of 129 bits or more must produce collisions at the latest: there are more possible inputs than possible hash values, so the function can no longer be injective.

And another thing: hash functions are not ciphers.
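
To put rough numbers on "must have collisions", here is a minimal sketch of the birthday estimate. It assumes an ideal hash whose outputs are spread uniformly, which CRC32 only approximates, and `CollisionProbability` is a name made up for this example. For a 32-bit hash it comes out at roughly 0.3% for 5000 items and roughly 25% for 50000 items.

```purebasic
; Approximate probability that at least two of n items share a hash value,
; assuming an ideal hash with the given number of output bits.
; Uses the birthday approximation p = 1 - e^(-n(n-1) / (2 * 2^bits)).
Procedure.d CollisionProbability(n.d, bits.d)
  ProcedureReturn 1.0 - Exp(-(n * (n - 1.0)) / (2.0 * Pow(2.0, bits)))
EndProcedure

Debug "32 bits,  5000 items: " + StrD(CollisionProbability(5000, 32), 6)
Debug "32 bits, 50000 items: " + StrD(CollisionProbability(50000, 32), 6)
Debug "64 bits, 50000 items: " + StrD(CollisionProbability(50000, 64), 12)
```

This only covers accidental collisions. Deliberately constructed ones, as with CRC32 and MD5, are a separate problem.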
The english grammar is freeware, you can use it freely - But it's not Open Source, i.e. you can not change it or publish it in altered way.
Piero
Addict
Posts: 1102
Joined: Sat Apr 29, 2023 6:04 pm
Location: Italy

Re: MD5 for text [Resolved]

Post by Piero »

What about this?
Can it be a "solution"?
Just asking Gurus…

Code: Select all

UseCRC32Fingerprint()
Debug StringFingerprint("gnu", #PB_Cipher_CRC32)
Debug StringFingerprint("codding", #PB_Cipher_CRC32)
Debug StringFingerprint("gnu", #PB_Cipher_CRC32|#PB_Cipher_HMAC,256,#PB_UTF8, "password_", #PB_UTF8)
Debug StringFingerprint("codding", #PB_Cipher_CRC32|#PB_Cipher_HMAC,256,#PB_UTF8, "password_", #PB_UTF8)

; Results:
; 69c8c72d
; 69c8c72d
; 819d3ec0
; ee1a7cb0

Re: MD5 for text [Resolved]

Post by NicTheQuick »

No.

I would suggest using longer hashes. I don't understand why you only want to use small ones.

Re: MD5 for text [Resolved]

Post by Piero »

NicTheQuick wrote: Thu Nov 20, 2025 2:09 pm I don't understand why you only want to use small ones.
I just wanted to make Kwai chang caine happy :oops:

PS:
…I also understand why he may want small hashes with a low collision probability… e.g., they can be useful for things like serial number registration in shareware apps… (just an example!)

Re: MD5 for text [Resolved]

Post by Kwai chang caine »

Thanks Pierro for your help, and happy to talk to you for the first time (I believe) :wink:
Yes, you have understood my goal 8)
NicTheQuick wrote: I don't understand why you only want to use small ones.
I have several thousand mp3s (like all of you, surely) :wink:
And I'm an old DJ from the 80's (surely not like all of you) :lol: , and I have several programs I wrote myself to manage them, because I also have several thousand 45 rpm vinyls in my collection :oops:
When I work with these files, the titles can be a little bit different (French accent forgotten, comma added, typing error, additional information, etc ...)
For example
"Little brain" wrote:
Cindy Lauper (She bop).mp3
Cindi Lauper (She bop).mp3
So I said to myself, I must create a UNIQUE reference for each title
So I created an MD5 for each mp3 file and inserted it in the title, but the problem is the size of the MD5 text :shock:
For several titles that are already very long, adding this long text is not the best idea of the year :|

Then I had another idea: insert it in the mp3 TAG
But I saw that if I add the MD5 inside the file, obviously the MD5 changes, so ....we're going in circles :D

Then I said to myself, perhaps there is a method to get a text shorter than the title, to create a short reference once and for all, that I can easily put in front of the title
aa8593cc - Cindy Lauper (She bop).mp3
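
One way to get a short prefix like this could look as follows. It is only a sketch under assumptions: it takes the first 12 hex digits (48 bits) of a SHA-256 of the title instead of a CRC32, and it keeps a map so that a clash gets detected instead of passing unnoticed. `ShortReference` and `RegisterTitle` are names made up for this example, and the length 12 is an arbitrary choice.

```purebasic
UseSHA2Fingerprint()

; Hypothetical helper: the first 12 hex digits (48 bits) of the SHA-256 of the title.
Procedure.s ShortReference(title.s)
  Protected hash.s = StringFingerprint(title, #PB_Cipher_SHA2, 256, #PB_UTF8)
  ProcedureReturn Left(hash, 12)
EndProcedure

; Hypothetical helper: remembers which title produced each reference and
; returns #False if a different title already owns the same reference.
Procedure.i RegisterTitle(Map seen.s(), title.s)
  Protected ref.s = ShortReference(title)
  If FindMapElement(seen(), ref)
    If seen() <> title
      Debug "Collision on " + ref + ": " + seen() + " / " + title
      ProcedureReturn #False
    EndIf
  EndIf
  seen(ref) = title
  Debug ref + " - " + title
  ProcedureReturn #True
EndProcedure

NewMap seen.s()
RegisterTitle(seen(), "Cindy Lauper (She bop).mp3")
RegisterTitle(seen(), "Cindi Lauper (She bop).mp3")
```

The two spellings get different references, because the hash follows the text exactly. So a reference like this would have to be generated once and then kept, not recomputed after a title is corrected. A truncated hash can still collide, it is just less likely than with 32 bits.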
BarryG
Addict
Posts: 4251
Joined: Thu Apr 18, 2019 8:17 am

Re: MD5 for text [Resolved]

Post by BarryG »

Kwai chang caine wrote: Sat Nov 22, 2025 11:44 am
aa8593cc - Cindy Lauper (She bop).mp3
CRC32 on the filename will work perfectly for that. My own app uses it to prefix my clipboard text and I've never had a clash.

Re: MD5 for text [Resolved]

Post by Kwai chang caine »

Thanks BarryG for your advice 8)
Are you sure it's impossible to have the same CRC32 for two slightly different texts?

Or for completely different texts, like in the NicTheQuick example?
NicTheQuick wrote:
CRC32_ASCII of ''gnu'' => 97b47b3f
CRC32_ASCII of ''codding'' => 97b47b3f

Re: MD5 for text [Resolved]

Post by BarryG »

Kwai chang caine wrote: Sat Nov 22, 2025 12:11 pm Are you sure it's impossible to have the same CRC32 for two slightly different texts?
It's NOT impossible; no (as NicTheQuick proved). But I work with long non-sensitive texts (not short strings), so maybe that's why I've never suffered a collision yet. I should update my code though.

Re: MD5 for text [Resolved]

Post by Kwai chang caine »

Ok thanks for your help 8)

Re: MD5 for text [Resolved]

Post by NicTheQuick »

Kwai chang caine wrote: Sat Nov 22, 2025 11:44 am When I work with these files, the titles can be a little bit different (French accent forgotten, comma added, typing error, additional information, etc ...)
For example
"Little brain" wrote:
Cindy Lauper (She bop).mp3
Cindi Lauper (She bop).mp3
So I said to myself, I must create a UNIQUE reference for each title
Don't you want to find these similar tracks using a fuzzy search instead and only keep the better versions?

I still don't get why it helps to add a unique prefix. 🤔 Sorry if I'm a bit slow on the uptake.
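
As an illustration of the fuzzy search idea, here is a minimal sketch of the Levenshtein edit distance, one common measure for "almost the same title". It is only one possible choice of similarity measure, and the procedure name is made up for this example. For the two titles from the quote it prints 1, since they differ by a single substituted character.

```purebasic
; Levenshtein edit distance between two strings (two-row dynamic programming).
; The comparison is case-sensitive.
Procedure.i Levenshtein(a.s, b.s)
  Protected la = Len(a), lb = Len(b), i, j, cost, best
  If la = 0 : ProcedureReturn lb : EndIf
  If lb = 0 : ProcedureReturn la : EndIf
  Protected Dim prev.i(lb)
  Protected Dim curr.i(lb)
  For j = 0 To lb : prev(j) = j : Next
  For i = 1 To la
    curr(0) = i
    For j = 1 To lb
      If Mid(a, i, 1) = Mid(b, j, 1) : cost = 0 : Else : cost = 1 : EndIf
      best = prev(j) + 1                                                ; deletion
      If curr(j - 1) + 1 < best : best = curr(j - 1) + 1 : EndIf        ; insertion
      If prev(j - 1) + cost < best : best = prev(j - 1) + cost : EndIf  ; substitution
      curr(j) = best
    Next
    For j = 0 To lb : prev(j) = curr(j) : Next
  Next
  ProcedureReturn prev(lb)
EndProcedure

Debug Levenshtein("Cindy Lauper (She bop).mp3", "Cindi Lauper (She bop).mp3")
```

Titles whose distance is small compared to their length could then be flagged as probable duplicates. Where to put that threshold is a judgment call.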

Re: MD5 for text [Resolved]

Post by Kwai chang caine »

NicTheQuick wrote: I still don't get why it helps to add a unique prefix
I need a unique reference for each mp3 to synchronize all the programs, and indeed for searching too
To be sure that when I talk about the reference "M19569", it's the only one concerned and not another one that looks a little bit like it
Before, I used a simple numeric reference that I assigned myself, stupidly following the order of the list in Explorer when I started with these programs
M00001 - 2 belgen (Lena ''Extended mix'')
M00002 - 2 brothers on the 4th floor (Can't help myself)
M00003 - 2 brothers on the 4th floor (Dreams)
....
...
M04233 - ZZ top (Woke up with wood)
But with this system, when I add a record I obviously must add it at the end of the list, and I'm also forced to manage the continuity of the references, otherwise it makes no sense, so I have to fill in the holes when I delete titles.
And to use this method, I'm forced to manage a database to have
Reference <==> Title
That's the reason why, since I'm currently rethinking my entire system, I thought that if I could create a reference that is derived from the title, and without a database, that would be much better :wink:
But apparently, as usual, things are never as simple as in my sick brain :lol:

Re: MD5 for text [Resolved]

Post by Piero »

Kwai chang caine wrote: Sat Nov 22, 2025 11:44 am Thanks Pierro
Pierro?

Re: MD5 for text [Resolved]

Post by Kwai chang caine »

Excuse me, just one extra R :wink:
I hope you are not too sad :mrgreen: