MD5FingerPrint in Unicode

PB Fanatic
User
Posts: 49
Joined: Wed Dec 17, 2014 11:54 am

Post by PB Fanatic »

This code works fine in Ascii:

Code: Select all

a$="test" : Debug MD5Fingerprint(@a$,Len(a$)) ; 098f6bcd4621d373cade4e832627b4f6
And the result perfectly matches what http://www.md5.cz says.

But if I switch the compiler to Unicode, the result is incorrect: 84afc5c978db956e578615db0f111ed4

I know there are procedures in these forums that show how to "fix" it, but really the command itself should do it internally. I think it's important for the MD5FingerPrint() command to ALWAYS output a result that matches what the website above says, regardless of the compiler setting. Otherwise there's no point in even having an MD5FingerPrint() command if we're just going to wrap it in a "fix" procedure, right?
wilbert
PureBasic Expert
Posts: 3870
Joined: Sun Aug 08, 2004 5:21 am
Location: Netherlands

Re: MD5FingerPrint in Unicode

Post by wilbert »

The website you mention returns an MD5 hash for a string.
PureBasic returns an MD5 hash for a memory buffer.
When compiling in Ascii mode these happen to be the same, but you can't expect the result to be the same in Unicode mode, where each character occupies two bytes.
All you are doing is passing a memory pointer to the MD5Fingerprint procedure. How should the procedure know you are passing a Unicode string?
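
To make the difference visible, here is a minimal sketch that dumps the bytes a memory function sees behind @a$. It relies only on the standard Len(), StringByteLength(), PeekA(), Hex() and RSet() commands; the variable names are made up for illustration.

```purebasic
; Sketch: show the raw bytes that a memory function sees behind @a$.
a$ = "test"

Debug "Characters: " + Str(Len(a$))              ; 4 in both compiler modes
Debug "Bytes:      " + Str(StringByteLength(a$)) ; 4 in Ascii mode, 8 in Unicode mode

dump$ = ""
For i = 0 To StringByteLength(a$) - 1
  ; PeekA() reads one unsigned byte; RSet() pads the hex value to two digits
  dump$ = dump$ + RSet(Hex(PeekA(@a$ + i)), 2, "0") + " "
Next
Debug dump$ ; Ascii mode:   74 65 73 74
            ; Unicode mode: 74 00 65 00 73 00 74 00
```

Note that Len(a$) counts characters, not bytes, so in Unicode mode MD5Fingerprint(@a$, Len(a$)) only covers the first half of this buffer.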

Post by PB Fanatic »

If a human puts text into that website, it outputs a known MD5 hash for it. I would assume the MD5FingerPrint() command would return the same human-expected result.

Post by wilbert »

PB Fanatic wrote:I would assume the MD5FingerPrint() command would return the same human-expected result.
It would if the procedure were MD5Fingerprint(String.s), but the procedure is MD5Fingerprint(*Buffer, Size).
What it currently returns might not be what you assumed, but it does exactly what the help file says: it generates an MD5 hash from a memory buffer.

Post by PB Fanatic »

True that, but the manual has this example:

Code: Select all

; Example: string as memory buffer
test.s = "This is a test string!"
Debug MD5Fingerprint(@test, StringByteLength(test))
This produces two different results, depending on the compiler setting. But it should return the SAME result, because we're testing a string, just as you said.
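
The two results come from two different byte sequences, which a quick length check makes visible. A minimal sketch, using only Len() and StringByteLength() with the standard format constants:

```purebasic
; Sketch: byte length of the manual's test string in each format.
test.s = "This is a test string!"

Debug Len(test)                           ; 22 characters
Debug StringByteLength(test, #PB_Ascii)   ; 22 bytes
Debug StringByteLength(test, #PB_UTF8)    ; 22 bytes (all characters are plain ASCII)
Debug StringByteLength(test, #PB_Unicode) ; 44 bytes (2 bytes per character)
Debug StringByteLength(test)              ; 22 or 44, depending on the compiler mode
```

Because the manual's example passes StringByteLength(test) without a format, the number of bytes hashed follows the compiler mode, and so does the hash.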
normeus
Enthusiast
Posts: 415
Joined: Fri Apr 20, 2012 8:09 pm

Post by normeus »

@PB Fanatic
You are not testing a string; you are sending a memory location to the MD5 function. In Ascii mode that memory holds one byte per character, but in Unicode mode it holds two bytes per character.
Assuming that you are only interested in the fact that your MD5 should match MD5 from PHP (or websites, or Ascii),
this should work for you:

Code: Select all

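Procedure.s md5ascii(s.s) ; hash s as single-byte text; characters above 255 are truncated
  Protected i, result.s, *mbuf
  *mbuf = AllocateMemory(Len(s) + 1) ; one byte per character plus a terminating zero (buffer is zero-filled)
  If *mbuf
    For i = 0 To Len(s) - 1
      PokeB(*mbuf + i, Asc(Mid(s, i + 1, 1)))
    Next
    result = MD5Fingerprint(*mbuf, Len(s)) ; the terminating zero is not hashed
    FreeMemory(*mbuf)                      ; release the temporary buffer
  EndIf
  ProcedureReturn result
EndProcedure

a$="test" : Debug MD5Fingerprint(@a$,StringByteLength(a$)) ; 098f6bcd4621d373cade4e832627b4f6 in Ascii mode only

Debug md5ascii(a$) ; 098f6bcd4621d373cade4e832627b4f6 in both modes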
Just remember: text$ is a string variable, while @text$ is a pointer to a place in memory.

Norm.

Post by PB Fanatic »

normeus wrote:assuming that you are only interested in the fact that your MD5 should match MD5 from PHP
That's exactly my goal, actually. My website returns an MD5 hash of a string that I pass to it by PHP, and it didn't match what PureBasic was giving me with MD5FingerPrint(). Now I see I will have to wrap a procedure around the MD5FingerPrint() command to get the same result, which is a shame. I wish PureBasic just had a simple straightforward MD5 command like in PHP with no concern of Ascii vs Unicode and memory poking.
Bisonte
Addict
Posts: 1232
Joined: Tue Oct 09, 2007 2:15 am

Post by Bisonte »

Code: Select all

EnableExplicit

Procedure.s md5(String.s)
  
  Protected *Buffer, Result.s = ""
  
  If String <> ""
    *Buffer = AllocateMemory(StringByteLength(String, #PB_Ascii) + 1) ; +1 for the terminating zero written by PokeS()
    If *Buffer
      PokeS(*Buffer, String, StringByteLength(String, #PB_Ascii), #PB_Ascii)
      Result = MD5Fingerprint(*Buffer, StringByteLength(String, #PB_Ascii))
      FreeMemory(*Buffer)  
    EndIf
  EndIf
  
  ProcedureReturn Result
  
EndProcedure

Define String.s = "The quick brown fox â jumps over the lazy dog."

;: PHP function generates -> 66e7a64aef34d4148d8bde4aa2976ab9

Debug "PHP -> 66e7a64aef34d4148d8bde4aa2976ab9"
Debug "PB  -> " + md5(String)
Check it with the compiler switched to Ascii and to Unicode...

Edit: And very important: PHP doesn't include the terminating 0 byte of a string when it calculates MD5 hashes, so this procedure leaves the 0 byte out as well...
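
The effect of the 0 byte is easy to check. A minimal sketch, assuming a PureBasic version that still has MD5Fingerprint() as used throughout this thread:

```purebasic
; Sketch: hashing the terminating zero byte changes the result.
*Buffer = AllocateMemory(5)              ; room for "test" plus the terminating zero
If *Buffer
  PokeS(*Buffer, "test", -1, #PB_Ascii)  ; writes 74 65 73 74 00
  Debug MD5Fingerprint(*Buffer, 4)       ; 098f6bcd4621d373cade4e832627b4f6, the text only
  Debug MD5Fingerprint(*Buffer, 5)       ; a different hash, because the 0 byte is included
  FreeMemory(*Buffer)
EndIf
```

So the Size argument should be the byte length of the text itself, without the terminator.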

Post by wilbert »

Small procedure

Code: Select all

Procedure.s MD5AsciiFingerprint(s.s)
  Protected a.s=s:ProcedureReturn MD5Fingerprint(@a,PokeS(@a,s,-1,#PB_Ascii))  
EndProcedure
Similar but with format specification like STARGÅTE posted below

Code: Select all

Procedure.s md5(s.s, fmt = #PB_UTF8)
  Protected Dim a.a(StringByteLength(s,fmt)+1):ProcedureReturn MD5Fingerprint(@a(),PokeS(@a(),s,-1,fmt))
EndProcedure
PB Fanatic wrote:I wish PureBasic just had a simple straightforward MD5 command like in PHP with no concern of Ascii vs Unicode and memory poking.
There are lots of users who depend on Unicode support because their language contains characters outside the Ascii range.
If only Ascii were supported, that would be a problem. If Unicode is required, the hash can be based on either UCS-2 or UTF-8.
The PB command is very flexible and allows the user to make a choice.
Last edited by wilbert on Sun Jan 04, 2015 7:44 pm, edited 2 times in total.
STARGÅTE
Addict
Posts: 2089
Joined: Thu Jan 10, 2008 1:30 pm
Location: Germany, Glienicke

Post by STARGÅTE »

PB Fanatic wrote:I wish PureBasic just had a simple straightforward MD5 command like in PHP with no concern of Ascii vs Unicode and memory poking.
In PHP you have the same problem if you change your file's encoding from Ascii to UTF-8:

Code: Select all

echo md5('Äpfel');
This gives 16114a0b3232bc9a8f978311387e74f2 if your file is saved as UTF-8, or d1c5faac7b530be151406b478f36bfb1 if it is saved in a single-byte (Ascii) encoding.

Here is my function for MD5 with an optional Flag to define the format.

Code: Select all

Procedure.s MD5(String.s, Flags.i=#PB_UTF8)
  
  Protected Length.i = StringByteLength(String, Flags)
  Protected *Buffer  = AllocateMemory(Length)
  Protected Result.s
  
  PokeS(*Buffer, String, #PB_Default, Flags|#PB_String_NoZero)
  Result = MD5Fingerprint(*Buffer, Length)
  FreeMemory(*Buffer)
  
  ProcedureReturn Result
  
EndProcedure

Debug MD5("Äpfel", #PB_Ascii)
Debug MD5("Äpfel", #PB_UTF8)
Debug MD5("Äpfel", #PB_Unicode)

Post by PB Fanatic »

wilbert wrote:Small procedure

Code: Select all

Procedure.s MD5AsciiFingerprint(s.s)
  Protected a.s=s:ProcedureReturn MD5Fingerprint(@a,PokeS(@a,s,-1,#PB_Ascii))  
EndProcedure
I like this! Thanks! :D
Little John
Addict
Posts: 4527
Joined: Thu Jun 07, 2007 3:25 pm
Location: Berlin, Germany

Post by Little John »

wilbert wrote: Similar but with format specification like STARGÅTE posted below

Code: Select all

Procedure.s md5(s.s, fmt = #PB_UTF8)
  Dim a.a(StringByteLength(s,fmt)+1):ProcedureReturn MD5Fingerprint(@a(),PokeS(@a(),s,-1,fmt))
EndProcedure
Hello wilbert, that's nice. :-) Thank you! I'll put that code into my private string library.

However, the code contains a glitch that can cause an unwanted effect in a program that contains a Global Array a.a():

Code: Select all

Global Dim a.a(2)

Procedure.s md5(s.s, fmt = #PB_UTF8)
  Dim a.a(StringByteLength(s,fmt)+1):ProcedureReturn MD5Fingerprint(@a(),PokeS(@a(),s,-1,fmt))
EndProcedure

For i = 0 To ArraySize(a())
   a(i) = i
   Debug a(i)
Next

Debug ""
Debug md5("Äpfel")
Debug ""

For i = 0 To ArraySize(a())
   Debug a(i)
Next
So it's better to use

Code: Select all

  Protected Dim ...
in the Procedure.
wilbert wrote: There are lots of users who depend on Unicode support because their language contains characters outside the Ascii range.
If only Ascii were supported, that would be a problem. If Unicode is required, the hash can be based on either UCS-2 or UTF-8.
The PB command is very flexible and allows the user to make a choice.
I know this was your reply to PB Fanatic, and all that you wrote is true, of course.

However, IMHO PB's built-in MD5Fingerprint() function should simply provide the option to pass a format parameter.
Then it wouldn't be necessary for us to write a wrapper in order to get this important option.
(But this is a feature request from me. I can't see a bug here.)

Thanks again!

Post by wilbert »

Little John wrote:So it's better to use

Code: Select all

  Protected Dim ...
in the Procedure.
Thanks for mentioning it. I changed the procedure in my post above.
The help file states that a Dim is always local. It wasn't clear to me that it can conflict with a global array.

You are right that an optional parameter would be best.
If it could be set to Ascii, Unicode, UTF8 or Binary, with Binary as the default, it wouldn't break backward compatibility and would be a useful addition.
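
For readers on a newer PureBasic release: as far as I know, the Cipher library later gained StringFingerprint(), which takes a format argument much like the one discussed here. A minimal sketch, assuming your version ships UseMD5Fingerprint() and StringFingerprint() (please check the help file for your version; the old MD5Fingerprint() may not be available there):

```purebasic
; Sketch, assuming a PureBasic version that provides the fingerprint
; plugin commands UseMD5Fingerprint() and StringFingerprint().
UseMD5Fingerprint() ; register the MD5 plugin before using it

Text$ = "test"
Debug StringFingerprint(Text$, #PB_Cipher_MD5, 0, #PB_Ascii)   ; hash of the single-byte text
Debug StringFingerprint(Text$, #PB_Cipher_MD5, 0, #PB_UTF8)    ; same bytes for plain ASCII text, so the same hash
Debug StringFingerprint(Text$, #PB_Cipher_MD5, 0, #PB_Unicode) ; hashes the 2-byte characters, so it differs
```

The third argument is the Bits value, which is passed as 0 here because it is not needed for MD5.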

Post by PB Fanatic »

Little John wrote:However, IMHO PB's built-in MD5Fingerprint() function should simply provide the option to pass a format parameter.
Then it wouldn't be necessary for us to write a wrapper in order to get this important option.
Yes indeed, that's what I'd like to see, too.

Post by STARGÅTE »

Little John wrote:However, IMHO PB's built-in MD5Fingerprint() function should simply provide the option to pass a format parameter.
Then it wouldn't be necessary for us to write a wrapper in order to get this important option.
(But this is a feature request from me. I can't see a bug here.)
MD5Fingerprint() is a function for a memory buffer (not directly for strings!).
So it is not the job of this memory function to "change" the format of the buffer.

The same rule applies to all other functions such as:
CRC32Fingerprint, SHA1Fingerprint, AESDecoder, AESEncoder, Base64Decoder, Base64Encoder and so on.
All of these functions are memory functions, and it is the job of the user to convert a string to a buffer with PokeS() rather than passing @String.
It's a bad habit to use strings as memory buffers.
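
One way to follow that rule is to convert the string once and then pass the same buffer to whichever memory function is needed. A minimal sketch: StringToBuffer() is a hypothetical helper name, not a built-in command, and the example assumes a PureBasic version that still has MD5Fingerprint() and SHA1Fingerprint() as named in this thread.

```purebasic
; Sketch: convert a string into a buffer once, then hand it to any memory function.
; StringToBuffer() is a hypothetical helper, not part of PureBasic.
Procedure StringToBuffer(String.s, Format, *Size.Integer)
  Protected *Buffer
  *Size\i = StringByteLength(String, Format) ; byte length without the terminating zero
  *Buffer = AllocateMemory(*Size\i + 1)      ; +1 so that an empty string still gets a valid buffer
  If *Buffer
    PokeS(*Buffer, String, -1, Format | #PB_String_NoZero)
  EndIf
  ProcedureReturn *Buffer                    ; the caller must release it with FreeMemory()
EndProcedure

Size.i = 0
*Mem = StringToBuffer("Äpfel", #PB_UTF8, @Size)
If *Mem
  Debug MD5Fingerprint(*Mem, Size)  ; MD5 of the UTF-8 bytes
  Debug SHA1Fingerprint(*Mem, Size) ; SHA1 of the same bytes
  FreeMemory(*Mem)
EndIf
```

The format is chosen exactly once, at conversion time, so the hash no longer depends on the compiler mode.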