MD5FingerPrint in Unicode
Posted: Sat Jan 03, 2015 4:48 am
by PB Fanatic
This code works fine in Ascii:
Code: Select all
a$="test" : Debug MD5Fingerprint(@a$,Len(a$)) ; 098f6bcd4621d373cade4e832627b4f6
And the result perfectly matches what http://www.md5.cz says.
But if I switch the compiler to Unicode, the result is incorrect: 84afc5c978db956e578615db0f111ed4
I know there's procedures in these forums to show how to "fix" it, but really the command itself should do it internally, because I think it's important for the MD5FingerPrint() command to ALWAYS output the result to match what the website above says, regardless of the compiler setting. Otherwise there's no point in even having an MD5FingerPrint() command, if we're just going to wrap it in a "fix" procedure, right?
Re: MD5FingerPrint in Unicode
Posted: Sat Jan 03, 2015 6:44 am
by wilbert
The website you are mentioning returns an MD5 hash for a string.
PureBasic returns an MD5 hash for a memory buffer.
When compiling in Ascii mode these happen to be the same, but you can't expect the result to be the same in Unicode mode, where each character occupies two bytes.
All you are doing is passing a memory pointer to the MD5Fingerprint procedure. How should the procedure know you are passing a Unicode string?
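To make this concrete, here is a minimal sketch (not from the help file) that dumps the bytes MD5Fingerprint() would actually see for "test". It assumes a little-endian machine for the Unicode case, and the loop variable i is only illustrative.

```purebasic
; Dump the raw bytes behind a$ exactly as the current compiler mode stores them.
a$ = "test"
For i = 0 To StringByteLength(a$) - 1
  Debug Hex(PeekA(@a$ + i)) ; one byte per Debug line
Next
; Ascii build:   74 65 73 74          (4 bytes)
; Unicode build: 74 0 65 0 73 0 74 0  (8 bytes, UCS-2)
```

Different bytes in the buffer mean a different hash, even though the text looks identical.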
Re: MD5FingerPrint in Unicode
Posted: Sat Jan 03, 2015 7:34 am
by PB Fanatic
If a human puts text into that website, it outputs a known MD5 hash for it. I would assume the MD5FingerPrint() command would return the same human-expected result.
Re: MD5FingerPrint in Unicode
Posted: Sat Jan 03, 2015 8:27 am
by wilbert
PB Fanatic wrote:I would assume the MD5FingerPrint() command would return the same human-expected result.
It would if the procedure would be MD5Fingerprint(String.s) but the procedure is MD5Fingerprint(*Buffer, Size) .
What it currently returns might not be what you assumed but it does exactly what the help file says; generating a md5 hash from a memory buffer.
Re: MD5FingerPrint in Unicode
Posted: Sat Jan 03, 2015 8:39 am
by PB Fanatic
True that, but the manual has this example:
Code: Select all
; Example: string as memory buffer
test.s = "This is a test string!"
Debug MD5Fingerprint(@test, StringByteLength(test))
Which produces two different results, depending on the compiler state. But it should return the SAME result, because we're testing a string, just as you said.
Re: MD5FingerPrint in Unicode
Posted: Sat Jan 03, 2015 9:20 am
by normeus
@PB Fanatic
You are not testing a string; you are sending a memory location to the MD5 function, and the text stored there uses single bytes in Ascii mode but double bytes in Unicode mode.
Assuming that you are only interested in your MD5 matching the MD5 from PHP (or websites, or Ascii):
This should work for you:
Code: Select all
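Procedure.s md5ascii(s.s) ; build a single-byte copy of the string; only the low byte of each character is kept
  Protected i, *mbuf, result.s
  *mbuf = AllocateMemory(Len(s) + 1) ; one byte per character plus the terminating 0
  If *mbuf
    For i = 0 To Len(s) - 1
      PokeB(*mbuf + i, Asc(Mid(s, i + 1, 1)))
    Next
    PokeB(*mbuf + Len(s), 0)
    result = MD5Fingerprint(*mbuf, Len(s)) ; the terminating 0 is not hashed
    FreeMemory(*mbuf)
  EndIf
  ProcedureReturn result
EndProcedure
a$="test" : Debug MD5Fingerprint(@a$,StringByteLength(a$)) ; 098f6bcd4621d373cade4e832627b4f6 in an Ascii build only
Debug md5ascii(a$) ; 098f6bcd4621d373cade4e832627b4f6 in both builds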
Just remember: text$ is a string variable, while @text$ is a pointer to a place in memory.
Norm.
Re: MD5FingerPrint in Unicode
Posted: Sat Jan 03, 2015 9:50 am
by PB Fanatic
normeus wrote:assuming that you are only interested in the fact that your MD5 should match MD5 from PHP
That's exactly my goal, actually. My website returns an MD5 hash of a string that I pass to it by PHP, and it didn't match what PureBasic was giving me with MD5FingerPrint(). Now I see I will have to wrap a procedure around the MD5FingerPrint() command to get the same result, which is a shame. I wish PureBasic just had a simple straightforward MD5 command like in PHP with no concern of Ascii vs Unicode and memory poking.
Re: MD5FingerPrint in Unicode
Posted: Sat Jan 03, 2015 10:17 am
by Bisonte
Code: Select all
EnableExplicit
Procedure.s md5(String.s)
Protected *Buffer, Result.s = ""
If String <> ""
*Buffer = AllocateMemory(StringByteLength(String, #PB_Ascii) + 1) ; +1 because PokeS() also writes a terminating 0
If *Buffer
PokeS(*Buffer, String, StringByteLength(String, #PB_Ascii), #PB_Ascii)
Result = MD5Fingerprint(*Buffer, StringByteLength(String, #PB_Ascii))
FreeMemory(*Buffer)
EndIf
EndIf
ProcedureReturn Result
EndProcedure
Define String.s = "The quick brown fox â jumps over the lazy dog."
;: PHP function generates -> 66e7a64aef34d4148d8bde4aa2976ab9
Debug "PHP -> 66e7a64aef34d4148d8bde4aa2976ab9"
Debug "PB -> " + md5(String)
Check it with the compiler switched to Ascii and to Unicode...
Edit: And very important: PHP doesn't include the 0 byte at the end of a string when calculating MD5 hashes, so this is also calculated without the 0 byte...
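A small sketch of why the buffer still needs room for that 0 byte even though it is not hashed: as far as I know, PokeS() writes the terminating 0 unless told otherwise, while its return value counts only the text bytes. The buffer size of 16 is arbitrary.

```purebasic
*Buffer = AllocateMemory(16)
If *Buffer
  FillMemory(*Buffer, 16, $FF)                ; mark every byte so the written 0 is visible
  Debug PokeS(*Buffer, "test", -1, #PB_Ascii) ; 4   -> bytes written, terminating 0 not counted
  Debug PeekA(*Buffer + 4)                    ; 0   -> the terminator PokeS() added
  Debug PeekA(*Buffer + 5)                    ; 255 -> untouched
  Debug MD5Fingerprint(*Buffer, 4)            ; hashes only "test": 098f6bcd4621d373cade4e832627b4f6
  FreeMemory(*Buffer)
EndIf
```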
Re: MD5FingerPrint in Unicode
Posted: Sat Jan 03, 2015 10:22 am
by wilbert
Small procedure
Code: Select all
Procedure.s MD5AsciiFingerprint(s.s)
Protected a.s=s:ProcedureReturn MD5Fingerprint(@a,PokeS(@a,s,-1,#PB_Ascii))
EndProcedure
Similar but with format specification like STARGÅTE posted below
Code: Select all
Procedure.s md5(s.s, fmt = #PB_UTF8)
Protected Dim a.a(StringByteLength(s,fmt)+1):ProcedureReturn MD5Fingerprint(@a(),PokeS(@a(),s,-1,fmt))
EndProcedure
PB Fanatic wrote:I wish PureBasic just had a simple straightforward MD5 command like in PHP with no concern of Ascii vs Unicode and memory poking.
There's lots of users that depend on the support of unicode because their language contains characters not in the ascii range.
If only ascii is supported, this is a problem. If unicode is required, there's the possibility of generating a hash based on UCS-2 or UTF-8.
The PB command is very flexible and allows the user to make a choice.
Re: MD5FingerPrint in Unicode
Posted: Sat Jan 03, 2015 12:59 pm
by STARGÅTE
PB Fanatic wrote:I wish PureBasic just had a simple straightforward MD5 command like in PHP with no concern of Ascii vs Unicode and memory poking.
In php you have the same problem, if you change your file format from ascii to utf8:
gets 16114a0b3232bc9a8f978311387e74f2 if your file is utf8, or d1c5faac7b530be151406b478f36bfb1 if it is in ascii.
Here is my function for MD5 with an optional Flag to define the format.
Code: Select all
Procedure.s MD5(String.s, Flags.i=#PB_UTF8)
Protected Length.i = StringByteLength(String, Flags)
Protected *Buffer = AllocateMemory(Length)
Protected Result.s
PokeS(*Buffer, String, #PB_Default, Flags|#PB_String_NoZero)
Result = MD5Fingerprint(*Buffer, Length)
FreeMemory(*Buffer)
ProcedureReturn Result
EndProcedure
Debug MD5("Äpfel", #PB_Ascii)
Debug MD5("Äpfel", #PB_UTF8)
Debug MD5("Äpfel", #PB_Unicode)
Re: MD5FingerPrint in Unicode
Posted: Sat Jan 03, 2015 1:19 pm
by PB Fanatic
wilbert wrote:Small procedure
Code: Select all
Procedure.s MD5AsciiFingerprint(s.s)
Protected a.s=s:ProcedureReturn MD5Fingerprint(@a,PokeS(@a,s,-1,#PB_Ascii))
EndProcedure
I like this! Thanks!

Re: MD5FingerPrint in Unicode
Posted: Sun Jan 04, 2015 7:27 pm
by Little John
wilbert wrote:
Similar but with format specification like STARGÅTE posted below
Code: Select all
Procedure.s md5(s.s, fmt = #PB_UTF8)
Dim a.a(StringByteLength(s,fmt)+1):ProcedureReturn MD5Fingerprint(@a(),PokeS(@a(),s,-1,fmt))
EndProcedure
Hello wilbert, that's nice.

Thank you! I'll put that code into my private string library.
However, the code contains a glitch that can cause an unwanted effect in a program that contains a Global Array a.a():
Code: Select all
Global Dim a.a(2)
Procedure.s md5(s.s, fmt = #PB_UTF8)
Dim a.a(StringByteLength(s,fmt)+1):ProcedureReturn MD5Fingerprint(@a(),PokeS(@a(),s,-1,fmt))
EndProcedure
For i = 0 To ArraySize(a())
a(i) = i
Debug a(i)
Next
Debug ""
Debug md5("Äpfel")
Debug ""
For i = 0 To ArraySize(a())
Debug a(i)
Next
So it's better to use Protected Dim in the Procedure.
wilbert wrote:
There's lots of users that depend on the support of unicode because their language contains characters not in the ascii range.
If only ascii is supported, this is a problem. If unicode is required, there's the possibility of generating a hash based on UCS-2 or UTF-8.
The PB command is very flexible and allows the user to make a choice.
I know this was your reply to PB Fanatic, and all that you wrote is true, of course.
However, IMHO PB's built-in MD5Fingerprint() function just should provide the option to pass a format parameter.
Then it wouldn't be necessary for us to write a wrapper in order to get this important option.
( But this is a feature request by me. I can't see a bug here. )
Thanks again!
Re: MD5FingerPrint in Unicode
Posted: Sun Jan 04, 2015 7:48 pm
by wilbert
Little John wrote:So it's better to use Protected Dim in the Procedure.
Thanks for mentioning. I changed the procedure in my post above.
The help file states that a Dim is always local. It wasn't clear to me that it can conflict with a global array.
You are right that an optional parameter would be best.
If it could be set to Ascii, Unicode, UTF8 or Binary and Binary would be the default, it wouldn't break backward compatibility and be a useful addition.
Re: MD5FingerPrint in Unicode
Posted: Mon Jan 05, 2015 9:22 am
by PB Fanatic
Little John wrote:However, IMHO PB's built-in MD5Fingerprint() function just should provide the option to pass a format parameter.
Then it wouldn't be necessary for us to write a wrapper in order to get this important option.
Yes indeed, that's what I'd like to see, too.
Re: MD5FingerPrint in Unicode
Posted: Mon Jan 05, 2015 4:41 pm
by STARGÅTE
Little John wrote:However, IMHO PB's built-in MD5Fingerprint() function just should provide the option to pass a format parameter.
Then it wouldn't be necessary for us to write a wrapper in order to get this important option.
( But this is a feature request by me. I can't see a bug here. )
MD5Fingerprint() is a function for a memory buffer (not directly for strings!).
So it is not the job of this memory function to "change" the format of the buffer.
This is also a rule for all other functions such as:
CRC32Fingerprint, SHA1Fingerprint, AESDecoder, AESEncoder, Base64Decoder, Base64Encoder and so on.
All these functions are memory functions, and it is the job of the user to convert a string to a buffer with PokeS() and not with @String.
It's a bad habit to use strings as memory buffers.
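The same pattern carries over to the other fingerprint functions. Here is a minimal sketch for SHA1Fingerprint(), assuming UTF-8 is the wanted default; SHA1OfString is a made-up name, not a PB command, and empty strings are not treated specially.

```purebasic
; Hypothetical helper: hash the bytes of Text in an explicitly chosen format.
Procedure.s SHA1OfString(Text.s, Format.i = #PB_UTF8)
  Protected Result.s
  Protected Length.i = StringByteLength(Text, Format)
  Protected *Buffer = AllocateMemory(Length + 2) ; room for a 1- or 2-byte terminating 0
  If *Buffer
    PokeS(*Buffer, Text, -1, Format)             ; convert the string into the buffer
    Result = SHA1Fingerprint(*Buffer, Length)    ; hash the text bytes only, not the terminator
    FreeMemory(*Buffer)
  EndIf
  ProcedureReturn Result
EndProcedure

Debug SHA1OfString("test")              ; same result in Ascii and Unicode builds
Debug SHA1OfString("test", #PB_Unicode) ; different result: hashes the 8 UCS-2 bytes
```

Because the format is stated explicitly, the result no longer depends on the compiler switch.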