Base64 chinese letters

Posted: Sat Jan 29, 2011 3:50 pm
by oftit
Hello all,

I've been trying to read a mail that has been encoded in Base64. When I read the mail with Thunderbird, it displays everything correctly. The mail contains ASCII characters and Chinese characters (Unicode, I guess). But when I decode the mail myself, the ASCII characters come out correctly, while the Chinese characters are replaced by ASCII characters.

I tried compiling the program in Unicode mode, but then the whole output shows up as Chinese characters.

So I tried to examine the problem on its own, without any mail involved. I started by encoding:

Code:

Hello my name is 洒
to:

Code:

SGVsbG8gbXkgbmFtZSBpcyA/
Then I decoded the Base64 text above in a non-Unicode build and got this:

Code:

Hello my name is ?
Was the Chinese character converted to ASCII?
This is the code that I compiled:

Code:

out$ = Space(5000)
str$ = "SGVsbG8gbXkgbmFtZSBpcyA/"
Base64Decoder(@str$, StringByteLength(str$), @out$, 5000)
SetClipboardText(out$)
Thank you

Re: Base64 chinese letters

Posted: Sat Jan 29, 2011 4:50 pm
by Trond
oftit wrote: Then I decoded the Base64 text above in a non-Unicode build and got this:
To store Chinese characters you need to use Unicode.
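
One way to check where the character gets lost is to decode the Base64 text into a raw memory buffer and look at the bytes, bypassing string variables altogether. The following is only a sketch: it assumes the four-argument Base64Decoder() call as it is used in this thread (newer PureBasic releases may use a different signature), and the variable names are made up for illustration.

```purebasic
; Sketch: inspect the raw bytes behind the Base64 text from the first post.
; Assumption: the four-argument Base64Decoder() used in this thread.
Encoded.s = "SGVsbG8gbXkgbmFtZSBpcyA/"
EncLen = Len(Encoded)                  ; Base64 text is plain ASCII: one byte per character
OutSize = 64                           ; comfortably larger than the decoded data

*In = AllocateMemory(EncLen + 1)       ; +1 for the terminating null written by PokeS
*Out = AllocateMemory(OutSize)
If *In And *Out
  ; Store the Base64 text as ASCII bytes, whatever the compile mode
  PokeS(*In, Encoded, -1, #PB_Ascii)
  DecLen = Base64Decoder(*In, EncLen, *Out, OutSize)

  ; Collect every decoded byte as two hex digits
  Bytes.s = ""
  For i = 0 To DecLen - 1
    Bytes = Bytes + RSet(Hex(PeekA(*Out + i)), 2, "0") + " "
  Next
  Debug Bytes
EndIf
If *In : FreeMemory(*In) : EndIf
If *Out : FreeMemory(*Out) : EndIf
```

The final byte of this Base64 text decodes to 3F, the ASCII question mark. That suggests the Chinese character had already been replaced before the text was Base64-encoded, so no decoder setting can bring it back. Encoded as UTF-8, 洒 (U+6D12) would instead appear as the three bytes E6 B4 92.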

Re: Base64 chinese letters

Posted: Sat Jan 29, 2011 11:56 pm
by oftit
OK. But the mail was encoded as ASCII and yet has some Unicode characters in it. So when I compile the program in non-Unicode mode, it decodes the mail without the Unicode characters. But somewhere in the encoded text two bytes must be used so that the Unicode character can appear. So my question is: how do you know when two bytes have to be used?

Re: Base64 chinese letters

Posted: Sun Jan 30, 2011 11:07 am
by Trond
oftit wrote: OK. But the mail was encoded as ASCII and yet has some Unicode characters in it.
That's not possible. The mail must have been encoded as UTF-8, which is a Unicode encoding.

Compile your program in Unicode mode and use PeekS(*Base64DecodedBuffer, -1, #PB_UTF8) to get the string.
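
On the question of when more than one byte is used: UTF-8 signals this in the bytes themselves. A byte below $80 is a complete ASCII character, a byte of $C0 or above starts a multi-byte character, and bytes from $80 to $BF continue one. A minimal sketch of that rule, assuming Unicode mode and a source file saved as UTF-8 (the variable names are made up for illustration):

```purebasic
; Sketch: classify each byte of a UTF-8 encoded string.
; Assumption: Unicode mode is ON and the source file is saved as UTF-8.
Text.s = "is 洒"
Size = StringByteLength(Text, #PB_UTF8)   ; byte count without the terminating null

*Buf = AllocateMemory(Size + 1)           ; +1 for the terminating null written by PokeS
If *Buf
  PokeS(*Buf, Text, -1, #PB_UTF8)
  For i = 0 To Size - 1
    b = PeekA(*Buf + i)                   ; read one unsigned byte
    If b < $80
      Debug RSet(Hex(b), 2, "0") + "  single-byte (ASCII) character"
    ElseIf b >= $C0
      Debug RSet(Hex(b), 2, "0") + "  lead byte of a multi-byte character"
    Else
      Debug RSet(Hex(b), 2, "0") + "  continuation byte"
    EndIf
  Next
  FreeMemory(*Buf)
EndIf
```

Note that 洒 (U+6D12) takes three bytes in UTF-8 (E6 B4 92), not two, so counting on a fixed two bytes per non-ASCII character would not work here.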

Re: Base64 chinese letters

Posted: Sun Jan 30, 2011 2:48 pm
by oftit
Oh, OK. I tried it with this code:

Code:

out$ = Space(5000)
Base64Decoder(@str$, StringByteLength(str$), @out$, 5000)
MessageRequester("", PeekS(@Out$, -1, #PB_UTF8))
When I compile it with Unicode turned on, I get strange symbols. When I compile it in non-Unicode mode, I get the right letters, but the Chinese characters have been changed to question marks ("?").

Re: Base64 chinese letters

Posted: Sun Jan 30, 2011 4:02 pm
by Trond
You're doing it wrong. If the original string is in UTF-8, you can't put it into a PB string variable, as that will be either UCS-2 (in Unicode mode) or ASCII.
Also, the Base64-encoded string can't be stored in a PB string when you compile in Unicode mode (and you have to compile in Unicode mode).

Re: Base64 chinese letters

Posted: Sun Jan 30, 2011 4:53 pm
by oftit
OK. The string is in UTF-8. But I still don't really know how to do it.

Could you please post some code showing how?

Re: Base64 chinese letters

Posted: Sun Jan 30, 2011 5:12 pm
by Trond

Code:

; Unicode must be ON, and File -> File Format must be set to UTF-8.

; Encode:
OriginalString.s = "Hello my name is 洒"
L = StringByteLength(OriginalString, #PB_UTF8)
*Utf8String = AllocateMemory(L+2)
PokeS(*Utf8String, OriginalString, -1, #PB_UTF8)
L2 = L*1.35 + 64
*BaseEncoded = AllocateMemory(L2)
EncodedLength = Base64Encoder(*Utf8String, L, *BaseEncoded, L2)

; You can also fill *BaseEncoded with base-encoded data from a file
; by first allocating enough memory and then using ReadData()

; We now have the encoded string in ASCII format in *BaseEncoded,
; but we can't display it directly because our program is compiled in Unicode mode
; WRONG: Debug PeekS(*BaseEncoded)
Debug PeekS(*BaseEncoded, -1, #PB_Ascii)


; Decode *BaseEncoded
*BaseDecoded = AllocateMemory(L2)
Base64Decoder(*BaseEncoded, EncodedLength, *BaseDecoded, EncodedLength)

; We now have the decoded string in the format it was originally in
; in *BaseDecoded. In our case, the original format was UTF-8.
Decoded.s = PeekS(*BaseDecoded, -1, #PB_UTF8)
Debug Decoded ; May still show ? if you set the debug output font to a font without chinese characters


Re: Base64 chinese letters

Posted: Sun Jan 30, 2011 5:30 pm
by oftit
YEEEEEES. Trond you are the man :D