Base64 chinese letters

oftit
User
Posts: 52
Joined: Sun Apr 25, 2010 5:08 am

Post by oftit »

Hello all

I've been trying to read a mail that has been encoded in Base64. When I read the mail with Thunderbird, it displays the whole thing correctly. The mail contains ASCII characters and Chinese characters (Unicode, I guess). But when I decode the mail myself, the ASCII characters come out correctly while the Chinese characters come out as ASCII characters.

I've also tried compiling the program in Unicode mode, but then the whole output shows up as Chinese characters.

So I tried to reproduce the problem on its own, without any mail involved. I started by encoding:

Code: Select all

Hello my name is 洒
To:

Code: Select all

SGVsbG8gbXkgbmFtZSBpcyA/
Then I decoded the Base64 text above in a non-Unicode build, and I got this:

Code: Select all

Hello my name is ?
Has the Chinese character been converted to ASCII?
This is the code that I compiled:

Code: Select all

out$ = Space(5000)
str$ = "SGVsbG8gbXkgbmFtZSBpcyA/"
Base64Decoder(@str$, StringByteLength(str$), @out$, 5000)
SetClipboardText(out$)
Thank you
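
One detail in the post above is worth checking: the encoded text already ends in "A/", which is the Base64 form of a space followed by an ASCII "?". That suggests the Chinese character was lost when the text was encoded, not when it was decoded. A minimal sketch to confirm this by dumping the decoded bytes as hex follows. It assumes the four-argument Base64Decoder() used in this thread (newer PureBasic releases, as far as I know, provide this buffer-based form as Base64DecoderBuffer()); the variable names are only illustrative.

```purebasic
; Sketch only: assumes the four-argument Base64Decoder() used in this thread.
Encoded.s = "SGVsbG8gbXkgbmFtZSBpcyA/"
EncodedLength = Len(Encoded)

; Base64 text is plain ASCII, so store it as one byte per character
*Input = AllocateMemory(EncodedLength + 1)
PokeS(*Input, Encoded, -1, #PB_Ascii)

; The decoded data is always shorter than the encoded text
*Output = AllocateMemory(EncodedLength + 1)
DecodedLength = Base64Decoder(*Input, EncodedLength, *Output, EncodedLength)

; Dump every decoded byte as two hex digits
Dump.s = ""
For i = 0 To DecodedLength - 1
  Dump = Dump + RSet(Hex(PeekB(*Output + i) & $FF), 2, "0") + " "
Next
Debug Dump ; the final byte is 3F, the ASCII code of "?"

FreeMemory(*Input)
FreeMemory(*Output)
```

Since the last byte is a literal question mark, no decoder setting can bring the original character back from this particular Base64 text.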
Trond
Always Here
Posts: 7446
Joined: Mon Sep 22, 2003 6:45 pm
Location: Norway

Re: Base64 chinese letters

Post by Trond »

oftit wrote: Then I decoded the Base64 text above in a non-Unicode build, and I got this:
To store Chinese characters you need to use Unicode.

Re: Base64 chinese letters

Post by oftit »

Ok. But the mail was encoded in ASCII, and it still has some Unicode characters in it. So when I compile the program in non-Unicode mode, it decodes the mail without the Unicode characters. But somewhere in the encoded text, two bytes must be used so that the Unicode character can appear. So my question is: how do you know when you have to use two bytes?

Re: Base64 chinese letters

Post by Trond »

oftit wrote: Ok. But the mail was encoded in ASCII, and it still has some Unicode characters in it.
That's not possible. The mail must have been encoded in UTF-8, which is a Unicode encoding.

Compile your program in Unicode mode and use PeekS(*Base64DecodedBuffer, -1, #PB_UTF8) to get the string.
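
On the question of when more than one byte is used: in UTF-8, ASCII characters are single bytes below $80, and every byte of $80 or above belongs to a multi-byte character whose first byte announces the sequence length (a lead byte of the form 1110xxxx starts a three-byte sequence). A small sketch that shows this for the character from this thread follows. It assumes Unicode mode and builds the character with Chr($6D12) so that the source file encoding does not matter; the variable names are only illustrative.

```purebasic
; Sketch only: requires Unicode mode. Chr($6D12) is the Chinese character from this thread.
Text.s = "is " + Chr($6D12)

; Convert the string to UTF-8 bytes in a memory buffer
Size = StringByteLength(Text, #PB_UTF8)
*Utf8 = AllocateMemory(Size + 1) ; one extra byte for the terminating null
PokeS(*Utf8, Text, -1, #PB_UTF8)

; Dump every byte as two hex digits
Dump.s = ""
For i = 0 To Size - 1
  Dump = Dump + RSet(Hex(PeekB(*Utf8 + i) & $FF), 2, "0") + " "
Next
Debug Dump ; 69 73 20 E6 B4 92

FreeMemory(*Utf8)
```

The first three bytes are plain ASCII, and E6 B4 92 is the three-byte UTF-8 sequence for the single Chinese character.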

Re: Base64 chinese letters

Post by oftit »

Oh ok. I tried it with this code:

Code: Select all

out$ = Space(5000)
Base64Decoder(@str$, StringByteLength(str$), @out$, 5000)
MessageRequester("", PeekS(@Out$, -1, #PB_UTF8))
When I compile it with Unicode turned on, I get strange signs. When I compile it in non-Unicode mode, I get the right letters, but the Chinese characters have been changed to question marks ("?").

Re: Base64 chinese letters

Post by Trond »

You're doing it wrong. If the original string is in UTF-8, you can't put it into a PB string variable, as that will be either UCS-2 (in Unicode mode) or ASCII.
Also, the Base64-encoded string can't be stored in a PB string when you compile in Unicode mode (and you have to compile in Unicode mode), because the decoder expects one ASCII byte per character, not two-byte characters.
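
The size difference between the formats can be seen directly with StringByteLength(). The following is only a sketch, assuming Unicode mode; it uses the test string from this thread, with the Chinese character built via Chr($6D12).

```purebasic
; Sketch only: requires Unicode mode. Chr($6D12) is the Chinese character from this thread.
Text.s = "Hello my name is " + Chr($6D12)

Debug Len(Text)                           ; 18 characters
Debug StringByteLength(Text, #PB_Ascii)   ; 18 bytes, one per character
Debug StringByteLength(Text, #PB_UTF8)    ; 20 bytes, the Chinese character takes three
Debug StringByteLength(Text, #PB_Unicode) ; 36 bytes, two per character
```

The same 18 characters occupy three different byte counts, which is why raw UTF-8 or ASCII data has to live in a memory buffer and be converted with PeekS() or PokeS() instead of being passed around as a PB string pointer.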

Re: Base64 chinese letters

Post by oftit »

Ok. The string is in UTF-8. But I still don't really know how to do it.

Could you please post some code that shows how to do it?

Re: Base64 chinese letters

Post by Trond »

Code: Select all

; Unicode must be ON, and File -> File Format must be set to UTF-8.

; Encode:
OriginalString.s = "Hello my name is 洒"
L = StringByteLength(OriginalString, #PB_UTF8)
*Utf8String = AllocateMemory(L+2)
PokeS(*Utf8String, OriginalString, -1, #PB_UTF8)
L2 = L*1.35 + 64
*BaseEncoded = AllocateMemory(L2)
EncodedLength = Base64Encoder(*Utf8String, L, *BaseEncoded, L2)

; You can also fill *BaseEncoded with base-encoded data from a file
; by first allocating enough memory and then using ReadData()

; We now have the encoded string in ASCII format in *BaseEncoded
; But we can't display it directly because our program is compiled in unicode mode
; WRONG: Debug PeekS(*BaseEncoded)
Debug PeekS(*BaseEncoded, -1, #PB_Ascii)


; Decode *BaseEncoded
*BaseDecoded = AllocateMemory(L2)
Base64Decoder(*BaseEncoded, EncodedLength, *BaseDecoded, EncodedLength)

; We now have the decoded string in the format it was originally in
; in *BaseDecoded. In our case, the original format was UTF-8.
Decoded.s = PeekS(*BaseDecoded, -1, #PB_UTF8)
Debug Decoded ; May still show ? if the debug output font has no Chinese characters
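
For the mail case, where the Base64 text arrives from outside instead of being produced by the program, the decode half can be checked without relying on the debug font. This is a sketch under the same assumptions (Unicode mode, the four-argument Base64Decoder() used above). The Base64 text here was worked out by hand from the UTF-8 bytes of the test string, and the variable names are only illustrative.

```purebasic
; Sketch only: requires Unicode mode and assumes the four-argument Base64Decoder().
; Base64 of the UTF-8 bytes of "Hello my name is " followed by U+6D12
Encoded.s = "SGVsbG8gbXkgbmFtZSBpcyDmtJI="
EncodedLength = Len(Encoded)

; Store the Base64 text as ASCII bytes, since the decoder works on raw memory
*Input = AllocateMemory(EncodedLength + 1)
PokeS(*Input, Encoded, -1, #PB_Ascii)

; AllocateMemory() zero-fills the buffer, so the decoded text stays null-terminated
*Output = AllocateMemory(EncodedLength + 1)
Base64Decoder(*Input, EncodedLength, *Output, EncodedLength)

; Interpret the decoded bytes as UTF-8 and convert them to a PB string
Decoded.s = PeekS(*Output, -1, #PB_UTF8)
Debug Len(Decoded)                ; 18 characters
Debug Hex(Asc(Right(Decoded, 1))) ; 6D12, the code of the Chinese character

FreeMemory(*Input)
FreeMemory(*Output)
```

Checking the character code instead of the displayed glyph separates a decoding problem from a font problem.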


Re: Base64 chinese letters

Post by oftit »

YEEEEEES. Trond you are the man :D