Switching my app to Unicode breaks things
Posted: Mon Sep 26, 2011 10:50 pm
by MachineCode
Okay, so I enabled compiling to Unicode for one of my apps, but now it doesn't work. For example, one part uses the following procedure, which returns the HTML of a web page into a variable. In ASCII, it works. In Unicode, it returns a string like "?????????????????????????". Great. Anyone know how to fix this? And does it mean all my string procedures are now going to be broken because of Unicode?
Code:
Procedure.s DownloadHTML(url$)
  #INTERNET_FLAG_RELOAD=$80000000
  hInet=InternetOpen_(url$,1,0,0,0)
  If hInet
    hURL=InternetOpenUrl_(hInet,url$,0,0,#INTERNET_FLAG_RELOAD,0)
    If hURL
      html$=Space(256)
      If InternetReadFile_(hURL,@html$,Len(html$),@bytes)
        html$=Trim(html$)
      EndIf
    EndIf
    InternetCloseHandle_(hInet)
  EndIf
  ProcedureReturn html$
EndProcedure
Re: Switching my app to Unicode breaks things
Posted: Mon Sep 26, 2011 11:24 pm
by Shield
You need to convert the downloaded data from ASCII to Unicode so that a PB program compiled in Unicode mode can read it properly.
Here is a quick hack to demonstrate this:
Code:
Procedure.s DownloadHTML(url$)
  #INTERNET_FLAG_RELOAD=$80000000
  hInet=InternetOpen_(url$,1,0,0,0)
  If hInet
    hURL=InternetOpenUrl_(hInet,url$,0,0,#INTERNET_FLAG_RELOAD,0)
    If hURL
      *buffer = AllocateMemory(256)
      If *buffer
        ; The API writes raw bytes into the buffer, whatever the program's string mode is.
        If InternetReadFile_(hURL,*buffer,MemorySize(*buffer),@bytes)
          ; Convert only the bytes actually read, treating them as ASCII.
          html$ = PeekS(*buffer, bytes, #PB_Ascii)
        EndIf
        FreeMemory(*buffer)
      EndIf
      InternetCloseHandle_(hURL)
    EndIf
    InternetCloseHandle_(hInet)
  EndIf
  ProcedureReturn html$
EndProcedure
Debug DownloadHTML("http://www.google.com/")
You need to do this because the API just writes whatever it receives into memory. It doesn't care about the encoding, so you need to take care of that yourself.
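The hack above reads at most 256 bytes. A rough sketch of reading the whole page in chunks could look like the following. ReadAllAsAscii is a hypothetical helper name, the 4096-byte chunk size is arbitrary, and it assumes the page really is single-byte ASCII; a UTF-8 page would need #PB_UTF8 instead and care with characters split across chunks.

```purebasic
; Hypothetical helper: reads everything from an open WinINet URL handle
; and converts each chunk from ASCII to the program's native string format.
Procedure.s ReadAllAsAscii(hURL)
  Protected html$, bytes.l
  Protected *buffer = AllocateMemory(4096)
  If *buffer
    ; InternetReadFile_() reports success with 0 bytes read at the end of the data.
    While InternetReadFile_(hURL, *buffer, MemorySize(*buffer), @bytes) And bytes > 0
      ; Convert only the bytes received in this chunk.
      html$ + PeekS(*buffer, bytes, #PB_Ascii)
    Wend
    FreeMemory(*buffer)
  EndIf
  ProcedureReturn html$
EndProcedure
```

It would be called with the handle returned by InternetOpenUrl_(), in place of the single InternetReadFile_() call in the procedure above.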

Re: Switching my app to Unicode breaks things
Posted: Mon Sep 26, 2011 11:46 pm
by happer66
Another take: convert the string to ASCII.
Code:
Procedure.s DownloadHTML(url$)
<removed> pasted the wrong code!
In Unicode mode, characters take 2 bytes each in memory. To count the characters you actually see, use Len(); to get the string's size in memory, use StringByteLength() or Len(string$)*SizeOf(Character). Don't forget that the terminating NUL also takes up the size of one character. Hope that made some sense.
Otherwise I'm sure someone will correct it/me.
edit: Almost forgot, this is quite good
http://www.joelonsoftware.com/articles/Unicode.html
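To make the difference between character count and byte size concrete, here is a small sketch. The variable names are just for illustration, and the values printed by the native-format lines depend on whether the program is compiled in ASCII or Unicode mode.

```purebasic
; Sketch: character count versus byte size of the same string.
text$ = "Hello"

Debug Len(text$)                            ; number of characters: 5
Debug StringByteLength(text$)               ; bytes in the native string format, without the terminating NUL
Debug StringByteLength(text$, #PB_Ascii)    ; bytes needed as ASCII: 5
Debug StringByteLength(text$, #PB_Unicode)  ; bytes needed as 2-byte Unicode characters: 10
Debug Len(text$) * SizeOf(Character)        ; same value as the native StringByteLength()

; A buffer for the string also has to hold the terminating NUL character.
bufferSize = StringByteLength(text$) + SizeOf(Character)
*buffer = AllocateMemory(bufferSize)
If *buffer
  PokeS(*buffer, text$)   ; writes the string plus its terminating NUL
  Debug PeekS(*buffer)    ; reads it back in the native format
  FreeMemory(*buffer)
EndIf
```

This is why Len(html$) only matches the byte size of a string buffer in ASCII mode. In Unicode mode the buffer is twice that size plus the NUL.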
Re: Switching my app to Unicode breaks things
Posted: Tue Sep 27, 2011 10:50 am
by MachineCode
Shield wrote: html$ = PeekS(*buffer, MemorySize(*buffer), #PB_Ascii)
Ewwwwww. I need to PeekS() everything now? Can't the compiler just do this for me if I've set Unicode on in Compiler Options?
Re: Switching my app to Unicode breaks things
Posted: Tue Sep 27, 2011 11:06 am
by luis
If you set the compiler to Unicode and you read data from a source that does not store it as Unicode, it's your job to know that and convert it. This applies not only to the output of a procedure but also to any external data in another encoding that you read into memory. You certainly cannot ask the compiler to inspect data at runtime on every memory access and guess whether you want some kind of conversion in between. Leaving the cost aside, sometimes you would want the conversion and other times not.
Data can also resemble one thing while actually being another, or need to be treated in a certain way irrespective of how it looks.
To explicitly tell the compiler to do some of these conversions when calling a foreign procedure, you have prototypes and pseudotypes. When you cannot use them, you have to use PeekS().
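As a rough illustration of the prototype route, here is a sketch that calls the ASCII version of a Windows API function with the p-ascii pseudotype. The names ProtoMessageBoxA and MessageBoxA_Ascii are made up for this example. Note that pseudotypes convert strings you pass in; they do not convert data an API writes into a buffer, which is the case where PeekS() is still needed.

```purebasic
; Sketch: p-ascii makes the compiler convert the string parameters to ASCII
; at call time, whatever string mode the program is compiled in.
Prototype.i ProtoMessageBoxA(hWnd.i, text.p-ascii, caption.p-ascii, flags.l)

If OpenLibrary(0, "user32.dll")
  ; MessageBoxA is the ASCII variant of the Windows MessageBox function.
  MessageBoxA_Ascii.ProtoMessageBoxA = GetFunction(0, "MessageBoxA")
  If MessageBoxA_Ascii
    MessageBoxA_Ascii(0, "Converted to ASCII by the pseudotype", "Pseudotype sketch", 0)
  EndIf
  CloseLibrary(0)
EndIf
```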
Re: Switching my app to Unicode breaks things
Posted: Tue Sep 27, 2011 2:41 pm
by MachineCode
Thanks for the explanation. I'll keep compiling in Ascii then, as I have no time for Unicode right now, and don't know what data will be Unicode and not (I can't tell, I'm a Unicode newbie, and Joel's article doesn't help). Maybe later when I have more time, experience and money.