WriteStringN, ReadData, unicode

Posted: Fri May 26, 2006 8:36 am
by Dare2
I need even more educating with Unicode.

(Note: Currently all my programs are compiled with the "Create unicode executable" option checked.)


This example code, without the #PB_Unicode flag on each WriteStringN line, causes a later ReadData to misread the file. (ReadData and WriteData appear to stick with whatever mode the "Create unicode executable" flag defines.)

CreateFile(1,"C:\100\pb400\APPLICATIONS\IVMGlobal\data\lang_English.txt")
WriteStringN(1,"[LOGIN]")
WriteStringN(1,"?username=YOUR USER NAME")
WriteStringN(1,"?userpass=YOUR PASSWORD")
WriteStringN(1,"?button=LOG IN")
WriteStringN(1,"[/LOGIN]")
CloseFile(1)
The reference says that "Without any 'Flags', the string is written in UTF8 format if the program is compiled in unicode mode, else it's written in ascii format."

Now adding the flags is no problem, and then everything works. But I am curious as to why UTF8 and not unicode is used as default.
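
For reference, the flagged variant mentioned above might look like the sketch below. The short file name is a placeholder rather than the real path, and the If/EndIf guard is an addition for safety.

```purebasic
; Sketch only: "lang_English.txt" is a placeholder file name.
If CreateFile(1, "lang_English.txt")
  ; #PB_Unicode asks for the 2-byte encoding instead of the UTF8 default.
  WriteStringN(1, "[LOGIN]", #PB_Unicode)
  WriteStringN(1, "?username=YOUR USER NAME", #PB_Unicode)
  WriteStringN(1, "?userpass=YOUR PASSWORD", #PB_Unicode)
  WriteStringN(1, "?button=LOG IN", #PB_Unicode)
  WriteStringN(1, "[/LOGIN]", #PB_Unicode)
  CloseFile(1)
EndIf
```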

Also, is UTF8 preferable (apart from file sizes) and will it support things like Turkish and Hebrew? (My app is likely to move into those markets within the year, whether I like it or not.)

If UTF8 has advantages, how do you create a UTF8 executable? Is there even such a thing?


lol - I am so glad PureBasic does most of the Unicode work as unicode has me confused.

Even within Pure every so often I encounter a curly. This is not a big curly, the fix is easy (#PB_Unicode flags) - but communicating with a webserver via API cost me some hair. :)

Posted: Sat May 27, 2006 12:56 pm
by freak
UTF8 was chosen as default mostly because it is byte-order independent. The 2-byte encodings
will produce a different file on Windows and MacOSX because of the processor type.
So a file saved with PB/Windows would not be readable on PB/Mac. UTF8 does not have
this problem; the file is the same on both.
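
The byte-order point can be seen with a small sketch like the one below. It assumes PokeS() accepts -1 as "whole string" and takes the #PB_Unicode and #PB_UTF8 format flags; the buffer name *Buf is made up for the illustration.

```purebasic
; Sketch only: dump the bytes produced for the single character "A".
*Buf = AllocateMemory(16)              ; zero-filled scratch buffer
If *Buf
  PokeS(*Buf, "A", -1, #PB_Unicode)    ; 2-byte encoding, stored in the processor's byte order
  Debug Hex(PeekB(*Buf) & $FF)         ; first byte
  Debug Hex(PeekB(*Buf + 1) & $FF)     ; second byte
  PokeS(*Buf, "A", -1, #PB_UTF8)       ; one byte, so there is no order to get wrong
  Debug Hex(PeekB(*Buf) & $FF)
  FreeMemory(*Buf)
EndIf
```

On a little-endian processor the two bytes of the 2-byte form should come out as 41 then 0, and on a big-endian one in the opposite order, while the UTF8 byte is 41 either way.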

Another reason is that UTF8 can still be read and processed by non-unicode programs.
All non-ascii characters will look like rubbish to these programs, but if they open such a
file, process it and write it back, it is still ok. (If an ascii program tries to process a unicode
file, it's not going to look very good afterwards ;))

> This example code, without the #PB_Unicode flag on each WriteStringN line, causes a later ReadData to misread the file.

I don't quite understand what you want to do with ReadData().
What exactly do you mean by misreading?

> If UTF8 has advantages, how do you create a UTF8 executable? Is there even such a thing?

UTF8 is just an encoding for data exchange. Usually the programs work internally with a different
encoding which is easier to process.
So you cannot create a 'utf8 only' executable with PB.
UTF8 is able to encode any unicode character, so no language should be a problem there.

Posted: Sat May 27, 2006 1:56 pm
by Dare2
Hi Freak,

Thanks for the excellent explanation, it clarified a lot for me.

My bad with the ReadData. I now understand my error, I think.

What happens is that one program will, from time to time, update (appending to) a file using WriteStringN. Another program accesses the file using something like this:
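ReadFile(#f,nam.s)
buffer.s = Space(Lof(#f))      ; reserve a string at least as large as the file
ReadData(#f,@buffer,Lof(#f))   ; copies the raw file bytes, no string conversion
CloseFile(#f)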
Both programs are unicode flagged.

I now understand that I was expecting PureBasic's ReadData to read my mind :) when it actually reads raw binary (what it sees is what you get). So the string had garbage and it was all my fault. :) Good old GIGO.

I can #PB_Unicode the writes, or WriteData, and this cures it.
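
Another option, sketched here on the assumption that ReadString() accepts the same format flags as WriteStringN(), would be to leave the writer on the UTF8 default and decode on the reading side. #f and nam.s are taken from the snippet above; text.s is a made-up name.

```purebasic
; Sketch only: read a UTF8 text file line by line into a normal string.
If ReadFile(#f, nam.s)
  text.s = ""
  While Eof(#f) = 0
    ; Each line is decoded from UTF8 into the program's internal string format.
    text + ReadString(#f, #PB_UTF8) + #CRLF$
  Wend
  CloseFile(#f)
EndIf
```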

Freak, I am attempting to keep everything unicoded.

Apart from some stupidities like the above, all is going well except that I am really kludging when communicating with a server via API http. I do a lot of manipulating to get things happening. Re this post:

http://www.purebasic.fr/english/viewtopic.php?t=21975

Have you got any suggestions as to how to make this work more, um, elegantly? lexvictory suggests using UTF8, which sounds fantastic, but how does one make the strings UTF-able just for the API http calls?
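
One possible direction, as a sketch and not a tested answer: convert the string into a UTF8 byte buffer just before the call. This assumes StringByteLength() and PokeS() both accept the #PB_UTF8 flag; request.s, *utf8 and byteLen are made-up names, and the actual send call is left out because it depends on the API in use.

```purebasic
; Sketch only: build a UTF8 copy of a string for an http/API call.
request.s = "?username=YOUR USER NAME"
byteLen = StringByteLength(request, #PB_UTF8)  ; bytes needed, without the terminating null
*utf8 = AllocateMemory(byteLen + 1)            ; +1 for the null that PokeS() appends
If *utf8
  PokeS(*utf8, request, -1, #PB_UTF8)          ; internal string -> UTF8 bytes
  ; ... hand *utf8 and byteLen to whatever send or API call is in use ...
  FreeMemory(*utf8)
EndIf
```

The rest of the program can then stay fully unicode, with UTF8 appearing only at the point where data leaves for the server.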