WriteStringN, ReadData, unicode


Post by Dare2 »

I need even more educating with Unicode.

(Note: Currently all my programs are compiled with the "Create unicode executable" checked.)


This example code, without the #PB_Unicode flag on each WriteStringN line, causes a later ReadData to misread the file. (ReadData and WriteData appear to stick with whatever mode the "Create unicode executable" flag defines.)

CreateFile(1,"C:\100\pb400\APPLICATIONS\IVMGlobal\data\lang_English.txt")
WriteStringN(1,"[LOGIN]")
WriteStringN(1,"?username=YOUR USER NAME")
WriteStringN(1,"?userpass=YOUR PASSWORD")
WriteStringN(1,"?button=LOG IN")
WriteStringN(1,"[/LOGIN]")
CloseFile(1)
The reference says that "Without any 'Flags', the string is written in UTF8 format if the program is compiled in unicode mode, else it's written in ascii format."

Now adding the flags is no problem, and then everything works. But I am curious as to why UTF8 and not unicode is used as default.

Also, is UTF8 preferable (apart from file sizes) and will it support things like Turkish and Hebrew? (My app is likely to move into those markets within the year, whether I like it or not.)

If UTF8 has advantages, how do you create a UTF8 executable? Is there even such a thing?


lol - I am so glad PureBasic does most of the Unicode work as unicode has me confused.

Even within Pure every so often I encounter a curly. This is not a big curly, the fix is easy (#PB_Unicode flags) - but communicating with a webserver via API cost me some hair. :)

Post by freak »

UTF8 was chosen as the default mostly because it is byte-order independent. The 2-byte encodings
produce a different file on Windows and Mac OS X because of the processor type,
so a file saved with PB/Windows would not be readable on PB/Mac. UTF8 does not have
this problem; its byte sequence is the same on every platform.
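
To make the byte-order point concrete, here is a minimal sketch that dumps the bytes PureBasic stores for the letter "A" in both encodings. It assumes PokeS() accepts -1 as "whole string" for its length parameter; the variable names are only illustrative. The 2-byte form holds the bytes $41 and $00, and their order depends on the processor, while the UTF8 form is the single byte $41 everywhere.

```purebasic
; Sketch only: compare the stored bytes of "A" in the 2-byte format and in UTF8.
*mem = AllocateMemory(16)            ; small scratch buffer, zero-filled

If *mem
  PokeS(*mem, "A", -1, #PB_Unicode)  ; 2-byte encoding, native byte order
  Debug Hex(PeekB(*mem) & $FF)       ; first byte  (order is CPU dependent)
  Debug Hex(PeekB(*mem + 1) & $FF)   ; second byte (order is CPU dependent)

  PokeS(*mem, "A", -1, #PB_UTF8)     ; UTF8 encoding
  Debug Hex(PeekB(*mem) & $FF)       ; single byte, same on every platform

  FreeMemory(*mem)
EndIf
```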

Another reason is that UTF8 can still be read and processed by non-unicode programs.
All non-ascii characters will look like rubbish to these programs, but if they open such a
file, process it and write it back, it is still ok. (If an ascii program tries to process a unicode
file, it's not going to look very good afterwards ;))

> This example code, without the #PB_Unicode flag on each WriteStringN line, causes a later ReadData to misread the file.

I don't quite understand what you want to do with ReadData().
What exactly do you mean by misreading ?

> If UTF8 has advantages, how do you create a UTF8 executable? Is there even such a thing?

UTF8 is just an encoding for data exchange. Usually the programs work internally with a different
encoding which is easier to process.
So you cannot create a 'utf8 only' executable with PB.
UTF8 is able to encode any unicode character, so no language should be a problem there.
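
As a rough illustration of "UTF8 only at the exchange boundary", the sketch below converts an ordinary PureBasic string into UTF8 bytes in a memory buffer and back again. The sample text and variable names are assumptions made for the example. It also assumes that -1 is accepted as "whole string" by PokeS() and PeekS().

```purebasic
; Sketch only: keep strings in the program's internal format and convert
; to UTF8 bytes just before handing the data to a file or network call.
text.s = "?button=LOG IN"

; StringByteLength() returns the encoded size without the terminating null,
; so one extra byte is reserved for it.
size = StringByteLength(text, #PB_UTF8)
*utf8 = AllocateMemory(size + 1)

If *utf8
  PokeS(*utf8, text, -1, #PB_UTF8)          ; internal string -> UTF8 bytes
  ; ... *utf8 and size could now be passed on as raw data ...
  roundTrip.s = PeekS(*utf8, -1, #PB_UTF8)  ; UTF8 bytes -> internal string
  Debug roundTrip
  FreeMemory(*utf8)
EndIf
```

The program itself never needs to be "UTF8": only the buffer passed across the boundary is.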

Post by Dare2 »

Hi Freak,

Thanks for the excellent explanation, it clarified a lot for me.

My bad with the ReadData. I now understand my error, I think.

What happens is that one program will, from time to time, update (appending to) a file using WriteStringN. Another program accesses the file using something like this:
ReadFile(#f,nam.s)
buffer.s = Space(Lof(#f))      ; reserve room for the file contents
ReadData(#f,@buffer,Lof(#f))   ; raw bytes are copied as-is, with no decoding
CloseFile(#f)
Both programs are unicode flagged.

I now understand that I was expecting PureBasic's ReadData to read my mind :) It reads as binary (what it sees is what you get), so the string had garbage and it was all my fault. :) Good old GIGO.

I can #PB_Unicode the writes, or WriteData, and this cures it.
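
Another option, sketched here under assumptions, is to leave the writes at the UTF8 default and decode on the reading side instead. The file name and the constant value are hypothetical. The sketch relies on AllocateMemory() returning zero-filled memory, so the extra byte acts as the string terminator.

```purebasic
; Sketch only: read a UTF8 file as raw bytes, then decode it into a string.
#f = 0
nam.s = "lang_English.txt"   ; hypothetical file name

If ReadFile(#f, nam)
  size = Lof(#f)
  If size > 0
    *raw = AllocateMemory(size + 1)         ; +1 keeps a zero byte as terminator
    If *raw
      ReadData(#f, *raw, size)              ; raw bytes, exactly as stored
      buffer.s = PeekS(*raw, -1, #PB_UTF8)  ; decode UTF8 into the internal format
      FreeMemory(*raw)
      Debug buffer
    EndIf
  EndIf
  CloseFile(#f)
EndIf
```

For line-oriented files, reading with ReadString() and the #PB_UTF8 format flag in a loop until Eof() should give the same result with less bookkeeping.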

Freak, I am attempting to keep everything unicoded.

Apart from some stupidities like the above, all is going well, except that I am really kludging when communicating with a server via API http. I do a lot of manipulating to get things happening. Re this post:

http://www.purebasic.fr/english/viewtopic.php?t=21975

Have you got any suggestions as to how to make this work more, um, elegantly? lexvictory suggests using UTF8, which sounds fantastic, but how does one make the strings UTF-able just for the API http calls?