How should I handle localization files?
Posted: Thu Apr 28, 2005 4:15 am
by Xombie
My last 3 question topics didn't get an answer so let's try topic #4!
I'd like to do some work on internationalizing my main project. What's the best and most convenient way to do this? For example, if my project contains 2 buttons - one #buttonTest and one #buttonClose, I would want to have a file or files or some method that does something like....
[English]
buttonTest="Test"
buttonClose="Close"
[French]
buttonTest="Test"
buttonClose="Fermer"
[Japanese]
buttonTest=ばか
buttonClose=これ
... etc... Of course, the file would have to be encoded as Unicode (UTF-16 or whatever) so each character would be a WORD, not a BYTE value. I'm not sure how best to go about reading each line and extracting the correct text. Since the values wouldn't be stored in bytes I couldn't just PeekS() and get them after reading, right? Would I even be able to use ReadLine() correctly, since the text is stored in WORD rather than BYTE values? It seems it could mistake one byte of a WORD for a CR or LF character. I could maybe use ReadData() instead, but then wouldn't I have to PeekW() and recombine every single character just to look for the right key?
So what's my best course here? My project will have at least a hundred controls and other text that I'll want translated into a couple of different languages. What's the best way to store and retrieve the values, remembering that I'll be supporting Asian and other non-ASCII languages? Something that could easily be adapted to fit other projects would be nice, so it's not just specific to my project.
Thanks! (hopefully)
Posted: Thu Apr 28, 2005 9:28 pm
by Derlidio
Yippe! You have a good question.
As far as I know, PB strings are non-Unicode (ASCIIZ, a.k.a. LPSTR strings). This can make your job pretty hard when trying to implement localization for East Asian languages. The only way I see to accomplish this, supposing your application uses gadgets, would be to set all their text through API calls instead of regular PB calls. Mind you, I've made no tests on this, so I'm guessing here (and supposing that it is possible to obtain a valid handle from a PB gadget in order to pass it to a Win API function).
You're right. Reading text from a Unicode file with PB's ReadString() function doesn't work either. Since each character in such a file takes 2 bytes, plain Latin text has a zero byte in every other position, and PB will stop reading as soon as it reaches a byte value of zero. I'm sure this can be worked around by writing a procedure that uses ReadData() instead, as you mentioned, but then your final result will be a pointer to a Unicode string (LPWSTR) that PB will not be able to handle on its own. That is what I meant when I said you'll have to set the text via API.
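To illustrate the zero-byte problem described above, here is a minimal sketch in Python (not PB, chosen only because it is easy to run anywhere). It assumes the file is UTF-16 little-endian, which the thread does not actually confirm.

```python
# Assumption: the text is stored as UTF-16 little-endian.
text = "Test"
raw = text.encode("utf-16-le")  # every ASCII letter is followed by a 0 byte

# A byte-oriented, zero-terminated string read stops at the first 0 byte,
# so only the first letter survives.
as_c_string = raw.split(b"\x00", 1)[0]

# Treating the same bytes as 16-bit units recovers the whole string.
as_wide_string = raw.decode("utf-16-le")

print(raw, as_c_string, as_wide_string)
```

The same bytes are fine as long as they are handled as a block of memory and interpreted two bytes at a time.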
I know this post doesn't help you much, and I'm sorry. I just didn't want to leave you without at least one reply.
Best wishes...
- PJoe
Posted: Fri Apr 29, 2005 1:12 am
by Xombie
Derlidio,
Thanks for the response. I know I'll need to work with strings in memory and such. I have no problems with all that.
The only thing I'm wondering about is how best to *read* the information in the first place. Should I store country-specific text in a type of ini file, manually do it as a DataSection or....? I'd like to know what the pros do, and how they read and reference that information. I know how to do all the stuff with PokeW() for the word values in memory and all that.
And if I store the information in a text file, how do I create and read that file? I tried saving a simple test with Notepad using Japanese characters in the text (and saving as Unicode), but when I opened the file with PB, did a ReadData() into memory and then tried PeekW() on it in a loop with step 2, it didn't seem to get the correct characters.
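One possible cause, offered as a guess rather than a diagnosis: Notepad's "Unicode" option writes UTF-16 little-endian with a 2-byte byte-order mark (FF FE) at the start of the file, so a loop that reads words from offset 0 sees the mark before any text. A small Python sketch of skipping it follows; the function name `read_utf16_words` is made up for illustration and the sample bytes are built in memory rather than read from a real file.

```python
import struct

BOM_LE = b"\xff\xfe"  # byte-order mark for UTF-16 little-endian


def read_utf16_words(data):
    """Return the 16-bit code units in data, skipping a leading LE BOM."""
    if data.startswith(BOM_LE):
        data = data[2:]
    count = len(data) // 2
    # "<" = little-endian, "H" = unsigned 16-bit value
    return list(struct.unpack("<%dH" % count, data[: count * 2]))


# Simulate what such a file would contain: the BOM followed by the text.
sample = BOM_LE + "ばか".encode("utf-16-le")
words = read_utf16_words(sample)

# Valid for characters that fit in one 16-bit unit (no surrogate pairs).
decoded = "".join(chr(w) for w in words)
print([hex(w) for w in words], decoded)
```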
So... I dunno. But I'd really like to get this one figured out.
As a side question - how do Japanese or Chinese developers reference their text strings in development (in whatever programming language they use)? Like when we do 'HoldString.s = "Test!"', how do they set the value of a string? Do they do 'HoldString.s = "ばか"' in their editor or do they have to store the word values in memory and pass that to their function?
Just curious.
Posted: Fri Apr 29, 2005 5:05 am
by Derlidio
Howdy! Here I am, on the road again...
I've coded some samples on the unicode issue.
In fact, there are 2 versions of the same code, both using the API to read a UNICODE INI file. One of them reads the values from the INI and converts them to PB strings. The other reads the values and keeps them as UNICODE strings (I think this is what you'll need).
Take a look at the sources. I hope they are useful and help you figure something out.
The sources are here:
http://www.chromatick.com.br/pbf/unicode.zip
Best wishes...
- PJoe
Posted: Fri Apr 29, 2005 5:30 am
by Xombie
Oh wow. I can't wait to try this out. Thank you very much! Did you code this just as an example or was it for a previous project?
Posted: Fri Apr 29, 2005 5:48 am
by Derlidio
Yippe!
I've coded it for you, yes.

But I'm used to those API functions from my VB projects, so it was not a hard task. Really, I don't know if they will work with Japanese characters the way you expect. I worked with a Japanese version of Windows some years ago (despite the fact that I can't read or speak Japanese), and I must tell you, it was pretty funny.
Using an INI file as a resource file may not be the best option, but it is easy to maintain, that's for sure. You can keep the languages in several separate files, or put them all in separate sections of the same INI file. It will really depend on how much info you have to store.
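As a rough sketch of the "one section per language" layout, here is a small Python example (not PB). It parses the INI text by hand and looks a key up with a fallback language. The names `parse_language_ini` and `lookup`, the fallback behaviour and the sample text are all assumptions made for illustration; they are not taken from the sources linked earlier in the thread.

```python
def parse_language_ini(text):
    """Parse [Section] / key="value" lines into {section: {key: value}}."""
    table = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(";"):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            current = table.setdefault(line[1:-1], {})
        elif "=" in line and current is not None:
            key, value = line.split("=", 1)
            current[key.strip()] = value.strip().strip('"')
    return table


def lookup(table, language, key, fallback="English"):
    """Return the text for key, falling back to another language, then the key."""
    section = table.get(language, {})
    if key in section:
        return section[key]
    return table.get(fallback, {}).get(key, key)


SAMPLE = """
[English]
buttonTest="Test"
buttonClose="Close"
[French]
buttonClose="Fermer"
"""

table = parse_language_ini(SAMPLE)
print(lookup(table, "French", "buttonClose"))
print(lookup(table, "French", "buttonTest"))
```

Keeping everything in one file makes missing translations easy to spot, while one file per language is easier to hand to a translator; either layout works with the same lookup.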
About your question on "how the programmers would use unicode inside their code", I think the answer is tightly bound to the features of the programming language they choose. I'm not the right person to answer that, because for me (and for my expectations from PB), unicode is not a must (at least for now).
We must find someone out there who uses PB "and" a unicode version of Windows.

(Now you made me curious too!)
Best wishes...
- PJoe
Posted: Fri Apr 29, 2005 1:39 pm
by Rescator
Using UTF-8 encoding for storage may be an alternative too.
You will not get binary 0 issues, and UTF-8 only uses 2 or more bytes
in the cases where a non-ASCII character is used,
so it's a good solution for western/Roman based languages where only a few characters are non-ASCII.
Microsoft stores Unicode in its own wide-character format (UTF-16), if I recall.
If you intend to offer broad global support and expect a lot of
non Roman based text (e.g. Japanese, Chinese, Korean, Hebrew etc.),
then using UTF-16 (always 2 bytes or more per character) may be more suitable.
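The size trade-off can be checked with a short Python sketch (again not PB). The sample strings are only illustrations; the first and third reuse values from the example file at the top of the thread.

```python
# Sample strings: plain ASCII, mostly ASCII with one accent, and Japanese.
samples = {"ascii": "Close", "french": "Fermé", "japanese": "これ"}

# (bytes in UTF-8, bytes in UTF-16) for each sample
sizes = {
    name: (len(s.encode("utf-8")), len(s.encode("utf-16-le")))
    for name, s in samples.items()
}

# UTF-8 never produces a 0 byte for these characters, so
# zero-terminated string routines do not cut the text short.
has_zero_byte_utf8 = any(0 in s.encode("utf-8") for s in samples.values())

for name, (utf8_len, utf16_len) in sizes.items():
    print(name, "UTF-8:", utf8_len, "UTF-16:", utf16_len)
```

For mostly-Latin text UTF-8 is roughly half the size, while for the Japanese sample UTF-16 is the smaller of the two.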