Hi
Yes, utf8/utf8mb4 got me as well...hated MySQL for that.
I have been dealing with Chinese characters for quite some time now,
both in coding and for learning the language, so here is my experience:
Most languages don't support Unicode properly, partly for speed reasons
but also because Unicode is just extremely complex. You may have heard of ICU
(http://site.icu-project.org/), which is a huge library for dealing with these issues.
The languages I have used (Java, C#, JavaScript, PHP) all have the same issues, though their standard libraries are better equipped than PB's.
You have to ask yourself what exactly it is that you need to do in your application.
If you just need to translate the messages in your application to Chinese, you won't run into too many problems.
So if you ask your translator to translate lines in a UTF-8 text file which are then handed to functions like
SetGadgetText(), it should work "as is". The problems start when you need to parse or modify user input.
The way I do it is to treat strings as if they were raw memory buffers, preferably in UTF-8. UTF-8 has the advantage that several
string operations just work (except for normalization issues). For example, concatenation, comparing for equality,
and searching within a string will work, and so will functions such as Trim().
-> This should also work with PB's UCS-2, so as long as you stick to these operations you should be fine.
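
To make that a bit more concrete, here is a rough sketch in Python (not PB, simply because it is easy to run anywhere). The sample strings and variable names are made up for illustration. It treats text as raw UTF-8 buffers and shows which operations are safe, plus the normalization caveat:

```python
import unicodedata

# Treat text as raw UTF-8 buffers (bytes objects), not as "characters".
greeting = "你好".encode("utf-8")   # 2 characters, 6 bytes
name = "世界".encode("utf-8")       # 2 characters, 6 bytes

# Concatenation and equality work directly on the bytes.
message = greeting + b", " + name
matches = message == "你好, 世界".encode("utf-8")

# Searching works too, but the result is a byte offset, not a character index.
found_at = message.find(name)

# Trimming ASCII whitespace is safe: those byte values never appear
# inside a multi-byte UTF-8 sequence.
trimmed = "  你好 \n".encode("utf-8").strip(b" \t\r\n")

# The normalization caveat: the same visible text can have different bytes.
composed = "\u00e9".encode("utf-8")      # "é" as a single code point
decomposed = "e\u0301".encode("utf-8")   # "e" followed by a combining accent
same_bytes = composed == decomposed

# Normalizing both sides to the same form (here NFC) makes them comparable.
normalized = unicodedata.normalize("NFC", "e\u0301").encode("utf-8")
same_after_nfc = normalized == composed
```

So byte-wise comparison only tells you the buffers are identical, not that the text looks the same. If your input can come from different sources, you would probably want to normalize it once at the boundary.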
If you need to do operations on the actual text, such as UCase() or Mid(), or if the character length of a string is important,
you will inevitably run into trouble. Not only because of Unicode but also because of cultural differences (e.g. UCase("i")
may work for English/German etc. but won't work for Turkish, where the uppercase of "i" is the dotted "İ").
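
Here is a small sketch of those pitfalls, again in Python rather than PB and with a made-up sample string. It assumes a 16-bit string representation like UCS-2/UTF-16 when it counts code units. The character U+20000 is a CJK ideograph outside the Basic Multilingual Plane, so it needs two 16-bit units:

```python
# "a", then U+20000 (a CJK ideograph outside the BMP), then "b".
text = "a\U00020000b"

# "Length" depends on what you count.
code_points = len(text)                       # Unicode code points
utf16 = text.encode("utf-16-le")
utf16_units = len(utf16) // 2                 # 16-bit code units
utf8_bytes = len(text.encode("utf-8"))        # bytes in UTF-8

# A Mid()-style cut that counts 16-bit units can split a surrogate pair.
# Taking the first two units leaves "a" plus half of the ideograph.
cut = utf16[:4]
try:
    cut.decode("utf-16-le")
    cut_is_valid = True
except UnicodeDecodeError:
    cut_is_valid = False

# Case mapping is locale-independent here: "i" becomes "I",
# whereas Turkish expects the dotted capital "İ" (U+0130).
default_upper = "i".upper()
turkish_upper = "\u0130"
upper_matches_turkish = default_upper == turkish_upper
```

The three length values all differ for this string, the cut buffer is no longer valid UTF-16, and the default uppercase is not what a Turkish user would expect. For locale-aware case mapping you would need something like ICU.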
So to give you better advice: what exactly is it your application does and what exactly are the translations used for?