chr() and unicode

mskuma
Enthusiast
Posts: 573
Joined: Sat Dec 03, 2005 1:31 am
Location: Australia

Post by mskuma »

Hi Paul, I think the code page issue is a non-issue. Try saving a PB file as UTF-8, then open it in any UTF-8 compatible editor (ideally one that reports the encoding in the status area) and note what encoding it thinks the file is in. It should report UTF-8. I've tried EditPlus and Sakura Editor, and they both report UTF-8. As for my setup, it's just my regular English OS with the regional options set to Japanese (default code pages) and non-unicode programs set to display as Japanese (just for the odd non-unicode Japanese app). BTW the code page list includes UTF-8 ticked by default (but it's greyed out, and I think it's not really significant anyway; 932 is ticked, however). I've never twiddled with these: never had to, and never had any issues.
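
For anyone who wants to run the same check without an editor, here is a minimal sketch in Python (not PureBasic). It assumes only that a UTF-8 file either starts with the UTF-8 byte order mark or decodes cleanly as UTF-8. The sample source line and the function name `looks_like_utf8` are made up for illustration; whether the PB IDE writes a BOM is not stated in this thread, so the function accepts both cases.

```python
import codecs

def looks_like_utf8(data: bytes) -> bool:
    """Return True if the bytes start with a UTF-8 BOM or decode cleanly as UTF-8."""
    if data.startswith(codecs.BOM_UTF8):
        return True
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

# Hypothetical one-line source file containing Japanese text.
sample = 'MessageRequester("テスト", "日本語")'
utf8_bytes = sample.encode("utf-8")    # what a UTF-8 save would produce
cp932_bytes = sample.encode("cp932")   # what a code page 932 save would produce

print(looks_like_utf8(utf8_bytes))   # True
print(looks_like_utf8(cp932_bytes))  # False
```

In practice you would read the real file with `open(path, "rb").read()` and pass the bytes in. Note that a pure-ASCII file passes this check whatever the editor setting, so the test is only meaningful once the file contains non-English text.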
Mistrel
Addict
Posts: 3415
Joined: Sat Jun 30, 2007 8:04 pm

Post by Mistrel »

pdwyer, I'm curious. Are you using the internal debugger to output Japanese? I can't tell by your screenshot.

Freak says that it's only possible with the standalone debugger.
pdwyer
Addict
Posts: 2813
Joined: Tue May 08, 2007 1:27 pm
Location: Chiba, Japan

Post by pdwyer »

@Mistrel, I saw Freak's comments. I think he means that the internal debugger doesn't support unicode (I didn't manage to get it to support unicode either). The screenshot is the internal debugger though, and that's not a unicode app. In regional settings my PC treats non-unicode apps as code page 932 (Shift JIS).

@mskuma, seems we both have the same goal but we've picked up different habits for how to get there :) I was a big unicode fan when I used VB about 7-8 years ago. Then when I used PowerBASIC it had no support, so I'd just handle the API as needed (NetUserAdd() etc.) and do all the rest in the local code page. Win2k's code page 932 support was not too good compared to XP, though. I'll have a play with the editor to see how it goes with UTF-8. Generally I find it easier not to bother with unicode unless I need to support more than one language on top of an alphabet-based one... not very often, thankfully.

Personally, I think that if UTF-8 had never been invented, the world might have been a bit closer to a clean UTF-16 standard that would get rid of all the code pages.
Paul Dwyer

“In nature, it’s not the strongest nor the most intelligent who survives. It’s the most adaptable to change” - Charles Darwin
“If you can't explain it to a six-year old you really don't understand it yourself.” - Albert Einstein
Fred
Administrator
Posts: 16687
Joined: Fri May 17, 2002 4:39 pm
Location: France

Post by Fred »

Finally, I reverted the "fix" for Chr() as it brought some major regressions. To use it correctly in unicode mode, ensure your source file is in UTF-8 format (IDE/Compiler/Options/Source Format).

Post by pdwyer »

Fred,

Could you elaborate on this please. How will it work in the final release?
Trond
Always Here
Posts: 7446
Joined: Mon Sep 22, 2003 6:45 pm
Location: Norway

Post by Trond »

pdwyer wrote:Fred,

Could you elaborate on this please. How will it work in the final release?
Just forget everything about codepages, use UTF-8 for all files and make a unicode executable. That should make it work.

Post by pdwyer »

That's easy to say but:

- My IDE gets written to in codepage and not UTF8 (even though it's set to UTF-8), so I can't manually put anything in in that format unless I convert from cp932 to UTF-16 (and then back to UTF-8?)
- If I do insert from UTF-8 files, I have to convert everything on the fly between UTF-8 and UTF-16 in order to use string functions and display
- I'm not aware of a UTF-8 version of the Win32 API. :roll:

The easiest way to go is to use the same thing throughout, CP or UTF16. UTF8 is probably the most work, with all the converting and inconvenience. Since the IDE doesn't support UTF16, this makes the decision even easier....
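
The conversion chain described in the list above (cp932 to UTF-16, and UTF-16 to UTF-8) can be sketched as follows. This is Python rather than PureBasic, and the helper name `cp932_to_utf16_and_utf8` is hypothetical. It only illustrates that the same text has three different byte representations and that each conversion needs the correct source encoding to be named.

```python
def cp932_to_utf16_and_utf8(data: bytes):
    """Decode cp932 bytes, then re-encode as UTF-16 (little-endian) and UTF-8."""
    text = data.decode("cp932")            # must name the real source code page
    utf16_bytes = text.encode("utf-16-le")  # layout used by wide-character Win32 calls
    utf8_bytes = text.encode("utf-8")       # layout of a UTF-8 source file
    return utf16_bytes, utf8_bytes

cp932_bytes = "日本語".encode("cp932")
utf16_bytes, utf8_bytes = cp932_to_utf16_and_utf8(cp932_bytes)

# Same three characters, three byte lengths.
print(len(cp932_bytes), len(utf16_bytes), len(utf8_bytes))  # 6 6 9

# Both Unicode forms decode back to the same text.
print(utf16_bytes.decode("utf-16-le") == utf8_bytes.decode("utf-8"))  # True
```

The round trip is lossless here because every character in the sample exists in cp932; text outside that code page cannot make the first step at all.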

Avoid unicode like the plague unless it's absolutely necessary and no workarounds are available (and then go UTF16).

Post by mskuma »

pdwyer wrote:My IDE gets written to in codepage and not UTF8
I must be missing something or this whole thing is a non-issue. I develop for Japanese daily using PB in the same environment as you, I think, and never encounter any issues. Are you talking about the PB IDE? Set it to UTF-8, compile for unicode, and forget. Japanese chars can be typed directly in the IDE and assigned to strings (compile for unicode, of course). It just works, with no conversion issues (particularly when you are creating/dealing with characters yourself). There is absolutely no reason to avoid unicode like the plague. Unicode plus PB's implementation is very friendly for any non-English programming, and is one of the prime reasons why I personally use PB regularly.
Last edited by mskuma on Mon Oct 22, 2007 2:14 pm, edited 1 time in total.

Post by Fred »

Utf8 is the way to go, just change your IDE :)

Post by pdwyer »

and not use the PB one you mean?

I like the built in IDE !! (something I never said about powerbasic, I always went 3rd party with them)

Actually, I'm happy using code pages till you guys recompile the IDE in unicode mode :D

I think it adds extra complexity building unicode apps in a non unicode IDE, I can't type japanese straight in.

As is, I can put a japanese interface app together as easy as an english one so I'm not really annoyed. I just thought Unicode might be easier (since mskuma is a diehard unicode apologist! he talked fast to me and I lost my way ;) )
LuCiFeR[SD]
666
Posts: 1033
Joined: Mon Sep 01, 2003 2:33 pm

Post by LuCiFeR[SD] »

PureBasic IDE: Preferences/Editor/Defaults. Sourcefile Text Encoding/UTF-8

Is what Fred was trying to say... at least I think so :P (or have I totally misunderstood the problem?)

Post by mskuma »

pdwyer wrote:I can't type japanese straight in.
If you set the Sourcefile Text Encoding to UTF8 as Fred and LuCiFeR[SD] say, you can type Japanese (or any other non-English language) directly into the editor. It may be easiest to set this via the 'Compiler Options' button [next to the compile/run button]; see the second-to-last option on that options form.

Generally if you compile as unicode, you'll see these chars appear in the application.

Post by pdwyer »

I have set it on the compiler options button already; I think it's set by default (or at least I don't remember making that change).

I'll try the "Preferences/Editor/Defaults" location too as I didn't change that one, not sure if there's a difference.

I'll double check when I get home but I'm fairly sure it's not using UTF8.

In ascii compile mode I can type cp932 into the IDE and it displays fine, debugs fine, and GUI code works (menus, gadgets etc.), so no problem in that sense.

In unicode mode, I can type cp932 into the IDE and it looks fine, but it won't display correctly anywhere once compiled: MessageRequester(), gadgets etc. all show garbage. Conversion works poorly if cp932 is compiled straight into a unicode app.
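
The garbage described here is the usual result of bytes written in one encoding being read as another. The sketch below (Python, not PureBasic) shows two generic ways cp932 bytes can be misread. How the PB compiler actually interprets a plain-text source file in unicode mode is not stated in this thread, so treat both misreads as illustrations of the failure mode, not as a description of the compiler.

```python
original = "日本語"
raw = original.encode("cp932")  # bytes as typed into a cp932 editor

# Misread 1: every byte widened to its own character (Latin-1 style).
widened = raw.decode("latin-1")

# Misread 2: bytes treated as UTF-8; invalid sequences become U+FFFD.
as_utf8 = raw.decode("utf-8", errors="replace")

print(len(original), len(widened))                # 3 6
print(widened != original, as_utf8 != original)   # True True

# Reading the bytes with the encoding they were written in restores the text.
print(raw.decode("cp932") == original)            # True
```

Either misread leaves the on-screen text in the editor looking fine, because the editor itself is still displaying the bytes as cp932; the mismatch only shows up in whatever consumes the file under a different assumption.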

Post by pdwyer »

OK, That's pretty special! :shock:

I'm not sure if it's a bug in the IDE or if it's two different settings.

The compiler options were always set to UTF8 but the preferences were set to plain text. When I changed the preferences to UTF8, unicode worked for Japanese, which is great.

Also, the internal debugger works for japanese in unicode now which is great but Mistrel and Freak said it wasn't supposed to!

Now I'll have to try a whole lot of things all over again. I thought the compiler options were what people were talking about; wonder what that is then... :?

Image

Post by mskuma »

pdwyer wrote:Also, the internal debugger works for japanese in unicode now which is great but Mistrel and Freak said it wasn't supposed to!
I've read before that it doesn't; however, for the internal debugger window, the regional options setting 'display non-unicode characters as ..' (Japanese) is what's responsible for making it display properly there.
pdwyer wrote:wonder what that is then... :?
I guess it's simply reporting a 2-byte length for a unicode char?
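
The screenshot is not available here, so what it reported cannot be confirmed. The guess about a 2-byte length is at least consistent with how a single Japanese character is stored. A minimal sketch in Python (not PureBasic), with a hypothetical helper `byte_lengths`:

```python
def byte_lengths(ch: str) -> dict:
    """Return the encoded size in bytes of one character in three encodings."""
    return {enc: len(ch.encode(enc)) for enc in ("cp932", "utf-16-le", "utf-8")}

hiragana_a = chr(0x3042)  # U+3042 HIRAGANA LETTER A

print(byte_lengths(hiragana_a))  # {'cp932': 2, 'utf-16-le': 2, 'utf-8': 3}
print(byte_lengths("A"))         # {'cp932': 1, 'utf-16-le': 2, 'utf-8': 1}
```

A Japanese character happens to take two bytes in both cp932 and UTF-16, so a reported length of 2 does not by itself show which encoding is in use; an ASCII character distinguishes them, since it is one byte in cp932 and two in UTF-16.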