chr() and unicode

pdwyer
Addict
Posts: 2813
Joined: Tue May 08, 2007 1:27 pm
Location: Chiba, Japan

Post by pdwyer »

Revisiting this, it looks like a bug!

This is not consistent:

Code: Select all


OpenFile(0,"F:\Programming\PureBasicCode\SQLite\uni kanji.txt") ; contains the unicode of "漢字" ("Kanji")
Text.s = ReadString(0,#PB_Unicode)

Char1 = Asc(Mid(text,1,1))
Char2 = Asc(Mid(text,2,1))

MessageRequester("", text)                              ; Displays "漢字" okay
MessageRequester("", Str(char1) + " " + Str(char2))     ; Displays "65279 28450" okay
MessageRequester("", Chr(char1))                        ; Displays nothing
MessageRequester("", Chr(char2))                        ; Displays "漢" okay
MessageRequester("", Chr(char1) + Chr(char2))           ; Displays only "漢"

If you change the top text to be

Code: Select all

Text = "AB"     ;Two characters but english
then everything works as expected
Paul Dwyer

“In nature, it’s not the strongest nor the most intelligent who survives. It’s the most adaptable to change” - Charles Darwin
“If you can't explain it to a six-year old you really don't understand it yourself.” - Albert Einstein
KingNips
New User
Posts: 7
Joined: Thu Oct 11, 2007 1:14 pm
Location: Marrickville

Post by KingNips »

pdwyer wrote:Revisiting this, it looks like a bug!
Maybe there's a null character in front of the original string? Looks like you're missing the Vanilla Ice standing on one leg character (字).

Post by pdwyer »

ARRRRHHH

It's that damn byte order mark!

Forgot about that grrrrr
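
For anyone who lands here with the same symptom: 65279 is $FEFF, the byte order mark at the start of the file, so Chr(char1) has nothing visible to show. Below is a minimal sketch of two ways around it, assuming a unicode build and a UTF-16 LE file that starts with a BOM. The file name is a placeholder, and ReadStringFormat() is the file-library call that detects a BOM and moves past it; check the help file for your PB version before relying on it.

```purebasic
; Sketch only: "kanji.txt" is a placeholder for a UTF-16 LE file that starts with a BOM.
If ReadFile(0, "kanji.txt")
  ; Option 1: let ReadStringFormat() detect the BOM and move the file pointer past it.
  Format = ReadStringFormat(0)
  Text.s = ReadString(0, Format)
  CloseFile(0)

  ; Option 2: strip a leading BOM by hand if one still ended up in the string.
  If Len(Text) > 0 And Asc(Left(Text, 1)) = $FEFF
    Text = Right(Text, Len(Text) - 1)
  EndIf

  MessageRequester("", Text)
Else
  MessageRequester("Error", "Could not open the file")
EndIf
```

Either option alone should be enough; the second is only a fallback for strings that were read without the format check. Note that ReadStringFormat() can also report formats that ReadString() may not accept directly (big-endian UTF-16, for example), so a real program would want to check the returned value first.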

Post by KingNips »

pdwyer wrote:ARRRRHHH

It's that damn byte order mark!

Forgot about that grrrrr
Textbook.
Mistrel
Addict
Posts: 3415
Joined: Sat Jun 30, 2007 8:04 pm

Post by Mistrel »

It's not a bug. I had the exact same problem today. :roll:

http://www.purebasic.fr/english/viewtopic.php?t=29216

Post by pdwyer »

"Magic!"

What's with this then?

Code: Select all


MessageRequester("", Chr($6F22))        ;No Good
MessageRequester("", Chr(28450))        ;No Good
MessageRequester("", Chr(Int(28450)))   ;OK

;all same in debugger
Debug 28450             
Debug $6F22             
Debug Int(28450)

Does chr() have some type limitations for the int?

Post by KingNips »

pdwyer wrote:Does chr() have some type limitations for the int?
Looks like it isn't casting to an int on the fly. It needs an actual int, not a string or whatever.

Ah, BASIC. Everything's a string... except when it's not.

Post by pdwyer »

Things have changed just a tad since QBasic, dood. Perhaps you spent too long with Pick BASIC.

Actually, if you aren't explicit "everything's a long". But in this case perhaps "Everything's a bug" :?

Post by KingNips »

I'm writing my own AAC codec for WMP.

My head just exploded...

Post by KingNips »

pdwyer wrote: Perhaps you spent too long with Pick basic.
ooooo yeah. <shudder>

Post by pdwyer »

KingNips wrote:I'm writing my own AAC codec for WMP.

My head just exploded...
In what language? I have this nagging suspicion that you're just a tourist here. 8)

Post by pdwyer »

If you're bored you should take a look here
http://projecteuler.net/

<throws down gauntlet>
I haven't been back in a couple of months, but I'm a 13% genius so far.

:twisted:
Fred
Administrator
Posts: 18162
Joined: Fri May 17, 2002 4:39 pm
Location: France

Post by Fred »

Actually, it is a bug (now fixed). Better to use UTF-8 source files if you write unicode programs (then you wouldn't have had this issue).
mskuma
Enthusiast
Posts: 573
Joined: Sat Dec 03, 2005 1:31 am
Location: Australia

Post by mskuma »

For things like that I always use (compiled in unicode):

Code: Select all

kanji.c = 28450 ; or $6F22
kanjiStr.s = PeekS(@kanji,1)

Debug kanjiStr
MessageRequester ("test", kanjiStr)
I never used chr() because historically that was basically meant for ASCII only.

I personally almost exclusively use UTF8 so I can enter Japanese chars directly into the source avoiding a lot of complications.

Post by pdwyer »

Fred wrote: Better to use UTF-8 source files if you write unicode programs (then you wouldn't have had this issue).
I'm not sure I understand this. My IDE is set to use UTF-8, but in the MessageRequester() example there is no non-ASCII text in the source. I passed a unicode value to Chr(), and unless it was of a certain type it wouldn't display. If I used UTF-8, how would I have passed it to MessageRequester() unless I converted it again?

<RANT>
Thankfully, my intl requirements are (at this stage anyway) just japanese so I can avoid unicode. My system handles codepage 932 so I can type kanji into the IDE, it displays fine, the apps are fine and it even displays in the debugger and I don't need to compile unicode! I need to just be careful about string lengths and chr() etc.

I was wondering about these chr() functions with UTF-16, but generally they look fine in unicode mode. But UTF-8... ??? I suppose the advantage is that for most apps that don't need intl support UTF-8 just works like ASCII, but other than its ability to hold intl data it's a real pain to work with, more painful than UTF-16 and non-unicode code pages.

</RANT>

@mskuma: How is your OS set up? For me, Japanese is fine in the IDE, but typed Japanese doesn't go in as UTF-8; everything is cp932. My system has multi-language support installed, so my wife's profile is all Japanese, right down to the Start button and menus, and my profile is English but has support for Japanese (more than just the Asian fonts installed). But I see nowhere in Regional Settings or the like to use UTF-8 rather than code pages. I'm not familiar with any way of setting the OS to UTF-8 either, except for IE.
