Wierd bytes $3F in the middle of chars 0-255
Posted: Tue Aug 22, 2017 2:38 pm
by oakvalley
Hi there,
Try this code in PB5.60:
Code: Select all
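; Build the list "000|001|...|255|"
For t = 0 To 255
  string$ = string$ + RSet(Str(t), 3, "0") + "|"
Next t
; Write each value back out as a one-character string, converted to ASCII
CreateFile(2, "d:\wierd.txt", #PB_Ascii)
For t = 0 To 255
  WriteString(2, Chr(Val(Left(StringField(string$, t + 1, "|"), 3))), #PB_Ascii)
Next t
CloseFile(2)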
Take a look in a hex editor. It counts as expected from $01-$7F, then suddenly $80 becomes $3F, then $81 is where I would expect it, then lots of $3F and other weird bytes again, until it reaches $A0, after which it counts fine up to $FF.
How come?
I already specified #PB_Ascii for both CreateFile and WriteString. Is there suddenly a magic ASCII range from 128-160?
Re: Wierd bytes $3F in the middle of chars 0-255
Posted: Tue Aug 22, 2017 3:38 pm
by kenmo
ASCII and Unicode definitions do not match in the $80-$9F (128-159) range.
The Unicode characters in this range are:
https://en.wikipedia.org/wiki/Latin-1_S ... ode_block)
The "ASCII" characters in this range are, assuming you're on Windows:
https://en.wikipedia.org/wiki/Windows-1252
Strings in PB 5.60 are always Unicode.
#PB_Ascii just tells it to convert to ASCII, as best as possible.
But most of the Unicode characters $80-$9F have no counterpart in the Windows codepage, so they become $3F (question mark). The few byte values that Windows-1252 leaves undefined, such as $81, pass through unchanged, which matches the $81 you saw.
Example: Why doesn't Unicode $80 (Padding Character) just convert to byte $80?
Because byte $80 in Windows-1252 is the Euro sign, which actually pairs with Unicode $20AC.
So unmappable Unicode characters (including most > 255) become '?' instead of silently turning into different characters.
I believe PB is using the OS's built-in conversion, so if you disagree with the behavior you probably have to write your own procedure.
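If you do go the "own procedure" route, a minimal sketch could look like the following. WriteRawBytes is a made-up helper name, not a PureBasic command, and the file name is only an example. It assumes you simply want each character's code point (1-255) written as one raw byte, skipping the codepage conversion entirely.

```purebasic
; Hypothetical helper (not part of PureBasic): writes each character's
; code point as a single raw byte, bypassing the OS codepage conversion.
; Characters above 255 have no single-byte form and are written as '?'.
; Note: Chr(0) produces an empty string in PB, so a zero byte cannot
; travel through a string and will never reach this procedure.
Procedure WriteRawBytes(file, text$)
  Protected i, c
  For i = 1 To Len(text$)
    c = Asc(Mid(text$, i, 1))
    If c <= 255
      WriteAsciiCharacter(file, c)
    Else
      WriteAsciiCharacter(file, '?')
    EndIf
  Next i
EndProcedure

If CreateFile(0, "raw.bin")   ; example file name
  WriteRawBytes(0, Chr($7F) + Chr($80) + Chr($81) + Chr($A0))
  CloseFile(0)
EndIf
```

In a hex editor the file should then contain the bytes 7F 80 81 A0. Whether those bytes mean anything sensible depends on which codepage the reader of the file assumes.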
Hope that makes sense.
Re: Wierd bytes $3F in the middle of chars 0-255
Posted: Tue Aug 22, 2017 4:40 pm
by kenmo
Same results using Windows API:
Code: Select all
DataSection
UnicodeBytes:
Data.u $0041, $0080, $0081 ; Unicode chars $0041, $0080, $0081
AsciiBytes:
Data.a 0, 0, 0 ; ASCII byte buffer
EndDataSection
Debug Hex(PeekU(?UnicodeBytes + 0))
Debug Hex(PeekU(?UnicodeBytes + 2))
Debug Hex(PeekU(?UnicodeBytes + 4))
Debug ""
; Convert to ASCII using Windows API
WideCharToMultiByte_(#CP_ACP, #Null, ?UnicodeBytes, 3, ?AsciiBytes, 3, #Null, #Null)
Debug Hex(PeekA(?AsciiBytes + 0))
Debug Hex(PeekA(?AsciiBytes + 1))
Debug Hex(PeekA(?AsciiBytes + 2))
Re: Wierd bytes $3F in the middle of chars 0-255
Posted: Tue Aug 22, 2017 5:35 pm
by firace
@kenmo Good info, thanks!
And if you just want to write raw values 0 to 255, this should work as expected:
Code: Select all
CreateFile(0,"temp.DAT")
For t=0 To 255
WriteAsciiCharacter(0, t)
Next t
CloseFile(0)
Re: Wierd bytes $3F in the middle of chars 0-255
Posted: Tue Aug 22, 2017 6:40 pm
by oakvalley
@kenmo Thanks for the clarification!
So, if I understand this correctly: I thought ASCII was 0-255, and Unicode was everything above 255, up to however many millions of chars they need to create later.
How they ended up eating a HOLE into standard ASCII 0-255 and basically saying "let's do a best fit here instead" is completely ludicrous. The guys who invented Unicode sure messed up as far as I can see.
Why couldn't they just leave 0-255 as they always were and continue from there with Unicode (simply ADD it to the already established table)?
Oh well, I'll have to make my own conversion for those Unicode bytes and write them out myself, as the character I need, in whatever code/memory/file situation I run into.

Re: Wierd bytes $3F in the middle of chars 0-255
Posted: Tue Aug 22, 2017 6:59 pm
by Josh
ASCII includes only 0 - 127, not 0 - 255.
With ASCII, the range 128 - 255 is different for each language or application and depends on the codepage in use.
The problems you have are of your own making.
See your other topic.
Re: Wierd bytes $3F in the middle of chars 0-255
Posted: Tue Aug 22, 2017 7:15 pm
by kenmo
"There are over a hundred encodings and above code point 127, all bets are off."
This article is 14 years old (!) but still a good read about ASCII, codepages, and Unicode.
https://www.joelonsoftware.com/2003/10/ ... o-excuses/
True "ASCII" is just 0-127. Characters 128-255 were vendor-specific and unreliable, UNTIL Unicode came along and created a global standard.
Re: Wierd bytes $3F in the middle of chars 0-255
Posted: Tue Aug 22, 2017 7:46 pm
by VB6_to_PBx
firace wrote:@kenmo Good info, thanks!
And if you just want to write raw values 0 to 255, this should work as expected:
Code: Select all
CreateFile(0,"temp.DAT")
For t=0 To 255
WriteAsciiCharacter(0, t)
Next t
CloseFile(0)
I tweaked your code a little so that it writes each character on its own line, followed by its corresponding number.
Code: Select all
;- WriteAsciiCharacter__v1.pb
;-
;- Link : http://www.purebasic.fr/english/viewtopic.php?f=13&t=69023
;- Post Subject/Date :
;- Compiler : PB 5.31
;-
;-< Start Program >------------------------------------------------------------
;
;
CreateFile(0,"C:\PureBASIC\ASCII__and__UniCode__and__UTF_8\WriteAsciiCharacter.txt") ;<-- type your Drive/Folder/Filename here
For t=0 To 255
WriteAsciiCharacter(0, t)
; WriteString(0,Space(3) + Str(t) + #CRLF$)
WriteString(0," = " + Str(t) + #CRLF$)
Next t
CloseFile(0)
Re: Wierd bytes $3F in the middle of chars 0-255
Posted: Thu Aug 24, 2017 11:36 pm
by oakvalley
Yeah, you are all right. ASCII is 0-127, 7-bit. It's just that my memories from the good old Commodore 64 and Amiga steered me into believing ASCII was 0-255.
Anyway, I solved my problems by using *ascii=Ascii() and then PokeA(*ascii) in PureBasic to get the values I was seeking. I was working with some databases that contained filenames originating from the Amiga and just got surprised when there was a "hole" in the daily routine of ASCII chars.
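For anyone landing here later, this is roughly the idea, as a sketch rather than my exact code. name$ and the output file name are made-up examples, and it assumes a single-byte ANSI codepage so that character i of the string ends up at byte offset i-1 of the buffer.

```purebasic
; Sketch only: name$ and "names.bin" are example values.
; Assumes a single-byte ANSI codepage (one byte per character).
name$ = "disk" + Chr($80) + Chr($9F) + ".info"

*ascii = Ascii(name$)                ; converted copy; unmappable chars may be '?'
If *ascii
  For i = 1 To Len(name$)
    c = Asc(Mid(name$, i, 1))
    If c >= $80 And c <= $FF
      PokeA(*ascii + i - 1, c)       ; put the original byte value back
    EndIf
  Next i

  If CreateFile(0, "names.bin")
    WriteData(0, *ascii, Len(name$)) ; raw bytes, without the terminating zero
    CloseFile(0)
  EndIf
  FreeMemory(*ascii)                 ; buffers returned by Ascii() must be freed
EndIf
```

The buffer is written with WriteData rather than WriteString, so no further string conversion touches the patched bytes.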