Changing Stringformat Ascii, Unicode, Utf8


Post by Josh »

I keep running into problems with string formats. The following code does what I want when I compile my program in Unicode mode:

Code:

Define Buf$ {5}

PokeS (@Buf$, Chr(8201), -1, #PB_UTF8)

For i = 0 To 3
  Debug PeekA (@Buf$ + i)
Next

Output:
226
128
137
0


Unfortunately, it fails when I compile my code in ASCII mode. Does anyone have an idea what the code should look like in that case so that it produces the same result as above?
Thanks

Post by wilbert »

In this case it’s not a good idea to compile for Ascii.
You will have to write your own UTF-8 encoder.
You can look at this thread for some inspiration:
http://www.purebasic.fr/english/viewtop ... 12&t=60273

Post by mk-soft »

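There are new functions for this:
*mem = ASCII(text) and *mem = UTF8(text)

Code:

*mem = UTF8("Hello World öäüú")   ; allocates a null-terminated UTF-8 buffer
len = MemorySize(*mem) - 1        ; byte length without the terminating null
Debug len
Debug PeekS(*mem, -1, #PB_UTF8)   ; read the buffer back as a string
ShowMemoryViewer(*mem, len)       ; inspect the raw bytes in the debugger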

Post by Josh »

wilbert wrote: In this case it’s not a good idea to compile for Ascii.

If I compile in Unicode mode, I get many more problems in other places, e.g. with the pipe to another application.

wilbert wrote: You will have to create your own utf8 encoder.
You can look at this thread for some inspiration
http://www.purebasic.fr/english/viewtop ... 12&t=60273

The examples in that link have not made me any wiser, because they read from a file and I want to generate a UTF-8 string. But I'll take a closer look; maybe something will occur to me.

mk-soft wrote: New functions...
*mem = ASCII(text) and *mem = UTF8(text)

Thanks for the example, but this goes round in circles: the PB versions in which I can compile in ASCII mode have neither ASCII() nor UTF8(). Converting to a UTF-8 string was not the problem as long as I compiled my code as a Unicode executable (see my snippet).

Post by kenmo »

1. I think you're overlooking something...
Your example code will never work in ASCII mode, because of this part:

Code:

Chr(8201)

How can you store characters outside the ASCII range in an ASCII string? I think PB will produce "?" when this runs in ASCII mode.



2. Josh wrote: If I compile in Unicode, then I have much more problems at other places, e.g. with the pipe to another app.

If you want to support Unicode text, I strongly recommend that you use Unicode mode and solve the other problems, such as piping, instead of using ASCII mode.
Unicode characters and UTF-8 text are very difficult to handle in ASCII mode, because strings can only hold characters 0-255.



3. If you need UTF8() and Ascii() in a PB version that does not have them:

Code:

Procedure.i UTF8(String$)
  Protected Bytes.i = 1 + StringByteLength(String$, #PB_UTF8)
  Protected *Mem = AllocateMemory(Bytes)
  If (*Mem)
    PokeS(*Mem, String$, -1, #PB_UTF8)
  EndIf
  ProcedureReturn (*Mem)
EndProcedure

Procedure.i Ascii(String$)
  Protected Bytes.i = 1 + StringByteLength(String$, #PB_Ascii)
  Protected *Mem = AllocateMemory(Bytes)
  If (*Mem)
    PokeS(*Mem, String$, -1, #PB_Ascii)
  EndIf
  ProcedureReturn (*Mem)
EndProcedure

Post by Fred »

There is also a codepage issue in ASCII mode: the characters from 128 to 255 differ depending on the codepage used.

Post by skywalk »

It is best to compile with the latest PB version, which is Unicode only, and handle ASCII accordingly.
Just be aware that the terminating null byte is sometimes counted and sometimes not.
I dropped ASCII entirely in favor of UTF-8, with no big issues so far.
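
A small sketch of the null-byte point, assuming a PB version that already provides UTF8() (the variable names are only for illustration). StringByteLength() reports the encoded length without the terminating null, while the buffer returned by UTF8() includes it:

```purebasic
Text$ = "abc"

; Encoded length in bytes, terminating null NOT counted
Debug StringByteLength(Text$, #PB_UTF8)   ; 3

; UTF8() allocates a buffer that includes the terminating null
*Mem = UTF8(Text$)
If *Mem
  Debug MemorySize(*Mem)                  ; 4
  FreeMemory(*Mem)
EndIf
```

This is why the byte count is usually taken as MemorySize(*Mem) - 1 when the data is written to a file or a pipe.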

Post by Josh »

I agree that it is usually best to compile programs in Unicode mode, but not in every case.

In my case I use Print(), PrintN(), ReadConsoleData() and WriteConsoleData(), none of which support UTF-8/ASCII flags. I also read and write text files in UTF-8 format. When I compile my program in ASCII mode, everything is simple and clear. In Unicode mode, however, I have to convert back and forth, and the simplest things get really complicated.


Now back to the actual problem from my first post:

In the example I want nothing more than to write the UTF-8 encoding of character 8201 (thin space) into memory.
As my example shows, this works in a Unicode executable, but it should also work in an ASCII executable.

Post by wilbert »

Josh wrote: In the example I want nothing else then to write the Utf8 code of the character 8201 (ThinSpace) into the memory.

You can try:

Code:

Define Buf$ {5}

PokeS (@Buf$, Chr(226)+Chr(128)+Chr(137), -1, #PB_Ascii)

For i = 0 To 3
  Debug PeekA (@Buf$ + i)
Next

or

Code:

Define Buf$ {5}

*buf.Ascii = @Buf$
*buf\a = 226 : *buf + 1
*buf\a = 128 : *buf + 1
*buf\a = 137 : *buf + 1
*buf\a = 0

For i = 0 To 3
  Debug PeekA (@Buf$ + i)
Next

Post by RASHAD »

I have faced that problem before.
Below is the workaround I used in one of my applications; adapt it to your needs.
Of course, you can also run josh.exe from memory.

1. Save the following code and compile it as "josh.exe" in Unicode mode:

Code:

Result$ = ProgramParameter()
Define Buf$ {5}

PokeS (@Buf$, Chr(Val(Result$)), -1, #PB_UTF8)

For i = 0 To 3
  MessageRequester("",Str(PeekA (@Buf$ + i)),#PB_MessageRequester_Ok)
Next

2. Run the following snippet in ASCII mode:

Code:

If FileSize("d:\josh.exe") > 0
  RunProgram("d:\josh.exe","8201","",#PB_Program_Hide| #PB_Program_Wait)
EndIf

Post by Josh »

@wilbert

Unfortunately that doesn't help :(

I only know the character code 8201 (just an example, of course; it could be any other character code), not its UTF-8 bytes.

@RASHAD

I'll have to take a look at it.

Post by wilbert »

Josh wrote: I only know the character code 8201 (this is just an example, of course, any other character code can be used) and not the Utf8 code.

Take a look at the UTF8_PokeC procedure from the thread I mentioned.
http://www.purebasic.fr/english/viewtop ... 20#p451120
I think that does what you want.
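
For reference, an encoder of this kind can be sketched in a few lines. The following is not the UTF8_PokeC procedure from the linked thread; the name PokeUTF8Char and its signature are made up for illustration. It writes the bytes with PokeA() instead of going through a string, so it should not depend on whether the program is compiled in ASCII or Unicode mode:

```purebasic
; Hypothetical helper, NOT the UTF8_PokeC from the linked thread.
; Writes the UTF-8 encoding of code point c to *Mem and returns the
; number of bytes written (0 if c is not a valid code point).
; *Mem must have room for at least 4 bytes.
Procedure.i PokeUTF8Char(*Mem, c.i)
  If c < 0
    ProcedureReturn 0
  ElseIf c < $80                          ; 1 byte: 0xxxxxxx
    PokeA(*Mem, c)
    ProcedureReturn 1
  ElseIf c < $800                         ; 2 bytes: 110xxxxx 10xxxxxx
    PokeA(*Mem,     $C0 | (c >> 6))
    PokeA(*Mem + 1, $80 | (c & $3F))
    ProcedureReturn 2
  ElseIf c < $10000                       ; 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
    If c >= $D800 And c <= $DFFF          ; surrogates are not valid code points
      ProcedureReturn 0
    EndIf
    PokeA(*Mem,     $E0 | (c >> 12))
    PokeA(*Mem + 1, $80 | ((c >> 6) & $3F))
    PokeA(*Mem + 2, $80 | (c & $3F))
    ProcedureReturn 3
  ElseIf c <= $10FFFF                     ; 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    PokeA(*Mem,     $F0 | (c >> 18))
    PokeA(*Mem + 1, $80 | ((c >> 12) & $3F))
    PokeA(*Mem + 2, $80 | ((c >> 6) & $3F))
    PokeA(*Mem + 3, $80 | (c & $3F))
    ProcedureReturn 4
  EndIf
  ProcedureReturn 0
EndProcedure

*Buf = AllocateMemory(5)                  ; zero-filled, so the result stays null-terminated
If *Buf
  n = PokeUTF8Char(*Buf, 8201)
  For i = 0 To n
    Debug PeekA(*Buf + i)                 ; 226, 128, 137, 0
  Next
  FreeMemory(*Buf)
EndIf
```

To encode several characters, the returned byte count can be added to the pointer after each call.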

Post by kenmo »

Josh wrote: In my case I use Print (), PrintN (), ReadConsoleData (), WriteConsoleData (), all of them do not support Utf8/Ascii flags. I also write and read text files in Utf8 format. When I compile my program in Ascii mode, everything is simple and clear. However, if I compile in Unicode mode, it will format back and forth and the simplest things will get really complicated.

ReadConsoleData() and WriteConsoleData() operate on bytes, not strings. They aren't affected by Unicode vs. ASCII mode.

Reading and writing UTF-8 files works better in Unicode mode!
When you read a UTF-8 file in ASCII mode you lose most characters > 255.


Print() and PrintN()... I agree with your Feature Request for a flag.
But there are easy workarounds:
http://www.purebasic.fr/english/viewtop ... =3&t=69980
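
One possible workaround along these lines, which is not necessarily the one from the linked thread: in a Unicode build, convert the string to UTF-8 bytes and send them with WriteConsoleData(). This sketch assumes a PB version that provides UTF8() and a receiving side (console or pipe) that expects UTF-8; the text is just an example.

```purebasic
; Assumes a Unicode build and a PB version that provides UTF8()
If OpenConsole()
  Line$ = "thin" + Chr(8201) + "space" + #LF$
  *Mem = UTF8(Line$)
  If *Mem
    ; Send the raw UTF-8 bytes, without the terminating null
    WriteConsoleData(*Mem, MemorySize(*Mem) - 1)
    FreeMemory(*Mem)
  EndIf
  CloseConsole()
EndIf
```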

Post by Josh »

@RASHAD
Thanks for your tip, but I like wilbert's solution better.

@wilbert
Thank you, this is exactly what I need. Sorry that I didn't find it after your first hint.