UTF-8 support for strings
Posted: Mon Aug 25, 2008 3:27 pm
by Motu
Although it works fine to place UTF-8 strings in PB string variables, most of the supporting functions do not work with UTF-8. For example:
lcase(String) and ucase(String) return wrong characters when given a UTF-8 string. A flag would be nice for these functions:
lcase(String [,StringFormat])
The same is needed for:
left, right, mid etc.
Under Windows you can use CharLower_(), for example, but as it is part of the Windows library, it will not work on Linux.
Posted: Mon Aug 25, 2008 3:29 pm
by Kaeru Gaman
first time I hear about such a problem...
did you switch BOTH compileroptions (exe and source) to UTF-8?
option
Posted: Mon Aug 25, 2008 3:36 pm
by Motu
hi Kaeru,
as far as I know there is no compiler option for UTF-8, only for Unicode, which is something different.
Posted: Mon Aug 25, 2008 3:37 pm
by ts-soft
Kaeru Gaman wrote:first time I hear about such a problem...
did you switch BOTH compileroptions (exe and source) to UTF-8?
There is no compiler option for UTF-8 strings; UTF-8 is only available as an encoding for the source file.

solution
Posted: Mon Aug 25, 2008 3:39 pm
by Motu
so, is there a solution for this problem? If yes, please post it

Posted: Mon Aug 25, 2008 3:43 pm
by ts-soft
PB uses Unicode or ASCII for strings, not UTF-8.
UTF-8 is only required for some editor-like controls, so you can load and write
such text, but PB handles the string in whichever format is enabled in the compiler options.
Posted: Mon Aug 25, 2008 3:57 pm
by Motu
So, the main point is: I can store UTF-8 strings in a PB string variable, but none of the manipulation functions work correctly.
Isn't there any solution for this that works under Linux as well (like the Windows API function, just for Linux)?
Posted: Mon Aug 25, 2008 4:23 pm
by ts-soft
I don't understand. PB always uses Unicode or ASCII.
You can, for example, read a UTF-8 string from a file, so that you have the
Unicode or ASCII string in your variable (not UTF-8), manipulate it, and save it back as UTF-8.
UTF-8 is only for import or export to a file, an interface and so on.
Posted: Mon Aug 25, 2008 4:26 pm
by srod
UTF-8 is not supported natively by Windows (and thus not by the API). As ts-soft said, this format is really not for native string handling, but for media designed for transmission etc. E.g. string storage in files is often best done using UTF-8 for various reasons.
Now, using the built in memory functions (PeekS, PokeS etc.) there is nothing which you cannot do with strings held in utf-8 format. You can change case, search for substrings... there's no limit.
All you do is grab the memory buffer holding your utf-8 string, convert it to the native format (using PeekS(..., ..., #PB_UTF8)) - this format will either be Ascii or Unicode depending on your compiler settings. When done you can write the modified string back to a buffer in utf-8 format (if you require) using PokeS().
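The round trip described above could be sketched roughly as follows. This is only a minimal illustration, not tested against every PB version: UCaseUTF8 is a made-up helper name, the input is assumed to be a null-terminated UTF-8 buffer, and for non-ASCII characters to survive the conversion the program presumably has to be compiled in Unicode mode.

```purebasic
; Hypothetical helper (name invented for this sketch): returns a newly
; allocated UTF-8 buffer holding the upper-cased text of *utf8.
; Assumes *utf8 points to a null-terminated UTF-8 string.
Procedure UCaseUTF8(*utf8)
  Protected native.s, *result
  ; Decode the UTF-8 bytes into the native string format, then modify it
  native = UCase(PeekS(*utf8, -1, #PB_UTF8))
  ; Reserve exactly the bytes the UTF-8 form needs, plus the terminating null
  *result = AllocateMemory(StringByteLength(native, #PB_UTF8) + 1)
  If *result
    ; Encode the modified string back to UTF-8
    PokeS(*result, native, -1, #PB_UTF8)
  EndIf
  ProcedureReturn *result ; 0 on allocation failure; caller frees with FreeMemory()
EndProcedure
```

The same pattern should work for LCase(), Mid(), FindString() and so on: only the line that modifies the native string changes.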
Posted: Sun Aug 31, 2008 8:21 am
by Michael Vogel
srod wrote:Now, using the built in memory functions (PeekS, PokeS etc.) there is nothing which you cannot do with strings held in utf-8 format. You can change case, search for substrings... there's no limit.
What do you think is the best way to determine the amount of memory needed for the string being created when using PokeS(text.s,-1,#PB_UTF8)?
I could allocate twice the length of the original string, but it would be nice to find a way to take only the memory that is really needed.
Michael
Posted: Sun Aug 31, 2008 10:33 am
by srod
Code:
StringByteLength(string$, #PB_UTF8) + 1
Twice as many bytes as the number of characters wouldn't necessarily be enough because utf-8 is a variable length encoding with some characters requiring 4 bytes etc.
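As a small illustration of why the character count alone is not a reliable guide, the sketch below compares Len() with StringByteLength() for a string containing one non-ASCII character. The variable names are made up for the example, and Chr(223) is used for 'ß' so that the result does not depend on how the source file is saved.

```purebasic
; Illustrative only: character count versus UTF-8 byte count.
Define text.s, *buffer
text = "Weinstra" + Chr(223) + "enlauf"    ; 'ß' has character code 223

Debug Len(text)                            ; 15 characters
Debug StringByteLength(text, #PB_UTF8)     ; 16 bytes, 'ß' takes two bytes in UTF-8

; Allocate exactly what is needed, plus one byte for the terminating null
*buffer = AllocateMemory(StringByteLength(text, #PB_UTF8) + 1)
If *buffer
  PokeS(*buffer, text, -1, #PB_UTF8)
EndIf
```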
Posted: Sun Aug 31, 2008 10:52 am
by blueznl
http://www.xs4all.nl/~bluez/datatalk/pu ... bytelength
Edit: that's what I get for walking away from the keyboard, I'm a half hour behind Srod...
Well, I'm always a half hour behind anything, pretty much, come to think of it

Posted: Sun Aug 31, 2008 3:02 pm
by Michael Vogel
srod & blueznl, you're both fast enough
I just went for a short run in the late summer sun, and by the time I got back you had (once again) given me the right answers.
Thanks to you (and all the others) in this forum, I love you

Posted: Sat Sep 06, 2008 12:14 pm
by Michael Vogel
I need some string conversion functions for UTF-8 and ASCII strings, so I wrote the following procedures.
The good news is that they work and are fast enough for normal use.
There are still some points to be careful about: the string passed to the StringToASCII procedure has to be in UTF-8 format; if it is already an ASCII string, it may be truncated.
Code:
Procedure.s StringToUTF(s.s)
  #AutoLength=-1
  Protected buffer.s
  buffer=Space(StringByteLength(s,#PB_UTF8))
  PokeS(@buffer,s,#AutoLength,#PB_UTF8)
  ProcedureReturn buffer
EndProcedure

Procedure.s StringToASCII(s.s)
  ; In the current version the string MUST be in UTF-8 format ('Weinstraßenlauf' >> 'Weinstraßenlauf'),
  ; otherwise the string is truncated ('Weinstraßenlauf' >> 'Weinstra')!
  Protected buffer.s
  #AutoLength=-1
  s=PeekS(@s,#AutoLength,#PB_UTF8)
  buffer=Space(StringByteLength(s,#PB_Ascii))
  PokeS(@buffer,s,#AutoLength,#PB_Ascii)
  ProcedureReturn buffer
EndProcedure

Procedure.s StringToFilename(s.s)
  ; Replaces characters that are not allowed in file names with a space (32)
  Protected z=Len(s)
  While z
    If FindString("\:/<*|?>"+#DQUOTE$,Mid(s,z,1),1)
      PokeB(@s+z-1,32)
    EndIf
    z-1
  Wend
  ProcedureReturn s
EndProcedure
Posted: Sat Sep 06, 2008 8:39 pm
by Demivec
Michael Vogel wrote:I need some string conversion functions for UTF-8 and ASCII strings, so I wrote the following procedures.
The good news is that they work and are fast enough for normal use.
There are still some points to be careful about: the string passed to the StringToASCII procedure has to be in UTF-8 format; if it is already an ASCII string, it may be truncated.
You have to add one byte for the Null when you reserve buffer space. Try your code with this slight modification:
Code:
Procedure.s StringToUTF(S.s)
  #AutoLength = -1
  Protected Buffer.s
  Buffer = Space(StringByteLength(S,#PB_UTF8) + 1) ;<== add a byte for the Null
  PokeS(@Buffer,S,#AutoLength,#PB_UTF8)
  ProcedureReturn Buffer
EndProcedure

Procedure.s StringToASCII(S.s)
  ; In the current version the string MUST be in UTF-8 format ('Weinstraßenlauf' >> 'Weinstraßenlauf'),
  ; otherwise the string is truncated ('Weinstraßenlauf' >> 'Weinstra')!
  #AutoLength = -1
  Protected Buffer.s
  S = PeekS(@S,#AutoLength,#PB_UTF8)
  Buffer = Space(StringByteLength(S,#PB_Ascii) + 1) ;<== add a byte for the Null
  PokeS(@Buffer,S,#AutoLength,#PB_Ascii)
  ProcedureReturn Buffer
EndProcedure

Procedure.s StringToFilename(S.s)
  ; Replaces characters that are not allowed in file names with a space (32)
  Protected Z = Len(S)
  While Z
    If FindString("\:/<*|?>"+#DQUOTE$,Mid(S,Z,1),1)
      PokeB(@S + Z - 1,32)
    EndIf
    Z - 1
  Wend
  ProcedureReturn S
EndProcedure