Re: Unicode question
Posted: Sat Jan 13, 2024 12:57 pm
There is something wrong with your data. It is not normal for Unicode characters in Windows to be in high-low byte notation. Therefore your start pointer is wrong.
mk-soft wrote: Sat Jan 13, 2024 12:57 pm There is something wrong with your data. It is not normal for Unicode characters in Windows to be in high-low byte notation. Therefore your start pointer is wrong.

Endianness isn't OS specific. Several protocols and file formats need a specific endianness. The BOM usually decides which endian is used, but not always. Sometimes the protocol says big endian and then a BOM is unnecessary because it's always big endian.
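A minimal sketch of the BOM check mentioned above, assuming the buffer holds at least two readable bytes. `DetectUtf16Bom` and the `BomSample` label are hypothetical names used only for illustration; the byte pairs $FE $FF (big endian) and $FF $FE (little endian) are the standard UTF-16 BOMs. When a protocol fixes the byte order, as described above, no such check is needed.

```purebasic
; Hypothetical helper, not a PureBasic built-in.
; Returns 1 for a UTF-16 big-endian BOM, 0 for a little-endian BOM,
; and -1 when the first two bytes are not a UTF-16 BOM.
Procedure.i DetectUtf16Bom(*Buffer)
  Protected First.a  = PeekA(*Buffer)
  Protected Second.a = PeekA(*Buffer + 1)
  If First = $FE And Second = $FF
    ProcedureReturn 1
  ElseIf First = $FF And Second = $FE
    ProcedureReturn 0
  EndIf
  ProcedureReturn -1
EndProcedure

Debug DetectUtf16Bom(?BomSample) ; shows 1 for the sample data below

DataSection
  ; Big-endian BOM followed by "S" in big-endian byte order
  BomSample:
  Data.a $FE, $FF, $00, $53
EndDataSection
```

A result of -1 only means that no UTF-16 BOM was found; the caller then has to know the byte order from the file format or protocol.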
juergenkulow wrote: Sat Jan 13, 2024 12:47 pm Your procedure is fine, but unfortunately your test program is not.

I don't quite understand why my example isn't a good one. Maybe I'm missing something...

Code:
EnableExplicit
; Version 1
Procedure ToggleStringEndianess(*Char.Character)
  Protected *a1.Ascii = *Char
  Protected *a2.Ascii = *a1 + 1
  While *Char\c
    Swap *a1\a, *a2\a
    *Char + SizeOf(Character)
    *a1 = *Char
    *a2 = *a1 + 1
  Wend
EndProcedure
; -------------------------------------------------------------
; Version 2 with Pointer Structure
Structure pChar
  c.c[0]
  a.a[0]
EndStructure
Procedure ToggleStringEndianess2(*Char.pChar)
  While *Char\c[0]
    Swap *Char\a[0], *Char\a[1]
    *Char + SizeOf(Character)
  Wend
EndProcedure
; -------------------------------------------------------------
; Version 3 Assembler
Procedure ToggleStringEndianess3(*Char.Character)
  CompilerIf #PB_Compiler_64Bit
    While *Char\c
      !MOV RAX, [p.p_Char]
      !MOV DX, WORD[RAX]
      !XCHG DL, DH ; for 16 Bit ByteSwap it's the Exchange command
      !MOV WORD[RAX], DX
      *Char + SizeOf(Character)
    Wend
  CompilerElse ; #PB_Compiler_32Bit
    While *Char\c
      !MOV EAX, [p.p_Char]
      !MOV DX, WORD[EAX]
      !XCHG DL, DH ; for 16 Bit ByteSwap it's the Exchange command
      !MOV WORD[EAX], DX
      *Char + SizeOf(Character)
    Wend
  CompilerEndIf
EndProcedure
; -------------------------------------------------------------
; Testcode
Define.s sTest
; Version 1
Debug "Version 1"
Debug ""
sTest = PeekS(?MotorolaString)
; sTest = PeekS(?IntelString)
Debug sTest
ToggleStringEndianess(@sTest)
Debug sTest
ToggleStringEndianess(@sTest)
Debug sTest
; Version 2
Debug ""
Debug "Version 2"
Debug ""
sTest = PeekS(?MotorolaString)
; sTest = PeekS(?IntelString)
Debug sTest
ToggleStringEndianess2(@sTest)
Debug sTest
ToggleStringEndianess2(@sTest)
Debug sTest
; Version 3
Debug ""
Debug "Version 3 Assembler"
Debug ""
sTest = PeekS(?MotorolaString)
; sTest = PeekS(?IntelString)
Debug sTest
ToggleStringEndianess3(@sTest)
Debug sTest
ToggleStringEndianess3(@sTest)
Debug sTest
DataSection
  ; "String" in Motorola notation Big Endian and Intel notation Little Endian
  MotorolaString:
  Data.a $00, $53, $00, $74, $00, $72, $00, $69, $00, $6E, $00, $67, $0, $0
  IntelString:
  Data.a $53, $00, $74, $00, $72, $00, $69, $00, $6E, $00, $67, $0, $0, $0
EndDataSection
Thanks for your suggestions.
boddhi wrote: Sat Jan 13, 2024 4:18 pm My goal is to know if I can do that without a loop (For...Next, While...Wend) and read the string from memory as PeekS() can do it.
Maybe a windows API ? or else...

Why are you looking for something without a loop ?
wilbert wrote: Why are you looking for something without a loop ?
A function like PeekS or WideCharToMultiByte also uses a loop internally.
The difference is that the procedure is already compiled so you don't see it.

My answer would be that these functions were created to simplify the programmer's life. So if a function (e.g. an API) that I don't know about already exists, why reinvent the wheel?
wilbert wrote: Sat Jan 13, 2024 6:38 pm
  boddhi wrote: Sat Jan 13, 2024 4:18 pm My goal is to know if I can do that without a loop (For...Next, While...Wend) and read the string from memory as PeekS() can do it.
  Maybe a windows API ? or else...
Why are you looking for something without a loop ?
A function like PeekS or WideCharToMultiByte also uses a loop internally.
The difference is that the procedure is already compiled so you don't see it.

We have Read-/WriteStringFormat, but we can only handle one format with Read-/WriteString, which seems a bit incomplete. This should be a feature request. Extend Read-/WriteString and PeekS/PokeS by endian flags.
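Until such a flag exists, the idea can be approximated with a small wrapper. The following is only a sketch under stated assumptions: `PeekSBigEndian` and the `BigEndianSample` label are hypothetical names, not PureBasic built-ins; the source is assumed to be a null-terminated UTF-16 big-endian string, and the program is assumed to be compiled in Unicode mode. It still loops internally, just as PeekS does.

```purebasic
; Hypothetical helper, not a PureBasic built-in.
; Reads a null-terminated UTF-16 big-endian string from memory and
; returns it as a native (little-endian) PureBasic string.
Procedure.s PeekSBigEndian(*Source.Unicode)
  Protected Result.s, Length.i
  Protected *Dest.Unicode
  Protected *Scan.Unicode = *Source

  ; Count the 16-bit units before the terminator.
  ; A zero unit is zero in either byte order.
  While *Scan\u
    Length + 1
    *Scan + SizeOf(Unicode)
  Wend

  ; Reserve the result buffer, then copy each unit with its bytes swapped.
  Result = Space(Length)
  *Dest = @Result
  While *Source\u
    *Dest\u = ((*Source\u << 8) | (*Source\u >> 8)) & $FFFF
    *Source + SizeOf(Unicode)
    *Dest + SizeOf(Unicode)
  Wend

  ProcedureReturn Result
EndProcedure

Debug PeekSBigEndian(?BigEndianSample) ; shows "String"

DataSection
  ; "String" in big-endian byte order, followed by a 16-bit terminator
  BigEndianSample:
  Data.a $00, $53, $00, $74, $00, $72, $00, $69, $00, $6E, $00, $67, $00, $00
EndDataSection
```

Unlike the in-place toggle procedures earlier in the thread, this variant leaves the source memory untouched, which matters when the data comes from a read-only DataSection or a mapped file.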
DarkDragon wrote: We have Read-/WriteStringFormat

Note that I may have omitted: My file is a binary file, so it's impossible to determine the encoding with ReadStringFormat().
boddhi wrote: Sat Jan 13, 2024 9:23 pm
  DarkDragon wrote: We have Read-/WriteStringFormat
Note that I may have omitted: My file is a binary file, so it's impossible to determine the encoding with ReadStringFormat().

Of course, that's not what I've meant. The presence of these functions implies PureBasic can handle different endianness when reading/writing strings from/to files/memory without further additions. Unfortunately it cannot.
DarkDragon wrote: Of course, that's not what I've meant.

I understood that