Unicode question
Re: Unicode question
There is something wrong with your data. It is not normal for Unicode characters in Windows to be in high-low byte notation. Therefore your start pointer is wrong.
My Projects ThreadToGUI / OOP-BaseClass / EventDesigner V3
PB v3.30 / v5.75 - OS Mac Mini OSX 10.xx - VM Window Pro / Linux Ubuntu
Downloads on my Webspace / OneDrive
-
DarkDragon
- Addict

- Posts: 2347
- Joined: Mon Jun 02, 2003 9:16 am
- Location: Germany
- Contact:
Re: Unicode question
mk-soft wrote: Sat Jan 13, 2024 12:57 pm
There is something wrong with your data. It is not normal for Unicode characters in Windows to be in high-low byte notation. Therefore your start pointer is wrong.

Endianness isn't OS specific. Several protocols and file formats require a specific endianness. A byte order mark (BOM) usually indicates which byte order is used, but not always. Sometimes the protocol mandates big endian, and then a BOM is unnecessary because the data is always big endian.
bye,
Daniel
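The point about the BOM can be illustrated with a minimal sketch. DetectBOM() is a hypothetical helper, not a PureBasic built-in, and it only inspects the first bytes of a memory block. It assumes the data actually starts with a BOM; as noted above, a format that mandates a byte order may carry none, in which case the helper returns an empty string.

```purebasic
EnableExplicit

; Hypothetical helper: inspects the first bytes of *Buffer and returns
; "UTF-8", "UTF-16BE", "UTF-16LE" or "" when no known BOM is present.
; Caveat: FF FE is also the start of the UTF-32LE BOM (FF FE 00 00);
; this sketch does not try to tell the two apart.
Procedure.s DetectBOM(*Buffer, Size)
  Protected b0, b1, b2
  If *Buffer = 0 Or Size < 2
    ProcedureReturn ""
  EndIf
  b0 = PeekA(*Buffer)
  b1 = PeekA(*Buffer + 1)
  If b0 = $FE And b1 = $FF
    ProcedureReturn "UTF-16BE"
  EndIf
  If b0 = $FF And b1 = $FE
    ProcedureReturn "UTF-16LE"
  EndIf
  If Size >= 3
    b2 = PeekA(*Buffer + 2)
    If b0 = $EF And b1 = $BB And b2 = $BF
      ProcedureReturn "UTF-8"
    EndIf
  EndIf
  ProcedureReturn ""
EndProcedure

; Usage: the sample block starts with the big endian BOM FE FF
Debug DetectBOM(?BomSample, 4)

DataSection
  BomSample:
  Data.a $FE, $FF, $00, $53
EndDataSection
```

For a binary file with embedded strings, the byte order normally comes from the file format's specification rather than from a BOM, so a check like this is only a first guess.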
Re: Unicode question
@juergenkulow
juergenkulow wrote: Sat Jan 13, 2024 12:47 pm
Your procedure is fine, but unfortunately your test program is not.

I don't quite understand why my example isn't a good one. Maybe I'm missing something...

@mk-soft
Below, a hexadecimal view of my file.
I retrieve a bigger block of data with ReadData(), so it retains the same structure once in memory...

If my English syntax and lexicon are incorrect, please bear with Google translate and DeepL. They rarely agree with each other!
Except on this sentence...
Re: Unicode question
Some more versions of how to change string endianness:
Code:
EnableExplicit

; Version 1: swap the two bytes of each 16-bit character via two Ascii pointers
Procedure ToggleStringEndianess(*Char.Character)
  Protected *a1.Ascii = *Char     ; first byte of the current character
  Protected *a2.Ascii = *a1 + 1   ; second byte of the current character
  While *Char\c                   ; stop at the terminating null character
    Swap *a1\a, *a2\a
    *Char + SizeOf(Character)
    *a1 = *Char
    *a2 = *a1 + 1
  Wend
EndProcedure

; -------------------------------------------------------------
; Version 2 with Pointer Structure
; Both fields are zero-length arrays, so they start at the same address:
; c[0] is the whole character, a[0] and a[1] are its two bytes.
Structure pChar
  c.c[0]
  a.a[0]
EndStructure

Procedure ToggleStringEndianess2(*Char.pChar)
  While *Char\c[0]
    Swap *Char\a[0], *Char\a[1]
    *Char + SizeOf(Character)
  Wend
EndProcedure

; -------------------------------------------------------------
; Version 3 Assembler (inline x86/x64 assembly, so it needs the ASM backend)
Procedure ToggleStringEndianess3(*Char.Character)
  CompilerIf #PB_Compiler_64Bit
    While *Char\c
      !MOV RAX, [p.p_Char]
      !MOV DX, WORD[RAX]
      !XCHG DL, DH ; for 16 Bit ByteSwap it's the Exchange command
      !MOV WORD[RAX], DX
      *Char + SizeOf(Character)
    Wend
  CompilerElse ; #PB_Compiler_32Bit
    While *Char\c
      !MOV EAX, [p.p_Char]
      !MOV DX, WORD[EAX]
      !XCHG DL, DH ; for 16 Bit ByteSwap it's the Exchange command
      !MOV WORD[EAX], DX
      *Char + SizeOf(Character)
    Wend
  CompilerEndIf
EndProcedure

; -------------------------------------------------------------
; Testcode: each version toggles the string twice, so the last Debug
; line of every block shows the original byte order again
Define.s sTest

; Version 1
Debug "Version 1"
Debug ""
sTest = PeekS(?MotorolaString)
; sTest = PeekS(?IntelString)
Debug sTest
ToggleStringEndianess(@sTest)
Debug sTest
ToggleStringEndianess(@sTest)
Debug sTest

; Version 2
Debug ""
Debug "Version 2"
Debug ""
sTest = PeekS(?MotorolaString)
; sTest = PeekS(?IntelString)
Debug sTest
ToggleStringEndianess2(@sTest)
Debug sTest
ToggleStringEndianess2(@sTest)
Debug sTest

; Version 3
Debug ""
Debug "Version 3 Assembler"
Debug ""
sTest = PeekS(?MotorolaString)
; sTest = PeekS(?IntelString)
Debug sTest
ToggleStringEndianess3(@sTest)
Debug sTest
ToggleStringEndianess3(@sTest)
Debug sTest

DataSection
  ; "String" in Motorola notation Big Endian and Intel notation Little Endian
  MotorolaString:
  Data.a $00, $53, $00, $74, $00, $72, $00, $69, $00, $6E, $00, $67, $0, $0
  IntelString:
  Data.a $53, $00, $74, $00, $72, $00, $69, $00, $6E, $00, $67, $0, $0, $0
EndDataSection
Re: Unicode question
Thanks for your suggestions.
I already have my own. See here
My goal is to find out whether I can do this without a loop (For...Next, While...Wend) and read the string from memory the way PeekS() does.
Maybe a Windows API? Or something else...
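One Windows API that may fit this request is LCMapString with the LCMAP_BYTEREV flag, which is documented as reversing the bytes of each 16-bit unit. The following is only a sketch under assumptions: PeekSByteRev() and the two #Demo_ constants are hypothetical names introduced here (the values $7F and $800 are taken from the Windows SDK headers), it is Windows-only, and it has not been tested on every Windows version. The source and destination buffers are kept separate, because the API documentation only allows identical pointers for case mapping.

```purebasic
EnableExplicit

CompilerIf #PB_Compiler_OS = #PB_OS_Windows

  ; Windows SDK values, declared under local (hypothetical) names
  #Demo_LOCALE_INVARIANT = $7F
  #Demo_LCMAP_BYTEREV    = $800

  ; Hypothetical helper: reads Length UTF-16 characters of the opposite
  ; byte order from *Source and returns them as a native PureBasic string.
  Procedure.s PeekSByteRev(*Source, Length)
    Protected Result.s
    If *Source = 0 Or Length <= 0
      ProcedureReturn ""
    EndIf
    Result = Space(Length) ; destination buffer of Length characters
    ; LCMapString_() returns 0 on failure
    If LCMapString_(#Demo_LOCALE_INVARIANT, #Demo_LCMAP_BYTEREV, *Source, Length, @Result, Length) = 0
      ProcedureReturn ""
    EndIf
    ProcedureReturn Result
  EndProcedure

  ; Usage: six big endian characters from the data section
  Debug PeekSByteRev(?BigEndianText, 6)

  DataSection
    BigEndianText:
    Data.a $00, $53, $00, $74, $00, $72, $00, $69, $00, $6E, $00, $67, $00, $00
  EndDataSection

CompilerEndIf
```

This removes the loop from the PureBasic source only. The API still has to visit every character internally, and the call is not portable to Linux or macOS.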
Re: Unicode question
boddhi wrote: Sat Jan 13, 2024 4:18 pm
My goal is to know if I can do that without a loop (For...Next, While...Wend) and read the string from memory as PeekS() can do it.
Maybe a windows API ? or else...

Why are you looking for something without a loop?
A function like PeekS or WideCharToMultiByte also uses a loop internally.
The difference is that the procedure is already compiled, so you don't see it.
Windows (x64)
Raspberry Pi OS (Arm64)
Re: Unicode question
wilbert wrote:
Why are you looking for something without a loop ?
A function like PeekS or WideCharToMultiByte also uses a loop internally.
The difference is that the procedure is already compiled so you don't see it.

My answer would be that these functions were created to simplify the programmer's life. So if a function (e.g. an API) that I don't know about already exists, why reinvent the wheel?
Using Unicode with Windows isn't as simple as that, and there may be techniques or information I don't know about despite my research.
I'm not a programming pro. I'm making a request that may not have a positive answer, in which case (as seems to be so here) I'll work around the problem.
-
DarkDragon
Re: Unicode question
wilbert wrote: Sat Jan 13, 2024 6:38 pm
  boddhi wrote: Sat Jan 13, 2024 4:18 pm
  My goal is to know if I can do that without a loop (For...Next, While...Wend) and read the string from memory as PeekS() can do it.
  Maybe a windows API ? or else...
Why are you looking for something without a loop ?
A function like PeekS or WideCharToMultiByte also uses a loop internally.
The difference is that the procedure is already compiled so you don't see it.

We have Read-/WriteStringFormat, but we can only handle one format with Read-/WriteString, which seems a bit incomplete. This should be a feature request: extend Read-/WriteString and PeekS/PokeS with endian flags.
bye,
Daniel
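Until such flags exist, the proposed PeekS extension can be approximated with a small wrapper. PeekSBE() below is a hypothetical helper, not part of PureBasic. It assumes the caller knows the length in characters and that the source is UTF-16 big endian. It copies the data into a temporary buffer while swapping each byte pair, then lets PeekS() read the result.

```purebasic
EnableExplicit

; Hypothetical helper: reads Length UTF-16BE characters from *Source
; and returns them as a native PureBasic string.
Procedure.s PeekSBE(*Source, Length)
  Protected Result.s, *Buffer, i, Bytes
  If *Source = 0 Or Length <= 0
    ProcedureReturn ""
  EndIf
  Bytes = Length * 2
  ; AllocateMemory() zero-fills the block, so the two extra bytes
  ; already form the terminating null character
  *Buffer = AllocateMemory(Bytes + 2)
  If *Buffer
    For i = 0 To Bytes - 2 Step 2
      PokeA(*Buffer + i,     PeekA(*Source + i + 1)) ; low byte first
      PokeA(*Buffer + i + 1, PeekA(*Source + i))     ; then high byte
    Next
    Result = PeekS(*Buffer, Length, #PB_Unicode)
    FreeMemory(*Buffer)
  EndIf
  ProcedureReturn Result
EndProcedure

; Usage: six big endian characters from the data section
Debug PeekSBE(?BigEndianSample, 6)

DataSection
  BigEndianSample:
  Data.a $00, $53, $00, $74, $00, $72, $00, $69, $00, $6E, $00, $67, $00, $00
EndDataSection
```

Unlike the in-place procedures posted earlier in the thread, this leaves the source memory untouched, which matters when the block read with ReadData() is still needed in its original form.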
Re: Unicode question
DarkDragon wrote:
We have Read-/WriteStringFormat

Note that I may have omitted: My file is a binary file, so it's impossible to determine the encoding with ReadStringFormat().
Last edited by boddhi on Sat Jan 13, 2024 9:32 pm, edited 1 time in total.
-
DarkDragon
Re: Unicode question
boddhi wrote: Sat Jan 13, 2024 9:23 pm
  DarkDragon wrote: We have Read-/WriteStringFormat
Note that I may have omitted: My file is a binary file, so it's impossible to determine the encoding with ReadStringFormat().

Of course, that's not what I meant. The presence of these functions implies that PureBasic can handle different endianness when reading/writing strings from/to files/memory without further additions. Unfortunately it cannot.
That may happen and is totally OK. The best you can do is create a feature request.
bye,
Daniel
Re: Unicode question
DarkDragon wrote:
Of course, that's not what I've meant.

I understood that


