ASCIItoUnicode & UnicodeToASCII conversions

Tenaja
Addict
Posts: 1959
Joined: Tue Nov 09, 2010 10:15 pm

Post by Tenaja »

Given the current announcement, can we get these commands expedited, so we can begin our transitions?

Thanks.
IdeasVacuum
Always Here
Posts: 6426
Joined: Fri Oct 23, 2009 2:33 am
Location: Wales, UK

Re: ASCIItoUnicode & UnicodeToASCII conversions

Post by IdeasVacuum »

Both of those conversions are potentially unreliable, especially if your app is distributed in other countries that use a different locale and code page. If your app has to use ASCII because of what it interfaces with, maintain it with the current PB version and save yourself the headaches. Write your new apps as Unicode unless the interface absolutely dictates otherwise. Let's not forget that your app can consist of more than one executable, so you can, for example, have a tiny exe in the background handling the ASCII-specific requirements.
Tenaja

Re: ASCIItoUnicode & UnicodeToASCII conversions

Post by Tenaja »

My concern is converting existing code and files, not actually converting Unicode Russian characters to ASCII.
wilbert
PureBasic Expert
Posts: 3942
Joined: Sun Aug 08, 2004 5:21 am
Location: Netherlands

Re: ASCIItoUnicode & UnicodeToASCII conversions

Post by wilbert »

I understand you want a native procedure, but the conversion itself is very simple if you only use characters 32-127.
You simply convert 1 byte to 2 bytes, or 2 bytes to 1.
Here's an example that does a fast conversion from one type of input buffer into the other type of output buffer.

Code:

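Procedure.i UnicodeToAscii(*UnicodeIn, *AsciiOut)
  ; *UnicodeIn : zero terminated unicode input buffer
  ; *AsciiOut  : zero terminated ascii output buffer
  ; Result     : number of converted characters (zero character not included)
  ; Note       : only the low byte of each character is kept, so the result
  ;              is only meaningful for plain ASCII text
  !mov eax, -1                  ; character index, becomes 0 on the first inc
  CompilerIf #PB_Compiler_Processor = #PB_Processor_x86
    !mov [esp - 4], ebx         ; save ebx without moving the stack pointer
    !mov ebx, [p.p_UnicodeIn]
    !mov edx, [p.p_AsciiOut]
    !ua_loop:
    !inc eax
    !mov cx, [ebx + eax * 2]    ; read one 2-byte character
    !mov [edx + eax], cl        ; store its low byte
  CompilerElse
    !mov r8, [p.p_UnicodeIn]
    !mov r9, [p.p_AsciiOut]
    !ua_loop:
    !inc eax
    !mov cx, [r8 + rax * 2]     ; read one 2-byte character
    !mov [r9 + rax], cl         ; store its low byte
  CompilerEndIf
  !and cx, cx                   ; was that the zero terminator?
  !jnz ua_loop
  CompilerIf #PB_Compiler_Processor = #PB_Processor_x86
    !mov ebx, [esp - 4]         ; restore ebx
  CompilerEndIf
  ProcedureReturn               ; the count is already in eax/rax
EndProcedure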

Procedure.i AsciiToUnicode(*AsciiIn, *UnicodeOut)
  ; *AsciiIn    : zero terminated ascii input buffer
  ; *UnicodeOut : zero terminated unicode output buffer
  ; Result      : number of converted characters (zero character not included)
  !mov eax, -1                  ; character index, becomes 0 on the first inc
  CompilerIf #PB_Compiler_Processor = #PB_Processor_x86
    !mov [esp - 4], ebx         ; save ebx without moving the stack pointer
    !mov ebx, [p.p_AsciiIn]
    !mov edx, [p.p_UnicodeOut]
    !au_loop:
    !inc eax
    !movzx cx, byte [ebx + eax] ; read one byte, zero-extended to 16 bits
    !mov [edx + eax * 2], cx    ; store it as a 2-byte character
  CompilerElse
    !mov r8, [p.p_AsciiIn]
    !mov r9, [p.p_UnicodeOut]
    !au_loop:
    !inc eax
    !movzx cx, byte [r8 + rax]  ; read one byte, zero-extended to 16 bits
    !mov [r9 + rax * 2], cx     ; store it as a 2-byte character
  CompilerEndIf
  !and cx, cx                   ; was that the zero terminator?
  !jnz au_loop
  CompilerIf #PB_Compiler_Processor = #PB_Processor_x86
    !mov ebx, [esp - 4]         ; restore ebx
  CompilerEndIf
  ProcedureReturn               ; the count is already in eax/rax
EndProcedure



MemSize = 1024 * 1024; Reserve 1 MB

*In = AllocateMemory(MemSize)
*Out = AllocateMemory(MemSize)

PokeS(*In, "Test", -1, #PB_Ascii)
Debug AsciiToUnicode(*In, *Out)
Debug PeekS(*Out, -1, #PB_Unicode)
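
The same 1-byte-to-2-byte widening (and the reverse) can also be sketched in plain PureBasic, without inline assembly, which may be easier to read and port. This is only a sketch under the same assumptions as the routines above: zero-terminated buffers, characters 32-127, and an output buffer that is large enough. The names AsciiToUnicodePlain and UnicodeToAsciiPlain are made up for illustration.

```purebasic
; Hypothetical plain-PureBasic counterparts of the assembly routines above.
; Assumes zero-terminated input and a sufficiently large output buffer.

Procedure.i AsciiToUnicodePlain(*AsciiIn, *UnicodeOut)
  Protected i.i = 0
  Protected c.a
  Repeat
    c = PeekA(*AsciiIn + i)          ; read one byte
    PokeU(*UnicodeOut + i * 2, c)    ; write it as a 2-byte character
    i + 1
  Until c = 0                        ; the zero terminator is copied as well
  ProcedureReturn i - 1              ; converted characters, terminator excluded
EndProcedure

Procedure.i UnicodeToAsciiPlain(*UnicodeIn, *AsciiOut)
  Protected i.i = 0
  Protected c.u
  Repeat
    c = PeekU(*UnicodeIn + i * 2)    ; read one 2-byte character
    PokeA(*AsciiOut + i, c & $FF)    ; keep only the low byte
    i + 1
  Until c = 0
  ProcedureReturn i - 1
EndProcedure

*Src  = AllocateMemory(64)
*Wide = AllocateMemory(128)
*Back = AllocateMemory(64)

PokeS(*Src, "Test", -1, #PB_Ascii)
Debug AsciiToUnicodePlain(*Src, *Wide)
Debug PeekS(*Wide, -1, #PB_Unicode)
Debug UnicodeToAsciiPlain(*Wide, *Back)
Debug PeekS(*Back, -1, #PB_Ascii)
```

The plain version will likely be slower than the assembly, but it does not depend on the processor type.
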
Things get more complicated, and the conversion slower, once you take into account values 128-255, whose meaning depends on the code page.
This code page mess is exactly why Unicode is in most cases a better solution than ASCII.
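
One way to stay clear of the code page problem is to check that a buffer really contains only 7-bit characters before applying the simple byte widening. A minimal sketch, assuming a zero-terminated byte buffer; IsSevenBitAscii is a hypothetical helper name, not a PureBasic command.

```purebasic
; Hypothetical helper: returns #True if every byte before the zero
; terminator is below 128, so its meaning does not depend on the code page.
Procedure.i IsSevenBitAscii(*Buffer)
  Protected i.i = 0
  Protected c.a
  c = PeekA(*Buffer)
  While c <> 0
    If c > 127
      ProcedureReturn #False
    EndIf
    i + 1
    c = PeekA(*Buffer + i)
  Wend
  ProcedureReturn #True
EndProcedure

*Plain = AllocateMemory(16)
PokeS(*Plain, "Test", -1, #PB_Ascii)
Debug IsSevenBitAscii(*Plain)

*High = AllocateMemory(16)
PokeA(*High, $E9)        ; a byte in the 128-255 range: which character it
PokeA(*High + 1, 0)      ; stands for depends on the code page
Debug IsSevenBitAscii(*High)
```

If the check fails, the text needs a real code page conversion rather than the simple widening.
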