ASCIItoUnicode & UnicodeToASCII conversions

Tenaja · Post by **Tenaja** » Sat Aug 09, 2014 3:53 pm

Given the current announcement, can we get these commands expedited, so we can begin our transitions?

Thanks.

IdeasVacuum · Post by **IdeasVacuum** » Sat Aug 09, 2014 4:12 pm

Both of those conversions are potentially unreliable, especially if your app is distributed in other countries that use a different locale and code page. If your app has to use ASCII because of what it interfaces with, maintain it with the current PB version and save yourself the headaches. Write your new apps as Unicode unless the interface absolutely dictates otherwise. Let's not forget that your app can consist of more than one executable, so you can, for example, have a tiny exe in the background handling ASCII-specific requirements.

Tenaja · Post by **Tenaja** » Sat Aug 09, 2014 4:55 pm

My concern is converting existing code and files, not actually converting Unicode Russian characters to ASCII.

wilbert · Post by **wilbert** » Sat Aug 09, 2014 7:14 pm

I understand you want a native procedure, but the conversion itself is very simple if you only use characters 32-127.
You simply convert 1 byte to 2, or 2 bytes to 1.
Here's an example that does a fast conversion from one type of input buffer to the other type of output buffer.

Code:

Procedure.i UnicodeToAscii(*UnicodeIn, *AsciiOut)
  ; *UnicodeIn : zero terminated unicode input buffer
  ; *AsciiOut  : zero terminated ascii output buffer
  ; Result     : number of converted characters (zero character not included)
  ; Note       : only the low byte of each character is stored, so values above 255 are truncated
  !mov eax, -1
  CompilerIf #PB_Compiler_Processor = #PB_Processor_x86
    !mov [esp - 4], ebx
    !mov ebx, [p.p_UnicodeIn]
    !mov edx, [p.p_AsciiOut]
    !ua_loop:
    !inc eax
    !mov cx, [ebx + eax * 2]
    !mov [edx + eax], cl
  CompilerElse
    !mov r8, [p.p_UnicodeIn]
    !mov r9, [p.p_AsciiOut]
    !ua_loop:
    !inc eax
    !mov cx, [r8 + rax * 2]
    !mov [r9 + rax], cl
  CompilerEndIf
  !and cx, cx
  !jnz ua_loop
  CompilerIf #PB_Compiler_Processor = #PB_Processor_x86
    !mov ebx, [esp - 4]
  CompilerEndIf  
  ProcedureReturn  
EndProcedure

Procedure.i AsciiToUnicode(*AsciiIn, *UnicodeOut)
  ; *AsciiIn    : zero terminated ascii input buffer
  ; *UnicodeOut : zero terminated unicode output buffer
  ; Result      : number of converted characters (zero character not included)  
  !mov eax, -1
  CompilerIf #PB_Compiler_Processor = #PB_Processor_x86
    !mov [esp - 4], ebx
    !mov ebx, [p.p_AsciiIn]
    !mov edx, [p.p_UnicodeOut]
    !au_loop:
    !inc eax
    !movzx cx, byte [ebx + eax]
    !mov [edx + eax * 2], cx
  CompilerElse
    !mov r8, [p.p_AsciiIn]
    !mov r9, [p.p_UnicodeOut]
    !au_loop:
    !inc eax
    !movzx cx, byte [r8 + rax]
    !mov [r9 + rax * 2], cx
  CompilerEndIf
  !and cx, cx
  !jnz au_loop
  CompilerIf #PB_Compiler_Processor = #PB_Processor_x86
    !mov ebx, [esp - 4]
  CompilerEndIf  
  ProcedureReturn    
EndProcedure



MemSize = 1024 * 1024; Reserve 1 MB

*In = AllocateMemory(MemSize)
*Out = AllocateMemory(MemSize)

PokeS(*In, "Test", -1, #PB_Ascii)
Debug AsciiToUnicode(*In, *Out)
Debug PeekS(*Out, -1, #PB_Unicode)

Things get complicated, and the conversion would take more time, if you take into account values 128-255 from different code pages.
This code page mess is exactly why Unicode is, in most cases, a better solution than ASCII.
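
For anyone who would rather avoid inline assembly, the same 1-byte-to-2-byte idea can be sketched in plain PureBasic using the built-in Ascii and Unicode structures. This is only a rough sketch, not a tested replacement for the routines above: the procedure names AsciiToUnicodePlain and UnicodeToAsciiPlain are made up for illustration, and the choice to stop with -1 on any character above 127 (instead of truncating it) is an assumption about what a caller would want.

```purebasic
; Hypothetical helper names; not part of PureBasic itself.

Procedure.i AsciiToUnicodePlain(*AsciiIn.Ascii, *UnicodeOut.Unicode)
  ; Widens each byte to a 16-bit character.
  ; Result: number of converted characters (zero character not included)
  Protected Count.i = 0
  While *AsciiIn\a
    *UnicodeOut\u = *AsciiIn\a
    *AsciiIn + SizeOf(Ascii)
    *UnicodeOut + SizeOf(Unicode)
    Count + 1
  Wend
  *UnicodeOut\u = 0 ; write the terminating zero character
  ProcedureReturn Count
EndProcedure

Procedure.i UnicodeToAsciiPlain(*UnicodeIn.Unicode, *AsciiOut.Ascii)
  ; Narrows each 16-bit character to one byte.
  ; Result: number of converted characters, or -1 if a character
  ;         above 127 was found (assumed policy: stop, do not truncate)
  Protected Count.i = 0
  While *UnicodeIn\u
    If *UnicodeIn\u > 127
      *AsciiOut\a = 0 ; terminate what has been written so far
      ProcedureReturn -1
    EndIf
    *AsciiOut\a = *UnicodeIn\u
    *UnicodeIn + SizeOf(Unicode)
    *AsciiOut + SizeOf(Ascii)
    Count + 1
  Wend
  *AsciiOut\a = 0 ; write the terminating zero byte
  ProcedureReturn Count
EndProcedure

*In = AllocateMemory(64)
*Out = AllocateMemory(64 * SizeOf(Unicode))
If *In And *Out
  PokeS(*In, "Test", -1, #PB_Ascii)
  Debug AsciiToUnicodePlain(*In, *Out) ; shows 4
  Debug PeekS(*Out, -1, #PB_Unicode)   ; shows Test
  FreeMemory(*In)
  FreeMemory(*Out)
EndIf
```

The loop will be slower than the assembly version, but it makes the limitation explicit: anything outside 7-bit ASCII is reported rather than silently mangled, which is where the code page questions start.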
