
ASCIItoUnicode & UnicodeToASCII conversions

Posted: Sat Aug 09, 2014 3:53 pm
by Tenaja
Given the current announcement, can we get these commands expedited, so we can begin our transitions?

Thanks.

Re: ASCIItoUnicode & UnicodeToASCII conversions

Posted: Sat Aug 09, 2014 4:12 pm
by IdeasVacuum
Both of those conversions are potentially unreliable, especially if your app is distributed in countries that use a different locale and code page. If your app has to use ASCII because of what it interfaces with, maintain it with the current PB version and save yourself the headaches. Write your new apps as Unicode unless the interface absolutely dictates otherwise. Let's not forget that your app can consist of more than one executable, so you can, for example, have a tiny exe in the background handling the ASCII-specific requirements.

Re: ASCIItoUnicode & UnicodeToASCII conversions

Posted: Sat Aug 09, 2014 4:55 pm
by Tenaja
My concern is converting existing code and files, not actually converting Unicode Russian characters to ASCII.

Re: ASCIItoUnicode & UnicodeToASCII conversions

Posted: Sat Aug 09, 2014 7:14 pm
by wilbert
I understand you want a native procedure, but the conversion itself is very simple if you only use characters 32-127.
You simply convert 1 byte to 2, or 2 bytes to 1.
Here's an example that does a fast conversion from an input buffer of one type into an output buffer of the other type.

Code:

Procedure.i UnicodeToAscii(*UnicodeIn, *AsciiOut)
  ; *UnicodeIn : zero terminated unicode input buffer
  ; *AsciiOut  : zero terminated ascii output buffer
  ; Result     : number of converted characters (zero character not included)
  ; Note       : only the low byte of each character is stored, so codes above 255 are truncated
  !mov eax, -1
  CompilerIf #PB_Compiler_Processor = #PB_Processor_x86
    !mov [esp - 4], ebx
    !mov ebx, [p.p_UnicodeIn]
    !mov edx, [p.p_AsciiOut]
    !ua_loop:
    !inc eax
    !mov cx, [ebx + eax * 2]
    !mov [edx + eax], cl
  CompilerElse
    !mov r8, [p.p_UnicodeIn]
    !mov r9, [p.p_AsciiOut]
    !ua_loop:
    !inc eax
    !mov cx, [r8 + rax * 2]
    !mov [r9 + rax], cl
  CompilerEndIf
  !and cx, cx
  !jnz ua_loop
  CompilerIf #PB_Compiler_Processor = #PB_Processor_x86
    !mov ebx, [esp - 4]
  CompilerEndIf  
  ProcedureReturn  
EndProcedure

Procedure.i AsciiToUnicode(*AsciiIn, *UnicodeOut)
  ; *AsciiIn    : zero terminated ascii input buffer
  ; *UnicodeOut : zero terminated unicode output buffer
  ; Result      : number of converted characters (zero character not included)
  ; Note        : bytes 128-255 are zero-extended as-is, without any code page mapping
  !mov eax, -1
  CompilerIf #PB_Compiler_Processor = #PB_Processor_x86
    !mov [esp - 4], ebx
    !mov ebx, [p.p_AsciiIn]
    !mov edx, [p.p_UnicodeOut]
    !au_loop:
    !inc eax
    !movzx cx, byte [ebx + eax]
    !mov [edx + eax * 2], cx
  CompilerElse
    !mov r8, [p.p_AsciiIn]
    !mov r9, [p.p_UnicodeOut]
    !au_loop:
    !inc eax
    !movzx cx, byte [r8 + rax]
    !mov [r9 + rax * 2], cx
  CompilerEndIf
  !and cx, cx
  !jnz au_loop
  CompilerIf #PB_Compiler_Processor = #PB_Processor_x86
    !mov ebx, [esp - 4]
  CompilerEndIf  
  ProcedureReturn    
EndProcedure



MemSize = 1024 * 1024; Reserve 1 MB

*In = AllocateMemory(MemSize)
*Out = AllocateMemory(MemSize)

PokeS(*In, "Test", -1, #PB_Ascii)
Debug AsciiToUnicode(*In, *Out)
Debug PeekS(*Out, -1, #PB_Unicode)
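
For anyone who would rather avoid inline assembly, the same 1-byte-to-2 and 2-bytes-to-1 idea can be sketched in plain PureBasic. This is only a rough sketch, not the code above: the procedure names AsciiToUnicodePB and UnicodeToAsciiPB are made up for illustration, it assumes the input only contains character codes 1-127, and it will most likely be slower than the assembly version.

```purebasic
; Hypothetical plain-PureBasic counterparts of the assembly procedures.
; Assumption: every input character has a code in the range 1-127.

Procedure.i AsciiToUnicodePB(*AsciiIn, *UnicodeOut)
  Protected i.i = 0
  Protected c.a
  Repeat
    c = PeekA(*AsciiIn + i)          ; read one byte
    PokeU(*UnicodeOut + i * 2, c)    ; write it as a 16-bit value
    i + 1
  Until c = 0                        ; the zero terminator is copied as well
  ProcedureReturn i - 1              ; converted characters, terminator not included
EndProcedure

Procedure.i UnicodeToAsciiPB(*UnicodeIn, *AsciiOut)
  Protected i.i = 0
  Protected c.u
  Repeat
    c = PeekU(*UnicodeIn + i * 2)    ; read one 16-bit value
    PokeA(*AsciiOut + i, c & $FF)    ; keep only the low byte
    i + 1
  Until c = 0                        ; the zero terminator is copied as well
  ProcedureReturn i - 1              ; converted characters, terminator not included
EndProcedure

*AsciiBuf = AllocateMemory(64)
*UnicodeBuf = AllocateMemory(128)    ; output needs 2 bytes per character

PokeS(*AsciiBuf, "Test", -1, #PB_Ascii)
Debug AsciiToUnicodePB(*AsciiBuf, *UnicodeBuf)
Debug PeekS(*UnicodeBuf, -1, #PB_Unicode)

FreeMemory(*AsciiBuf)
FreeMemory(*UnicodeBuf)
```

As with the assembly version, the caller has to make sure the output buffer is large enough, including room for the zero terminator.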

Things get more complicated, and the conversion takes longer, once you take into account the values 128-255 from different code pages.
This code page mess is exactly why Unicode is, in most cases, a better solution than ASCII.
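
Since the simple 2-bytes-to-1 narrowing only holds for 7-bit characters, one way to approach a transition is to check first whether a string stays within that range. A minimal sketch, assuming a per-character check is fast enough for your data; IsPlainAscii is a hypothetical helper name, not a PureBasic command.

```purebasic
; Hypothetical helper: returns #True when every character of Text$ has a code
; below 128, so narrowing it to one byte per character cannot lose information.
Procedure.i IsPlainAscii(Text$)
  Protected i.i
  For i = 1 To Len(Text$)
    If Asc(Mid(Text$, i, 1)) > 127
      ProcedureReturn #False
    EndIf
  Next
  ProcedureReturn #True
EndProcedure

Debug IsPlainAscii("Test")                   ; only 7-bit characters
Debug IsPlainAscii("T" + Chr(233) + "st")    ; contains character code 233
```

Strings that fail the check are the ones where the code page question comes up, so they would need a proper code-page-aware conversion instead of the byte-for-byte copy.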