Proper way to write Unicode to Windows console

Windows specific forum

Post by kenmo »

[PB 5.62 x86, Windows 7]

I am struggling to find the "universal" way to print Unicode output to a Windows cmd console.

I know the Windows cmd.exe console uses code pages, but I don't want my programs to rely on the correct code page being set, or to use "chcp" to change it. I read that if you use WriteConsoleW_() you shouldn't need chcp (I think PB's Print() uses this internally?)


Compile this simple program as "print.exe" and it prints the expected output.

Code:

If OpenConsole()
  Print("Héllo World!")
  CloseConsole()
EndIf

But when I pipe it through the common "more" command, as "print | more", I get two problems:
1. The text prints as a column one character wide.
2. The é character is displayed incorrectly.


I read that "more" expects a byte order mark (BOM). This version works with "print | more", but now the plain "print" shows a blank box character where the BOM is printed.

Code:

#BOM$ = Chr($FeFF)
If OpenConsole()
  Print(#BOM$)
  Print("Héllo World!")
  CloseConsole()
EndIf
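
For reference, Chr($FEFF) is only the code point U+FEFF; which bytes reach the output depends on the encoding. The sketch below assumes PB 5.50 or later, where strings are always UTF-16 little-endian on Windows, and uses a hypothetical helper BytesAsHex() (not a PB built-in) to print both byte sequences. A reader that sniffs for a BOM would look for FF FE at the start of UTF-16LE data, or EF BB BF at the start of UTF-8 data.

```purebasic
#BOM$ = Chr($FEFF)

; Hypothetical helper: formats Length bytes at *Mem as hex pairs, e.g. "FF FE"
Procedure.s BytesAsHex(*Mem, Length.i)
  Protected i.i, Result$
  For i = 0 To Length - 1
    Result$ = Result$ + RSet(Hex(PeekA(*Mem + i)), 2, "0") + " "
  Next
  ProcedureReturn Trim(Result$)
EndProcedure

Define Bom$ = #BOM$          ; copy into a variable so its address can be taken
Define *Utf8 = UTF8(Bom$)    ; UTF-8 copy of the string; must be released with FreeMemory()

If OpenConsole()
  PrintN("UTF-16LE: " + BytesAsHex(@Bom$, StringByteLength(Bom$, #PB_Unicode)))  ; FF FE
  PrintN("UTF-8:    " + BytesAsHex(*Utf8, StringByteLength(Bom$, #PB_UTF8)))     ; EF BB BF
  CloseConsole()
EndIf

FreeMemory(*Utf8)
```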

If I use a #BS$ or #CR$ to overwrite the #BOM$, the console output of both "print" and "print | more" *looks* correct.
But now, if you redirect the output to a file, as in "print > log.txt", the text file contains an unwanted BOM plus a BS or CR at the very beginning.

Code:

#BOM$ = Chr($FeFF)
If OpenConsole()
  Print(#BOM$)
  Print(#BS$)
  Print("Héllo World!")
  CloseConsole()
EndIf

Is there a "proper" way to handle all this???


(PS: I also tried converting the text to a UTF-8 byte buffer with UTF8() and writing it with WriteConsoleData(). The output looks correct in the cmd console, but when piped through "more" nothing is displayed at all!)

Post by Marc56us »

Have you tried the new parameter added in version 5.70 beta 1?

- Added: an optional 'Mode' parameter for OpenConsole() to specify the string format to use

(But I don't know which values it accepts, because the new option is not documented in the help yet.)
Try:

Code:

OpenConsole("Héllo", #PB_UTF8) 

Post by skywalk »

Side question: what is your reason for using the console?
Is it to avoid an event loop?

Post by kenmo »

@marc Good idea! I hadn't seen that beta feature. That would be convenient.
But #PB_Unicode behaves the same as in 5.62.
(#PB_Ascii, of course, doesn't print Unicode characters.)
With #PB_UTF8, "print.exe | more" now gives me one line of text (good!), but the é shows up as two wrong characters (é is two bytes in UTF-8).

Code:

If OpenConsole("", #PB_UTF8)
  Print("Héllo World!")
  CloseConsole()
EndIf
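
The two wrong characters are consistent with "more" decoding the UTF-8 bytes with the console's OEM code page instead of as UTF-8 (an assumption about how "more" works internally, not verified here). This sketch, assuming PB 5.50 or later, shows the bytes involved: é (U+00E9) is one UTF-16 character but the two bytes C3 A9 in UTF-8. On code page 437, for example, those two bytes are drawn as ├ and ⌐.

```purebasic
; "é" written as Chr($E9) so the result does not depend on the source file's encoding
Define Text$ = Chr($E9)
Define *Utf8 = UTF8(Text$)   ; UTF8() allocates a buffer that must be released with FreeMemory()

If OpenConsole()
  PrintN("UTF-16 characters: " + Str(Len(Text$)))                                  ; 1
  PrintN("UTF-8 bytes:       " + Str(StringByteLength(Text$, #PB_UTF8)))           ; 2
  PrintN("Byte values:       " + Hex(PeekA(*Utf8)) + " " + Hex(PeekA(*Utf8 + 1)))  ; C3 A9
  CloseConsole()
EndIf

FreeMemory(*Utf8)
```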

@skywalk I'm working on a command-line-only tool intended for use in batch files, so its I/O can be piped to and from other Windows programs.

Post by Marc56us »

Convert the log file from UTF-8 to ASCII after the batch processing:

Code:

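If ReadFile(0, "Log_Utf-8.txt")
  ReadStringFormat(0)   ; skips a BOM at the start of the file, if there is one
  While Not Eof(0)
    Txt$ = Txt$ + ReadString(0, #PB_UTF8 | #PB_File_IgnoreEOL)   ; append instead of overwriting
  Wend
  CloseFile(0)
EndIf

If CreateFile(1, "Log_Ascii.txt")   ; CreateFile() truncates an existing file, OpenFile() would not
  WriteString(1, Txt$, #PB_Ascii)
  CloseFile(1)
EndIf

(Tested with Notepad++, accents stay OK)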

Post by kenmo »

Converting text files is no problem. I just want to Print text that can be displayed in the console, redirected to a file, or piped to other programs.
(How do standard tools like "dir" handle this? ***)

Reading the MSDN page for WriteConsole:
https://docs.microsoft.com/en-us/window ... iteconsole
If an application processes multilingual output that can be redirected, determine whether the output handle is a console handle (one method is to call the GetConsoleMode function and check whether it succeeds). If the handle is a console handle, call WriteConsole. If the handle is not a console handle, the output is redirected and you should call WriteFile to perform the I/O. Be sure to prefix a Unicode plain text file with a byte order mark.
This is interesting. It's led me to this, which almost does everything I want:

Code:

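Procedure.i IsRedirecting() ; #True if stdout goes to a file or another program
  Protected hConsoleHandle.i = GetStdHandle_(#STD_OUTPUT_HANDLE)
  Protected CMode.l ; the mode is a DWORD, so a 4-byte .l
  If (hConsoleHandle <> 0 And hConsoleHandle <> #INVALID_HANDLE_VALUE)
    ; GetConsoleMode() fails (returns 0) unless the handle is a real console
    If (GetConsoleMode_(hConsoleHandle, @CMode) = 0)
      ProcedureReturn (#True)
    EndIf
  EndIf
  ProcedureReturn (#False)
EndProcedure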

#BOM$ = Chr($FeFF)

If OpenConsole()
  If (IsRedirecting())
    Print(#BOM$)
    Print("*** Redirected output! ***" + #CRLF$)
  Else
    Print("Console output." + #CRLF$)
  EndIf
  Print("Héllo World!" + #CRLF$)
  CloseConsole()
EndIf
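
Following the MSDN advice more literally, one option is to bypass Print() and call the API directly: WriteConsole for a real console, WriteFile with a one-time BOM when redirected. This is only a sketch under several assumptions: PB 5.50 or later (UTF-16 strings), WriteConsole_ resolving to WriteConsoleW in PB's Unicode mode, and UTF-16LE as the encoding chosen for redirected output. WriteOut() and BomWritten are hypothetical names, not part of PB.

```purebasic
; Hypothetical helper WriteOut(): console -> WriteConsole, file/pipe -> WriteFile + BOM
; Assumes PB 5.50+ and that WriteConsole_ maps to WriteConsoleW in Unicode mode.

Global BomWritten.i = #False

Procedure.i WriteOut(Text$)
  Protected hOut.i = GetStdHandle_(#STD_OUTPUT_HANDLE)
  Protected Mode.l, Done.l
  Protected Bom.u = $FEFF

  If hOut = 0 Or hOut = #INVALID_HANDLE_VALUE
    ProcedureReturn #False
  EndIf

  If GetConsoleMode_(hOut, @Mode)
    ; Real console: hand over UTF-16 characters (count = Len), no code page involved
    ProcedureReturn WriteConsole_(hOut, @Text$, Len(Text$), @Done, #Null)
  EndIf

  ; File or pipe: write raw UTF-16LE bytes, with the BOM only once at the start
  If Not BomWritten
    WriteFile_(hOut, @Bom, 2, @Done, #Null)   ; 2 bytes: FF FE
    BomWritten = #True
  EndIf
  ProcedureReturn WriteFile_(hOut, @Text$, StringByteLength(Text$, #PB_Unicode), @Done, #Null)
EndProcedure

If OpenConsole()
  WriteOut("H" + Chr($E9) + "llo World!" + #CRLF$)   ; "Héllo World!"
  CloseConsole()
EndIf
```

With this split, the interactive console never receives a BOM, and "print > log.txt" produces a UTF-16 file that starts with FF FE and nothing else in front of the text. Programs that expect ANSI or UTF-8 on their input would still need a different encoding for the redirected case.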


*** Even Windows programs like "dir" are not perfect. For example, name a file with a Unicode character > $FFFF (this is valid!).
Now call "dir /b" to list files. The character shows as 2 blank boxes.
Now call "dir /b | more". The character shows as 2 question marks.
(At least, this is what I see on Windows 7.)
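
A plausible explanation for the two boxes and two question marks (not verified here): Windows and PB strings are UTF-16, so a code point above $FFFF is stored as two 16-bit units, a surrogate pair, with high = $D800 + ((cp - $10000) >> 10) and low = $DC00 + ((cp - $10000) & $3FF). A console font or code-page conversion that handles each unit on its own shows two characters instead of one. The sketch below, assuming PB 5.x Unicode strings, applies that arithmetic with a hypothetical helper CodePointToUTF16() and uses U+1F600 purely as an example code point.

```purebasic
; Hypothetical helper: converts a code point to a PB (UTF-16) string,
; splitting values above $FFFF into a high and a low surrogate.
Procedure.s CodePointToUTF16(CodePoint.l)
  If CodePoint < $10000
    ProcedureReturn Chr(CodePoint)
  EndIf
  CodePoint = CodePoint - $10000
  ProcedureReturn Chr($D800 + (CodePoint >> 10)) + Chr($DC00 + (CodePoint & $3FF))
EndProcedure

Define Emoji$ = CodePointToUTF16($1F600)   ; example code point above $FFFF

If OpenConsole()
  PrintN("Len():  " + Str(Len(Emoji$)))           ; 2
  PrintN("Unit 1: $" + Hex(PeekU(@Emoji$)))       ; $D83D (high surrogate)
  PrintN("Unit 2: $" + Hex(PeekU(@Emoji$ + 2)))   ; $DE00 (low surrogate)
  CloseConsole()
EndIf
```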