Save text file bug when source code is UTF-8 format

Posted: Fri Nov 18, 2022 5:39 am
by StarBootics
Hello everyone,

There is a serious problem when the source code is saved in UTF-8 file format. See the bug description in the following code for more details.

Best regards
StarBootics

Code:

; <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
; File Name : Save text file bug.pb
; File version: 1.0.0
; Programming : Bug Demonstrator
; Programmed by : StarBootics
; Date : November 17th, 2022
; Last Update : November 17th, 2022
; PureBasic code : V6.00 LTS
; Platform : Debian GNU/Linux 11 (bullseye) x86-64
; <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<

; <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
; Bug description
;
; When you run this code while the source is saved in Plain text
; file format, the created file sizes are:
;
; Text ascii file size : 32
; Text UTF-8 file size : 36
; Text UTF-16 file size : 68
;
; When we open the files with GEdit the file content is
; what we expect to have.
;
; Ce texte est en théorie en Ascii
; Ce texte est en théorie en UTF-8
; Ce texte est en théorie en UTF-16
;
; <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
;
; When you run this code while the source is saved in UTF-8
; file format, the created file sizes are:
;
; Text ascii file size : 33
; Text UTF-8 file size : 38
; Text UTF-16 file size : 70
;
; Furthermore, when we open the text files with GEdit, the
; file content is wrong. It looks like this:
;
; Ce texte est en théorie en Ascii
; Ce texte est en théorie en UTF-8
; Ce texte est en théorie en UTF-16
;
; <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<

If CreateFile(0, "Text ascii")
  
  WriteStringFormat(0, #PB_Ascii)
  
  WriteString(0, "Ce texte est en théorie en Ascii", #PB_Ascii)

  CloseFile(0)
  
EndIf


If CreateFile(0, "Text UTF-8")
  
  WriteStringFormat(0, #PB_UTF8)
  
  WriteString(0, "Ce texte est en théorie en UTF-8", #PB_UTF8)

  CloseFile(0)
  
EndIf


If CreateFile(0, "Text UTF-16")
  
  WriteStringFormat(0, #PB_Unicode)
  
  WriteString(0, "Ce texte est en théorie en UTF-16", #PB_Unicode)

  CloseFile(0)
  
EndIf

Debug "Text ascii file size : " + Str(FileSize("Text ascii"))
Debug "Text UTF-8 file size : " + Str(FileSize("Text UTF-8"))
Debug "Text UTF-16 file size : " + Str(FileSize("Text UTF-16"))

; <<<<<<<<<<<<<<<<<<<<<<<
; <<<<< END OF FILE <<<<<
; <<<<<<<<<<<<<<<<<<<<<<<
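
The size differences listed in the bug description are consistent with each "é" being turned into two characters ("Ã©") before the file is written. Below is a rough check of that arithmetic in Python (used here only for the byte counting, not as a PureBasic fix). It assumes that WriteStringFormat() writes a 3-byte BOM for UTF-8, a 2-byte BOM for UTF-16 and nothing for ASCII, and that the ASCII file holds one byte per character (modelled as Latin-1). The names `BOM_SIZE`, `file_size` and `mangle` are made up for this sketch.

```python
# Assumed BOM lengths per output format (see the assumptions in the text above).
BOM_SIZE = {"latin-1": 0, "utf-8": 3, "utf-16-le": 2}


def file_size(text: str, encoding: str) -> int:
    """Return the assumed BOM length plus the encoded length of the text."""
    return BOM_SIZE[encoding] + len(text.encode(encoding))


def mangle(text: str) -> str:
    """Simulate UTF-8 source bytes being read as single-byte characters."""
    return text.encode("utf-8").decode("latin-1")


# \u00e9 is "é"; the escape keeps this sketch independent of its own file encoding.
LINES = [
    ("Ce texte est en th\u00e9orie en Ascii", "latin-1"),
    ("Ce texte est en th\u00e9orie en UTF-8", "utf-8"),
    ("Ce texte est en th\u00e9orie en UTF-16", "utf-16-le"),
]

if __name__ == "__main__":
    for text, encoding in LINES:
        # Columns: encoding, size with intact text, size with mangled text.
        print(encoding, file_size(text, encoding), file_size(mangle(text), encoding))
```

Under those assumptions the intact strings give 32, 36 and 68 bytes and the mangled ones give 33, 38 and 70, which matches both sets of numbers reported above.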

Re: Save text file bug when source code is UTF-8 format

Posted: Fri Nov 18, 2022 9:39 am
by mk-soft
I cannot reproduce this under Ubuntu.
I'll have to try it out with Raspberry.

P.S.
Check your Preference -> Compiler -> Defaults -> Text encoding

Re: Save text file bug when source code is UTF-8 format

Posted: Fri Nov 18, 2022 10:31 am
by juergenkulow
In UTF-8, é is encoded as $C3 $A9.
But the file saved with GEdit contains $C3 $83 $C2 $A9 instead.

Code:

; é saved with GEdit and PB-IDE
ShowMemoryViewer(?Label,?LabelEnd-?Label)
CallDebugger
DataSection
  Label:
  IncludeBinary "/tmp/savetext.pb"
  LabelEnd:
EndDataSection

; GEdit save 0000000000448DC0  74 68 C3 83 C2 A9 6F 72 69 65 20 65 6E 20 41 73  théorie
; PBIDE save 0000000000448ED0  74 68 C3 A9 6F 72 69 65 20 65 6E 20 41 73 63 69  théorie
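
The four-byte sequence in the GEdit dump is what you get when the UTF-8 bytes of é are misread as a single-byte encoding and then encoded to UTF-8 a second time. A minimal Python sketch of that round trip follows. It assumes Latin-1 as the single-byte encoding and says nothing about what GEdit or the PB IDE actually do internally; the helper name `double_encode` is made up for illustration.

```python
def double_encode(text: str) -> bytes:
    """Encode to UTF-8, misread the bytes as Latin-1, then encode to UTF-8 again."""
    utf8_bytes = text.encode("utf-8")        # é -> C3 A9
    misread = utf8_bytes.decode("latin-1")   # C3 A9 read as two characters
    return misread.encode("utf-8")           # -> C3 83 C2 A9


if __name__ == "__main__":
    word = "th\u00e9orie"  # \u00e9 is "é", written as an escape on purpose
    print(word.encode("utf-8").hex(" ").upper())   # single encoding
    print(double_encode(word).hex(" ").upper())    # double encoding
```

The first line printed starts with 74 68 C3 A9 (like the PB IDE dump) and the second with 74 68 C3 83 C2 A9 (like the GEdit dump).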

Re: Save text file bug when source code is UTF-8 format

Posted: Fri Nov 18, 2022 12:04 pm
by StarBootics
mk-soft wrote: Fri Nov 18, 2022 9:39 am Check your Preference -> Compiler -> Defaults -> Text encoding
With that setting set to UTF-8 and the source file also saved with UTF-8 encoding, only the ASCII file works properly; in both the UTF-8 and UTF-16 files the "é" character is wrongly encoded. The only way to get the expected result is to save the source code as plain text: File menu -> File format -> Encoding Plain text is checked.

I didn't test with other accented characters but the problem will probably be the same.
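
As a rough check of that guess outside PureBasic, here is a small Python sketch. It assumes the fault is UTF-8 bytes being read as Latin-1 characters, which is only my assumption and not something confirmed for the IDE or the compiler. The helper names `mangle` and `repair` are made up for illustration.

```python
def mangle(text: str) -> str:
    """Simulate the suspected fault: UTF-8 bytes read as Latin-1 characters."""
    return text.encode("utf-8").decode("latin-1")


def repair(text: str) -> str:
    """Undo the suspected fault when possible; otherwise return the text unchanged."""
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


# é è à ç ù ô, written as escapes so the sketch does not depend on its own encoding.
SAMPLES = "\u00e9\u00e8\u00e0\u00e7\u00f9\u00f4"

if __name__ == "__main__":
    for ch in SAMPLES:
        bad = mangle(ch)
        print(f"U+{ord(ch):04X} -> {len(bad)} characters, repaired: {repair(bad) == ch}")
```

Under that assumption every one of these accented characters turns into two characters, so the same symptom would be expected for them too. Text that was never mangled passes through `repair` unchanged.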

This problem seems to have been present for a very long time: I didn't remember that I had already posted a similar bug here: viewtopic.php?t=77347

Best regards
StarBootics