Save text file bug when source code is UTF-8 format

Post by StarBootics »

Hello everyone,

There is a serious problem when the source code is saved in UTF-8 file format. See the bug description in the code below for details.

Best regards
StarBootics

Code:

; <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
; File Name : Save text file bug.pb
; File version: 1.0.0
; Programming : Bug Demonstrator
; Programmed by : StarBootics
; Date : November 17th, 2022
; Last Update : November 17th, 2022
; PureBasic code : V6.00 LTS
; Platform : Debian GNU/Linux 11 (bullseye) x86-64
; <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<

; <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
; Bug description
;
; When you run this code while the source is saved in Plain text
; file format, the created file sizes are :
;
; Text ascii file size : 32
; Text UTF-8 file size : 36
; Text UTF-16 file size : 68
;
; When we open the files with GEdit the file content is
; what we expect to have.
;
; Ce texte est en théorie en Ascii
; Ce texte est en théorie en UTF-8
; Ce texte est en théorie en UTF-16
;
; <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
;
; When you run this code while the source is saved in UTF-8
; file format, the created file sizes are :
;
; Text ascii file size : 33
; Text UTF-8 file size : 38
; Text UTF-16 file size : 70
;
; Furthermore, when we open the text files with GEdit we have
; an issue with the file content, it looks like this :
;
; Ce texte est en théorie en Ascii
; Ce texte est en thÃ©orie en UTF-8
; Ce texte est en thÃ©orie en UTF-16
;
; <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<

If CreateFile(0, "Text ascii")
  
  WriteStringFormat(0, #PB_Ascii)
  
  WriteString(0, "Ce texte est en théorie en Ascii", #PB_Ascii)

  CloseFile(0)
  
EndIf


If CreateFile(0, "Text UTF-8")
  
  WriteStringFormat(0, #PB_UTF8)
  
  WriteString(0, "Ce texte est en théorie en UTF-8", #PB_UTF8)

  CloseFile(0)
  
EndIf


If CreateFile(0, "Text UTF-16")
  
  WriteStringFormat(0, #PB_Unicode)
  
  WriteString(0, "Ce texte est en théorie en UTF-16", #PB_Unicode)

  CloseFile(0)
  
EndIf

Debug "Text ascii file size : " + Str(FileSize("Text ascii"))
Debug "Text UTF-8 file size : " + Str(FileSize("Text UTF-8"))
Debug "Text UTF-16 file size : " + Str(FileSize("Text UTF-16"))

; <<<<<<<<<<<<<<<<<<<<<<<
; <<<<< END OF FILE <<<<<
; <<<<<<<<<<<<<<<<<<<<<<<

Post by mk-soft »

I cannot reproduce this under Ubuntu.
I'll have to try it out with Raspberry.

P.S.
Check your Preference -> Compiler -> Defaults -> Text encoding

Post by juergenkulow »

In UTF-8, é is encoded as $C3 $A9.
But GEdit saves $C3 $83 $C2 $A9.

Code:

; é saved with GEdit and PB-IDE
ShowMemoryViewer(?Label,?LabelEnd-?Label)
CallDebugger
DataSection
  Label:
  IncludeBinary "/tmp/savetext.pb"
  LabelEnd:
EndDataSection

; GEdit save 0000000000448DC0  74 68 C3 83 C2 A9 6F 72 69 65 20 65 6E 20 41 73  théorie
; PBIDE save 0000000000448ED0  74 68 C3 A9 6F 72 69 65 20 65 6E 20 41 73 63 69  théorie
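
The sequence $C3 $83 $C2 $A9 in the dump is what you would get if the UTF-8 bytes of é were misread as Latin-1 and then encoded to UTF-8 a second time. Here is a minimal Python sketch of that, assuming Latin-1 is the misread encoding (the thread does not confirm which component does this); `double_encode` is a hypothetical helper name used only for illustration.

```python
# Sketch: reproduce the bytes from the hex dump by "double encoding".
# Assumption: the extra bytes come from UTF-8 data being misread as Latin-1.

def double_encode(text: str) -> bytes:
    """Encode to UTF-8, misread those bytes as Latin-1, then encode to UTF-8 again."""
    misread = text.encode("utf-8").decode("latin-1")
    return misread.encode("utf-8")

correct = "é".encode("utf-8")
doubled = double_encode("é")

print(correct.hex(" ").upper())  # C3 A9
print(doubled.hex(" ").upper())  # C3 83 C2 A9
```

The second line of output matches the "GEdit save" row, the first matches the "PBIDE save" row.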

Post by StarBootics »

mk-soft wrote (Fri Nov 18, 2022 9:39 am): "Check your Preference -> Compiler -> Defaults -> Text encoding"
With that setting set to UTF-8 and the source file also saved with UTF-8 encoding, only the ASCII file works properly; both the UTF-8 and UTF-16 files have the "é" character wrongly encoded. The only way to get the expected result is to save the source code as plain text: File menu -> File format -> Encoding Plain text is checked.

I didn't test with other accented characters but the problem will probably be the same.
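
As a cross-check, the file sizes from the bug description fit a "UTF-8 source misread as Latin-1" pattern, in which each "é" becomes two characters. The Python sketch below recomputes both sets of sizes. It assumes that WriteStringFormat() writes no BOM for #PB_Ascii, a 3-byte BOM for #PB_UTF8 and a 2-byte BOM for #PB_Unicode, and that #PB_Ascii writes one byte per character. Those assumptions and the helper names are for illustration only and are not confirmed in this thread.

```python
# Sketch: compare the reported file sizes with the double-encoding hypothesis.
import codecs

def misread_as_latin1(text: str) -> str:
    """Simulate UTF-8 bytes being interpreted as Latin-1 characters."""
    return text.encode("utf-8").decode("latin-1")

def file_sizes(prefix: str, mangle: bool) -> dict:
    """Return the expected size in bytes of each test file (BOM + encoded text)."""
    sizes = {}
    for name, codec, bom in (
        ("Ascii", "latin-1", b""),                     # assumed: no BOM, 1 byte per char
        ("UTF-8", "utf-8", codecs.BOM_UTF8),           # assumed: 3-byte BOM
        ("UTF-16", "utf-16-le", codecs.BOM_UTF16_LE),  # assumed: 2-byte BOM
    ):
        text = prefix + name
        if mangle:
            text = misread_as_latin1(text)
        sizes[name] = len(bom) + len(text.encode(codec))
    return sizes

PREFIX = "Ce texte est en théorie en "
print(file_sizes(PREFIX, mangle=False))  # {'Ascii': 32, 'UTF-8': 36, 'UTF-16': 68}
print(file_sizes(PREFIX, mangle=True))   # {'Ascii': 33, 'UTF-8': 38, 'UTF-16': 70}
```

Both results match the sizes listed in the bug description (32/36/68 for plain text source, 33/38/70 for UTF-8 source). Any other accented character that takes two bytes in UTF-8 would add one extra character per occurrence in the same way under this hypothesis.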

This problem seems to have been present for a very long time: I didn't remember that I had already posted a similar bug here: viewtopic.php?t=77347

Best regards
StarBootics