Ide - changing file format - Ide open source project

Posted: Sun Apr 12, 2020 12:16 pm
by Josh
As far as I remember, a few PB versions ago a bug was fixed that occurred when changing the file format from plain text to UTF-8 or vice versa. But there is still a bug with special characters. Set the file format to plain text, insert the following code, and change the file format to UTF-8. You will see that the special characters in the range $80 - $9F are misinterpreted.

Code:

A_00_1F_$ = "................................" ; Control characters replaced by dots
A_20_3F_$ = " !.#$%&'()*+,-./0123456789:;<=>?" ; DQ replaced by dot
A_40_5F_$ = "@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_"
A_60_7F_$ = "`abcdefghijklmnopqrstuvwxyz{|}~." ; DEL replaced by dot
A_80_9F_$ = "€.‚ƒ„…†‡ˆ‰Š‹Œ.Ž..‘’“”•–—.™š›œ.žŸ" ; Small tilde replaced by dot
A_A0_BF_$ = " ¡¢£¤¥¦§¨©ª«¬.®¯°±²³´µ¶·¸¹º»¼½¾¿" ; One character replaced by dot
A_C0_DF_$ = "ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞß"
A_E0_FF_$ = "àáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ"

Debug A_00_1F_$
Debug A_20_3F_$
Debug A_40_5F_$
Debug A_60_7F_$
Debug A_80_9F_$
Debug A_A0_BF_$
Debug A_C0_DF_$
Debug A_E0_FF_$
To the specialists of the Ide open source project:

Unfortunately, I haven't been able to set up the build environment for the IDE yet, and I haven't had time to learn GitHub. Can anyone run a test by replacing the procedure AsciiToUTF8() in the file SourceManagement.pb with the following code (including the structure ASCIIARRAY)?

Code:

Structure ASCIIARRAY
  a.a[0]
EndStructure

Procedure AsciiToUTF8(*out.ASCIIARRAY, *outlen.LONG, *in.ASCIIARRAY, *inlen.LONG)

  *in_end    = *in + *inlen\l   ; copy to local vars for access speed
  *out_start = *out

  While *in <= *in_end
    If     *in\a < $80  :  *out\a=*in\a                                :  *in+1 : *out+1
    ElseIf *in\a < $A0
      Select *in\a
        Case $80        :  *out\a=$E2 : *out\a[1]=$82 : *out\a[2]=$AC  :  *in+1 : *out+3 ; €
        Case $81        :  *out\a=$C2 : *out\a[1]=*in\a                :  *in+1 : *out+2 ; n.a.
        Case $82        :  *out\a=$E2 : *out\a[1]=$80 : *out\a[2]=$9A  :  *in+1 : *out+3 ; ‚
        Case $83        :  *out\a=$C6 : *out\a[1]=$92 :                :  *in+1 : *out+2 ; ƒ
        Case $84        :  *out\a=$E2 : *out\a[1]=$80 : *out\a[2]=$9E  :  *in+1 : *out+3 ; „
        Case $85        :  *out\a=$E2 : *out\a[1]=$80 : *out\a[2]=$A6  :  *in+1 : *out+3 ; …
        Case $86        :  *out\a=$E2 : *out\a[1]=$80 : *out\a[2]=$A0  :  *in+1 : *out+3 ; †
        Case $87        :  *out\a=$E2 : *out\a[1]=$80 : *out\a[2]=$A1  :  *in+1 : *out+3 ; ‡
        Case $88        :  *out\a=$CB : *out\a[1]=$86 :                :  *in+1 : *out+2 ; ˆ
        Case $89        :  *out\a=$E2 : *out\a[1]=$80 : *out\a[2]=$B0  :  *in+1 : *out+3 ; ‰
        Case $8A        :  *out\a=$C5 : *out\a[1]=$A0 :                :  *in+1 : *out+2 ; Š
        Case $8B        :  *out\a=$E2 : *out\a[1]=$80 : *out\a[2]=$B9  :  *in+1 : *out+3 ; ‹
        Case $8C        :  *out\a=$C5 : *out\a[1]=$92 :                :  *in+1 : *out+2 ; Œ
        Case $8D        :  *out\a=$C2 : *out\a[1]=*in\a                :  *in+1 : *out+2 ; n.a.
        Case $8E        :  *out\a=$C5 : *out\a[1]=$BD :                :  *in+1 : *out+2 ; Ž
        Case $8F        :  *out\a=$C2 : *out\a[1]=*in\a                :  *in+1 : *out+2 ; n.a.
        Case $90        :  *out\a=$C2 : *out\a[1]=*in\a                :  *in+1 : *out+2 ; n.a.
        Case $91        :  *out\a=$E2 : *out\a[1]=$80 : *out\a[2]=$98  :  *in+1 : *out+3 ; ‘
        Case $92        :  *out\a=$E2 : *out\a[1]=$80 : *out\a[2]=$99  :  *in+1 : *out+3 ; ’
        Case $93        :  *out\a=$E2 : *out\a[1]=$80 : *out\a[2]=$9C  :  *in+1 : *out+3 ; “
        Case $94        :  *out\a=$E2 : *out\a[1]=$80 : *out\a[2]=$9D  :  *in+1 : *out+3 ; ”
        Case $95        :  *out\a=$E2 : *out\a[1]=$80 : *out\a[2]=$A2  :  *in+1 : *out+3 ; •
        Case $96        :  *out\a=$E2 : *out\a[1]=$80 : *out\a[2]=$93  :  *in+1 : *out+3 ; –
        Case $97        :  *out\a=$E2 : *out\a[1]=$80 : *out\a[2]=$94  :  *in+1 : *out+3 ; —
        Case $98        :  *out\a=$CB : *out\a[1]=$9C :                :  *in+1 : *out+2 ; ˜
        Case $99        :  *out\a=$E2 : *out\a[1]=$84 : *out\a[2]=$A2  :  *in+1 : *out+3 ; ™
        Case $9A        :  *out\a=$C5 : *out\a[1]=$A1 :                :  *in+1 : *out+2 ; š
        Case $9B        :  *out\a=$E2 : *out\a[1]=$80 : *out\a[2]=$BA  :  *in+1 : *out+3 ; ›
        Case $9C        :  *out\a=$C5 : *out\a[1]=$93 :                :  *in+1 : *out+2 ; œ
        Case $9D        :  *out\a=$C2 : *out\a[1]=*in\a                :  *in+1 : *out+2 ; n.a.
        Case $9E        :  *out\a=$C5 : *out\a[1]=$BE :                :  *in+1 : *out+2 ; ž
        Case $9F        :  *out\a=$C5 : *out\a[1]=$B8 :                :  *in+1 : *out+2 ; Ÿ
      EndSelect
    ElseIf *in\a < $C0  :  *out\a=$C2 : *out\a[1]=*in\a                :  *in+1 : *out+2
    Else                :  *out\a=$C3 : *out\a[1]=*in\a - 64           :  *in+1 : *out+2
    EndIf
  Wend

  *outlen\l = *out - *out_start

EndProcedure

This works only in one direction. If this works, I will also have a look at the reverse functions.
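
The byte sequences hard-coded in the procedure above can be cross-checked by deriving them from Unicode code points. The following is a minimal sketch, assuming the plain-text file uses code page 1252 (Windows-1252). The names Cp1252High, Cp1252ToCodePoint() and Utf8Hex() are made up for this illustration and are not part of the IDE source.

```purebasic
; Sketch only: look up the Unicode code point for a Windows-1252 byte and
; print the UTF-8 bytes that code point encodes to. All names are hypothetical.

DataSection
  ; Code points for the bytes $80..$9F. The five bytes that Windows-1252
  ; leaves undefined ($81, $8D, $8F, $90, $9D) map to themselves here.
  Cp1252High:
  Data.u $20AC, $0081, $201A, $0192, $201E, $2026, $2020, $2021
  Data.u $02C6, $2030, $0160, $2039, $0152, $008D, $017D, $008F
  Data.u $0090, $2018, $2019, $201C, $201D, $2022, $2013, $2014
  Data.u $02DC, $2122, $0161, $203A, $0153, $009D, $017E, $0178
EndDataSection

Procedure.i Cp1252ToCodePoint(value.a)
  If value >= $80 And value <= $9F
    ProcedureReturn PeekU(?Cp1252High + (value - $80) * 2)
  EndIf
  ProcedureReturn value ; $00..$7F and $A0..$FF match Latin-1
EndProcedure

Procedure.s Utf8Hex(cp.i)
  ; UTF-8 encoding of a code point below $10000, returned as hex text
  If cp < $80
    ProcedureReturn RSet(Hex(cp), 2, "0")
  ElseIf cp < $800
    ProcedureReturn Hex($C0 | (cp >> 6)) + " " + Hex($80 | (cp & $3F))
  Else
    ProcedureReturn Hex($E0 | (cp >> 12)) + " " + Hex($80 | ((cp >> 6) & $3F)) + " " + Hex($80 | (cp & $3F))
  EndIf
EndProcedure

Debug Utf8Hex(Cp1252ToCodePoint($80)) ; E2 82 AC  (€, U+20AC)
Debug Utf8Hex(Cp1252ToCodePoint($9B)) ; E2 80 BA  (›, U+203A)
Debug Utf8Hex(Cp1252ToCodePoint($E4)) ; C3 A4     (ä, U+00E4)
```

A table like this keeps the code points in one place, so a different code page would only need a different Data block instead of a new Select.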

Re: Ide - changing file format - Ide open source project

Posted: Sun Apr 12, 2020 12:55 pm
by User_Russian
Josh wrote:As far as I remember, some Pb versions ago a bug was fixed that occurred when changing the file format from plain-text to Utf8 or vice versa.
Is this a joke? Nothing is fixed.
Here is the text in UTF-8.

Image

If I change the encoding to ASCII, this is what happens.

Image

Therefore, when changing the encoding, you have to copy the text to the clipboard, change the encoding, and paste the text back into the editor.

Re: Ide - changing file format - Ide open source project

Posted: Sun Apr 12, 2020 1:18 pm
by Josh
In my first post, I was speaking about converting from Ascii to Utf8. This should always be possible.

As I have just seen, the code table differs depending on the set language:
English codepage
German codepage

Let's give the German language a try first. If that works, we can see further.

Re: Ide - changing file format - Ide open source project

Posted: Sun Apr 12, 2020 2:30 pm
by User_Russian
Josh wrote:I was speaking about converting from Ascii to Utf8. This should always be possible.
Ascii

Image

In the IDE I changed the encoding to UTF-8.
Result:

Image

Re: Ide - changing file format - Ide open source project

Posted: Sun Apr 12, 2020 5:50 pm
by kenmo
Disclaimer: I haven't investigated the IDE code yet...

But: I think this is the Scintilla behavior when you change Scintilla's encoding in place instead of changing the encoding and then replacing the text.

For example, in Notepad++ (which has like 100 billion users :D ) the same thing occurs.
Create a new file with the encoding "ANSI" (what PB calls "plain text")
Paste in your example code.
Change the encoding to "UTF-8" and some characters become garbled.

The easy Notepad++ solution is Copy All, then change the encoding, then paste it all back in. The characters are correct and the encoding is really changed now.

The same workaround works in the PB IDE!


All that being said, I think both PB and Notepad++ should handle this automatically! Change the encoding, but also correct all characters. Internally I think this would be GetAllText --> SetEncoding --> SetAllText
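
The GetAllText --> SetEncoding --> SetAllText idea could look roughly like this for a ScintillaGadget. This is only a sketch and is not taken from the IDE source: ConvertGadgetToUTF8() and DecodeSingleByte() are hypothetical names, and the decoder is a placeholder, because how #PB_Ascii treats the bytes $80 - $9F is an assumption that has to be checked.

```purebasic
; Sketch of GetAllText --> SetEncoding --> SetAllText for a ScintillaGadget.
; ConvertGadgetToUTF8() and DecodeSingleByte() are hypothetical names.

Procedure.s DecodeSingleByte(*raw, length)
  ; Placeholder decoder. Assumption: #PB_Ascii yields the characters you
  ; expect. If it does not map $80..$9F the Windows-1252 way, swap in a
  ; code-page-aware decoder here.
  ProcedureReturn PeekS(*raw, length, #PB_Ascii)
EndProcedure

Procedure.i ConvertGadgetToUTF8(Gadget)
  Protected length, *raw, *utf8, text$

  ; 1. GetAllText: fetch the raw single-byte document and decode it
  length = ScintillaSendMessage(Gadget, #SCI_GETLENGTH)
  *raw = AllocateMemory(length + 2)
  If *raw = 0
    ProcedureReturn #False
  EndIf
  ScintillaSendMessage(Gadget, #SCI_GETTEXT, length + 1, *raw)
  text$ = DecodeSingleByte(*raw, length)
  FreeMemory(*raw)

  ; 2. SetEncoding: from now on Scintilla interprets the buffer as UTF-8
  ScintillaSendMessage(Gadget, #SCI_SETCODEPAGE, #SC_CP_UTF8)

  ; 3. SetAllText: write the same characters back, now encoded as UTF-8
  *utf8 = UTF8(text$)
  ScintillaSendMessage(Gadget, #SCI_SETTEXT, 0, *utf8)
  FreeMemory(*utf8)

  ProcedureReturn #True
EndProcedure
```

Replacing the whole text this way would also reset the undo history and the cursor position unless those are saved and restored separately.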

Re: Ide - changing file format - Ide open source project

Posted: Sun Apr 12, 2020 7:16 pm
by Josh
kenmo wrote:For example, in Notepad++ (which has like 100 billion users :D ) the same thing occurs.
Create a new file with the encoding "ANSI" (what PB calls "plain text")
Paste in your example code.
Change the encoding to "UTF-8" and some characters become garbled.
No, in Notepad++ it works fine, no matter how often I convert the text back and forth. It always shows the correct code. In Notepad++ you have to use the lower menu items 'Convert to ...' in the 'Encoding' menu.

But maybe I didn't think this through far enough. I used code page 1252, which corresponds to the Ascii table in the German Pb help.

Re: Ide - changing file format - Ide open source project

Posted: Sun Apr 12, 2020 7:24 pm
by gurj
CutAllText --> SetCodeFormat --> Paste

Re: Ide - changing file format - Ide open source project

Posted: Sun Apr 12, 2020 7:41 pm
by kenmo
Josh wrote:No, in Notepad++ it works fine, no matter how often I convert the text back and forth. It always shows the correct code. In Notepad++ you have to use the lower menu items 'Convert to ...' in the 'Encoding' menu.
Oops, I never paid attention to separate "Encode" and "Convert" actions! So "Convert" is probably doing exactly what I said, grab the text, change encoding, restore the text.

If you can confirm that copy > encoding > paste works in the PB IDE for you, I can implement that in the open source IDE and submit it.

Re: Ide - changing file format - Ide open source project

Posted: Mon Apr 13, 2020 12:55 pm
by User_Russian
kenmo wrote:If you can confirm that copy > encoding > paste works in the PB IDE
Yes it works.