Determining a file's encoding from its BOM is the classic approach. Determining it from the content is a guessing algorithm. The developers could add a function like CheckDataEncoding(*p, length), but such a function cannot be part of ReadStringFormat() in any way.
simkot wrote: Wed Feb 26, 2025 7:11 am
For example, in AutoIt it is solved by one line:
Code:
FileGetEncoding ( "filehandle/filename" [, mode = 1] )
In this case, ANSI is also detected, as is UTF-8 without BOM.
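To make the difference between the two steps concrete, here is a rough sketch in Python (chosen only because it is short; it is not PureBasic or AutoIt code). The names encoding_from_bom and check_data_encoding are hypothetical, and the content check is only a guess: it tests whether the bytes are well-formed UTF-8 and otherwise reports "ansi" without knowing which code page.

```python
import codecs

# BOM signatures, longest first so UTF-32 LE is not mistaken for UTF-16 LE.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def encoding_from_bom(data: bytes):
    """Deterministic step: return the encoding named by the BOM, or None."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return None


def check_data_encoding(data: bytes) -> str:
    """Hypothetical content check: a guess, not a proof."""
    by_bom = encoding_from_bom(data)
    if by_bom is not None:
        return by_bom
    if data.isascii():
        return "ascii"  # identical bytes in ASCII, UTF-8 and ANSI code pages
    try:
        data.decode("utf-8")  # strict decoding rejects malformed sequences
    except UnicodeDecodeError:
        return "ansi"  # some single-byte code page; which one is unknown
    return "utf-8"  # well-formed UTF-8, most likely a file without BOM
```

The BOM step can only say what the file declares. The content step can be fooled by a short ANSI text that happens to form valid UTF-8 sequences, which is why it is better kept as a separate function.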
That is logical: source code should not be in ANSI format, which already belongs to the last century. If your source code is opened in another country (with a different ANSI code page), the text in your native language will look like gibberish, and a translator will not be able to translate it into that country's language. To handle ANSI you would first need a code page recognizer: an algorithm that analyzes letter frequencies or checks words against a dictionary. I'm not an expert in these algorithms, but the code page recognition for Russian in Notepad++ is faulty: it always gives the wrong result, and you have to disable it so that it does not break your files.
simkot wrote: Wed Feb 26, 2025 7:11 am
PureBasic behaves somewhat illogically. It saves its .pb files as UTF-8 with BOM, but reads files without BOM. It should behave the same way in both cases.
I checked: a UTF-8 file without a BOM produces gibberish (it opens as ANSI), and an ANSI file opens as ANSI. So everything is working as it should. There is no ANSI (cp1251) on Linux, so there even an ANSI file will come out broken.
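The gibberish is easy to reproduce outside the IDE. A small Python sketch, assuming cp1251 as the ANSI code page and using the hypothetical helper names open_as_ansi and open_as_utf8:

```python
def open_as_ansi(data: bytes) -> str:
    """Decode bytes the way an ANSI (cp1251 assumed) reader would."""
    # errors="replace" because byte 0x98 is undefined in cp1251.
    return data.decode("cp1251", errors="replace")


def open_as_utf8(data: bytes):
    """Decode bytes as strict UTF-8; return None if they are not valid."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return None


text = "Привет"                      # sample Russian text
utf8_bytes = text.encode("utf-8")    # contents of a UTF-8 file without BOM
ansi_bytes = text.encode("cp1251")   # contents of an ANSI file

garbled = open_as_ansi(utf8_bytes)   # UTF-8 read as ANSI: two wrong letters per character
intact = open_as_ansi(ansi_bytes)    # ANSI read as ANSI: the text survives
on_utf8_system = open_as_utf8(ansi_bytes)  # ANSI read as UTF-8: not decodable

print(garbled, intact, on_utf8_system)
```

Each Cyrillic letter takes two bytes in UTF-8, so the ANSI reader shows twelve characters instead of six. The last call shows the opposite case: cp1251 bytes read by a system that expects UTF-8.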