
What should ReadStringFormat(#File) return?

Posted: Tue Feb 25, 2025 6:50 pm
by simkot
The help says that ReadStringFormat(#File) should return strings, for example #PB_UTF16BE. But when I check a file in UTF16BE encoding, it returns 4. For a file in UTF8 with BOM, it returns 2. For ANSI, 24. How is that possible?

Re: What should ReadStringFormat(#File) return?

Posted: Tue Feb 25, 2025 6:56 pm
by STARGÅTE
simkot wrote: Tue Feb 25, 2025 6:50 pm The help says that ReadStringFormat(#File) should return strings.
ReadStringFormat()

In the documentation it is written:
Result = ReadStringFormat(#File)
So, the return value is a number.
#PB_UTF16BE is not a string, it is a constant, representing 4.

Re: What should ReadStringFormat(#File) return?

Posted: Tue Feb 25, 2025 7:14 pm
by simkot
How do I find out the values of the other constants?
This function is kind of strange. It seems to detect the encoding, but not completely. It doesn't detect the most common case, UTF8 without BOM. So what's the point of it? In other languages, such obvious and everyday functions are covered much more broadly and work more clearly.

Re: What should ReadStringFormat(#File) return?

Posted: Tue Feb 25, 2025 7:28 pm
by STARGÅTE
The values don't matter (and they could change from one PureBasic version to the next).
You always compare the returned value with the constants specified in the documentation.

Code:

If ReadStringFormat(#File) = #PB_UTF8
	Debug "File is UTF8"
EndIf
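
A minimal sketch that extends the comparison above to the constants listed in the ReadStringFormat() documentation. FormatName() and the file name "test.txt" are hypothetical, made up for illustration.

```purebasic
; Hypothetical helper: turns the ReadStringFormat() result into a readable name.
; It only compares against the documented constants, never against raw numbers.
Procedure.s FormatName(Format)
  Select Format
    Case #PB_Ascii
      ProcedureReturn "no BOM found"
    Case #PB_UTF8
      ProcedureReturn "UTF-8 BOM"
    Case #PB_Unicode
      ProcedureReturn "UTF-16 LE BOM"
    Case #PB_UTF16BE
      ProcedureReturn "UTF-16 BE BOM"
    Case #PB_UTF32
      ProcedureReturn "UTF-32 LE BOM"
    Case #PB_UTF32BE
      ProcedureReturn "UTF-32 BE BOM"
  EndSelect
  ProcedureReturn "unknown value"
EndProcedure

File = ReadFile(#PB_Any, "test.txt") ; placeholder file name
If File
  Debug FormatName(ReadStringFormat(File))
  CloseFile(File)
EndIf
```

Because the code compares against the constants only, it keeps working even if their numeric values change in a later PureBasic version.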

Re: What should ReadStringFormat(#File) return?

Posted: Tue Feb 25, 2025 7:31 pm
by simkot
Thank you! Now it's clear. But how do you determine UTF8 without BOM? Can PureBasic do this?

Re: What should ReadStringFormat(#File) return?

Posted: Tue Feb 25, 2025 8:00 pm
by STARGÅTE
simkot wrote: Tue Feb 25, 2025 7:31 pm But how do you determine UTF8 without BOM? Can PureBasic do this?
Without BOM, ReadStringFormat() returns #PB_Ascii.
There is no safe method to distinguish between ASCII and UTF8 without reading the whole file and checking whether all byte sequences are valid UTF8. Even then, it could still be a plain ASCII file.

Please note, PureBasic always tries to read the file as UTF8, if you omit the additional flag in ReadFile(#File, Filename$ [, Flags]) which specifies the behaviour of ReadString() etc., even if the file is not valid UTF8.
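
A rough sketch of the whole-file check described above, assuming the file fits into memory. IsValidUTF8() and the file name "test.txt" are hypothetical, made up for illustration. A positive result only means the bytes are consistent with UTF8, and a pure ASCII file passes the check as well.

```purebasic
; Hypothetical helper: returns #True if the buffer contains only well-formed UTF-8 sequences.
Procedure.i IsValidUTF8(*Buffer, Size)
  Protected i, b, Follow, Low, High
  While i < Size
    b = PeekA(*Buffer + i)
    i + 1
    Low = $80 : High = $BF               ; default range for a continuation byte
    If b < $80
      Follow = 0                         ; plain ASCII byte
    ElseIf b >= $C2 And b <= $DF
      Follow = 1                         ; 2-byte sequence
    ElseIf b >= $E0 And b <= $EF
      Follow = 2                         ; 3-byte sequence
      If b = $E0 : Low = $A0 : EndIf     ; reject overlong forms
      If b = $ED : High = $9F : EndIf    ; reject UTF-16 surrogates
    ElseIf b >= $F0 And b <= $F4
      Follow = 3                         ; 4-byte sequence
      If b = $F0 : Low = $90 : EndIf     ; reject overlong forms
      If b = $F4 : High = $8F : EndIf    ; stay below U+110000
    Else
      ProcedureReturn #False             ; byte can never start a sequence
    EndIf
    While Follow > 0
      If i >= Size : ProcedureReturn #False : EndIf   ; truncated sequence
      b = PeekA(*Buffer + i)
      If b < Low Or b > High : ProcedureReturn #False : EndIf
      Low = $80 : High = $BF
      i + 1
      Follow - 1
    Wend
  Wend
  ProcedureReturn #True
EndProcedure

File = ReadFile(#PB_Any, "test.txt") ; placeholder file name
If File
  If ReadStringFormat(File) = #PB_Ascii ; no BOM found
    FileSeek(File, 0)
    Size = Lof(File)
    If Size > 0
      *Buffer = AllocateMemory(Size)
      If *Buffer
        ReadData(File, *Buffer, Size)
        If IsValidUTF8(*Buffer, Size)
          Debug "No BOM, content is valid UTF8 (or plain ASCII)"
        Else
          Debug "No BOM, content is not valid UTF8 (probably ANSI)"
        EndIf
        FreeMemory(*Buffer)
      EndIf
    EndIf
  EndIf
  CloseFile(File)
EndIf
```

If the check fails, the file could be reopened with an explicit format flag such as #PB_Ascii in ReadFile(), so that ReadString() does not try to interpret the bytes as UTF8.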

Re: What should ReadStringFormat(#File) return?

Posted: Tue Feb 25, 2025 8:36 pm
by AZJIO
simkot wrote: Tue Feb 25, 2025 7:31 pm But how do you determine UTF8 without BOM?
https://www.purebasic.fr/english/viewto ... 06#p636306
simkot wrote: Tue Feb 25, 2025 7:14 pm It doesn't detect the most common case, UTF8 without BOM
How can it identify the encoding if the BOM is missing? The BOM is the only thing the detection relies on.

Re: What should ReadStringFormat(#File) return?

Posted: Tue Feb 25, 2025 10:41 pm
by Piero
simkot wrote: Tue Feb 25, 2025 7:14 pm How do I find out the values of the other constants?
This is a job for the  RegExx0r!!!  :lol:

Re: What should ReadStringFormat(#File) return?

Posted: Tue Feb 25, 2025 11:17 pm
by HeX0R
simkot wrote: Tue Feb 25, 2025 7:14 pm How do I find out the values of the other constants?
This function is kind of strange. It seems to detect the encoding, but not completely. It doesn't detect the most common case, UTF8 without BOM. So what's the point of it? In other languages, such obvious and everyday functions are covered much more broadly and work more clearly.
Interesting, which other language can detect a UTF8 text file without BOM by a quick look at it?
I guess you didn't understand that function.
If it had been named "ReadFileBOM()" (which is what it does), it might have been clearer.
Piero wrote: Tue Feb 25, 2025 10:41 pm This is a job for the  RegExx0r!!!  :lol:
Dunno why, but somehow I'm feeling triggered :mrgreen:

Re: What should ReadStringFormat(#File) return?

Posted: Wed Feb 26, 2025 12:53 am
by Piero
HeX0R wrote: Tue Feb 25, 2025 11:17 pm
Piero wrote: Tue Feb 25, 2025 10:41 pm This is a job for the  RegExx0r!!!  :lol:
Dunno why, but somehow I'm feeling triggered :mrgreen:
Please don't: it's just for fun!

PS: I just "updated" it for n3wbi3z :mrgreen:

Re: What should ReadStringFormat(#File) return?

Posted: Wed Feb 26, 2025 1:12 am
by Demivec
simkot wrote: Tue Feb 25, 2025 7:31 pm Thank you! Now it's clear. But how do you determine UTF8 without BOM? Can PureBasic do this?
https://www.purebasic.fr/english/viewtopic.php?t=64385

Re: What should ReadStringFormat(#File) return?

Posted: Wed Feb 26, 2025 7:11 am
by simkot
STARGÅTE wrote: Tue Feb 25, 2025 8:00 pm There is no safe method to distinguish between ASCII and UTF8
But it needs to be detected.
HeX0R wrote: Tue Feb 25, 2025 11:17 pm Interesting, which other language can detect a UTF8 text file without BOM by a quick look at it?
For example, in AutoIt this is solved in one line:

Code:

FileGetEncoding ( "filehandle/filename" [, mode = 1] )
In this case, ANSI is also defined in UTF8 without BOM.
STARGÅTE wrote: Tue Feb 25, 2025 8:00 pm PureBasic always tries to read the file as UTF8
PureBasic is acting somewhat illogically. It saves its .pb files as UTF8 with BOM, but reads files as UTF8 without requiring a BOM. It should behave the same way in both cases.

Re: What should ReadStringFormat(#File) return?

Posted: Wed Feb 26, 2025 11:27 am
by Piero
Well… don't you have Well-Done Text Editors on windows?
TextMate (Mac) in case of doubts proposes you to select the encoding, with a preview of the resulting text, optionally "saving your choice" to "better guess next time"…
I mean, so you can popup a red msgbox saying: "wtf is this? please resave it with a BOM using [freeware link]!"

PS: Dear Wise Admins, I know; I'm getting repetitive about TextMate :oops:

Re: What should ReadStringFormat(#File) return?

Posted: Wed Feb 26, 2025 12:22 pm
by simkot
Writing a program to convert text that offers to convert it in another program... I don't think this is serious.

Re: What should ReadStringFormat(#File) return?

Posted: Wed Feb 26, 2025 12:47 pm
by Piero
simkot wrote: Wed Feb 26, 2025 12:22 pm Writing a program to convert text that offers to convert it in another program... I don't think this is serious.
I understand you, but I was talking in general (obviously it would be shameful if your program is a text converter).
Anyway the best tradition always was to do better "shamelessly" using others' great (possibly opensource) stuff