What should ReadStringFormat(#File) return?
The help says that ReadStringFormat(#File) should return strings. For example, #PB_UTF16BE. But when I check a file in #PB_UTF16BE encoding, it returns 4. For a UTF8 file with BOM it returns 2. For ANSI, 24. How is that possible?
Re: What should ReadStringFormat(#File) return?
simkot wrote: Tue Feb 25, 2025 6:50 pm
The help says that ReadStringFormat(#File) should return strings.
ReadStringFormat()
In the documentation it is written:
Result = ReadStringFormat(#File)
So, the return value is a number.
#PB_UTF16BE is not a string, it is a constant, representing 4.
PB 6.01 ― Win 10, 21H2 ― Ryzen 9 3900X, 32 GB ― NVIDIA GeForce RTX 3080 ― Vivaldi 6.0 ― www.unionbytes.de
Lizard - Script language for symbolic calculations and more ― Typeface - Sprite-based font include/module
Re: What should ReadStringFormat(#File) return?
How to find out the values of other constants?
This function is kind of strange. It apparently detects the encoding, but not in every case. It doesn't detect the most common one, UTF8 without BOM. So what's the point of it? In other languages, such obvious, everyday functions are offered much more broadly and work more clearly.
Last edited by simkot on Tue Feb 25, 2025 7:28 pm, edited 1 time in total.
Re: What should ReadStringFormat(#File) return?
The values don't matter (and they could change from one PureBasic version to the next).
You always compare the returned value with the constants specified in the documentation.
Code: Select all
If ReadStringFormat(#File) = #PB_UTF8
  Debug "File is UTF8"
EndIf
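The same comparison can be extended to every constant listed in the documentation. This is only a sketch: the file number 0 and the name "test.txt" are placeholders, and the Debug texts are my own wording. The last Debug line shows how to look at a constant's numeric value in your PureBasic version, if you are curious about it.

```purebasic
; Sketch: name the detected format by comparing against the documented constants.
; File number 0 and "test.txt" are placeholders for this example.
If ReadFile(0, "test.txt")
  Format = ReadStringFormat(0)   ; if a BOM is found, the file pointer is moved past it

  Select Format
    Case #PB_Ascii   : Debug "No BOM found (ASCII, or UTF8 without BOM)"
    Case #PB_UTF8    : Debug "UTF8 BOM found"
    Case #PB_Unicode : Debug "UTF16 (little endian) BOM found"
    Case #PB_UTF16BE : Debug "UTF16 (big endian) BOM found"
    Case #PB_UTF32   : Debug "UTF32 (little endian) BOM found"
    Case #PB_UTF32BE : Debug "UTF32 (big endian) BOM found"
  EndSelect

  Debug #PB_UTF16BE              ; shows the numeric value behind the constant
  CloseFile(0)
EndIf
```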
Re: What should ReadStringFormat(#File) return?
Thank you! Now it's clear. But how do you determine UTF8 without BOM? Can PureBasic do this?
Re: What should ReadStringFormat(#File) return?
simkot wrote: Tue Feb 25, 2025 7:31 pm
But how do you determine UTF8 without BOM? Can PureBasic do this?
Without BOM, ReadStringFormat() returns #PB_Ascii.
There is no safe method to distinguish between ASCII and UTF8 without reading the whole file and checking whether every byte sequence is valid UTF8. Even then, it could still be normal ASCII.
Please note that PureBasic always tries to read the file as UTF8, even if the file is not valid UTF8, unless you pass the additional flag in ReadFile(#File, Filename$ [, Flags]), which specifies the behaviour of ReadString() etc.
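A minimal sketch of such a whole-file check, for files where no BOM was found. LooksLikeUTF8() is a hypothetical helper, not a PureBasic function, and "test.txt" is a placeholder. It only checks that lead bytes and continuation bytes fit together; overlong forms and surrogate ranges are not rejected, so treat the result as a hint, not as proof.

```purebasic
; Hypothetical helper (not part of PureBasic): returns #True if the buffer
; contains only well-formed UTF8 byte sequences.
; Simplified: overlong encodings and surrogates are not rejected.
Procedure.i LooksLikeUTF8(*Buffer, Size.i)
  Protected i.i = 0, Follow.i, b.a
  While i < Size
    b = PeekA(*Buffer + i)
    If b < $80
      Follow = 0                    ; plain ASCII byte
    ElseIf (b & $E0) = $C0
      Follow = 1                    ; lead byte of a 2-byte sequence
    ElseIf (b & $F0) = $E0
      Follow = 2                    ; lead byte of a 3-byte sequence
    ElseIf (b & $F8) = $F0
      Follow = 3                    ; lead byte of a 4-byte sequence
    Else
      ProcedureReturn #False        ; stray continuation byte or invalid lead byte
    EndIf
    i + 1
    While Follow > 0
      If i >= Size : ProcedureReturn #False : EndIf                          ; truncated sequence
      If (PeekA(*Buffer + i) & $C0) <> $80 : ProcedureReturn #False : EndIf  ; not a continuation byte
      i + 1
      Follow - 1
    Wend
  Wend
  ProcedureReturn #True
EndProcedure

; Usage: only needed when ReadStringFormat() found no BOM.
If ReadFile(0, "test.txt")
  If ReadStringFormat(0) = #PB_Ascii
    Size = Lof(0)
    If Size > 0
      *Mem = AllocateMemory(Size)
      If *Mem
        ReadData(0, *Mem, Size)
        If LooksLikeUTF8(*Mem, Size)
          Debug "Valid as UTF8 (a pure ASCII file passes this test too)"
        Else
          Debug "Not valid UTF8, probably some 8-bit code page"
        EndIf
        FreeMemory(*Mem)
      EndIf
    EndIf
  EndIf
  CloseFile(0)
EndIf
```

A file that passes is not guaranteed to be UTF8; it just contains nothing that contradicts UTF8, which is the limitation described above.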
Re: What should ReadStringFormat(#File) return?
https://www.purebasic.fr/english/viewto ... 06#p636306
How could it identify the encoding if the BOM is missing? The whole point is that the encoding information is stored in the BOM.
Re: What should ReadStringFormat(#File) return?
simkot wrote: Tue Feb 25, 2025 7:14 pm
How to find out the values of other constants?
This function is kind of strange. It apparently detects the encoding, but not in every case. It doesn't detect the most common one, UTF8 without BOM. So what's the point of it? In other languages, such obvious, everyday functions are offered much more broadly and work more clearly.
Interesting, which other language can detect a UTF8 text file without BOM by a quick look at it?
I guess you didn't understand that function.
If it had been named "ReadFileBOM()" (which is what it does), it might have been clearer.
Dunno why, but somehow I'm feeling triggered

{Home}.:|:.{Dialog Design0R}.:|:.{Codes}.:|:.{History Viewer Online}.:|:.{Send a Beer}
Re: What should ReadStringFormat(#File) return?
simkot wrote: Tue Feb 25, 2025 7:31 pm
Thank you! Now it's clear. But how do you determine UTF8 without BOM? Can PureBasic do this?
https://www.purebasic.fr/english/viewtopic.php?t=64385
Re: What should ReadStringFormat(#File) return?
STARGÅTE wrote: Tue Feb 25, 2025 8:00 pm
There is no safe method to distinguish between ASCII and UTF8
But it still needs to be detected.
HeX0R wrote: Tue Feb 25, 2025 11:17 pm
Interesting, which other language can detect a UTF8 text file without BOM by a quick look at it?
For example, in AutoIt this is solved in one line:
Code: Select all
FileGetEncoding ( "filehandle/filename" [, mode = 1] )
PureBasic behaves somewhat inconsistently here. It saves its own .pb files as UTF8 with BOM, but reads files without BOM. It should handle both the same way.
Re: What should ReadStringFormat(#File) return?
Well… don't you have well-made text editors on Windows?
TextMate (Mac), when in doubt, asks you to select the encoding, with a preview of the resulting text, optionally "saving your choice" to "better guess next time"…
I mean, so you can popup a red msgbox saying: "wtf is this? please resave it with a BOM using [freeware link]!"
PS: Dear Wise Admins, I know; I'm getting repetitive about TextMate
Re: What should ReadStringFormat(#File) return?
Writing a program to convert text that offers to convert it in another program... I don't think this is serious.
Re: What should ReadStringFormat(#File) return?
simkot wrote: Wed Feb 26, 2025 12:22 pm
Writing a program to convert text that offers to convert it in another program... I don't think this is serious.
I understand you, but I was talking in general (obviously it would be shameful if your program is a text converter).
Anyway the best tradition always was to do better "shamelessly" using others' great (possibly opensource) stuff