What should ReadStringFormat(#File) return?


Post by simkot »

The help says that ReadStringFormat(#File) should return strings, for example #PB_UTF16BE. But when I check a file encoded as UTF16BE, it returns 4. For a UTF8 file with BOM it returns 2, and for ANSI it returns 24. How is that possible?

Re: What should ReadStringFormat(#File) return?

Post by STARGÅTE »

simkot wrote: Tue Feb 25, 2025 6:50 pm The help says that ReadStringFormat(#File) should return strings.
ReadStringFormat()

The documentation says:
Result = ReadStringFormat(#File)
So the return value is a number.
#PB_UTF16BE is not a string; it is a constant that stands for the value 4.
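A quick way to see this for yourself, and to look up the other values, is to print the constants in the debugger. This is only a minimal sketch; the expected numbers are deliberately not listed here, since your own debug output is the reliable source for your PureBasic version.

```purebasic
; The format constants are plain integers, so Debug prints numbers.
Debug #PB_Ascii
Debug #PB_UTF8
Debug #PB_Unicode
Debug #PB_UTF16BE
Debug #PB_UTF32
Debug #PB_UTF32BE
```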
PB 6.01 ― Win 10, 21H2 ― Ryzen 9 3900X, 32 GB ― NVIDIA GeForce RTX 3080 ― Vivaldi 6.0 ― www.unionbytes.de
Lizard - Script language for symbolic calculations and more
Typeface - Sprite-based font include/module

Re: What should ReadStringFormat(#File) return?

Post by simkot »

How do I find out the values of the other constants?
This function is kind of strange. It apparently detects the encoding, but not completely. It doesn't detect the most common case, UTF8 without BOM. So what's the point of it? In other languages, such obvious, everyday functions are far more capable and work more clearly.

Re: What should ReadStringFormat(#File) return?

Post by STARGÅTE »

The values don't matter (and they could change from one PureBasic version to the next).
You always compare the returned value with the constants specified in the documentation.

Code:

If ReadStringFormat(#File) = #PB_UTF8
	Debug "File is UTF8"
EndIf
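As a fuller sketch of the same comparison: the snippet below opens a file, checks the detected format against each documented constant, and then passes the format on to ReadString(). The file name example.txt is a hypothetical placeholder, and the restriction to three formats for ReadString() is based on my reading of the documentation, so treat it as an assumption.

```purebasic
#File = 0

If ReadFile(#File, "example.txt")        ; hypothetical file name
  Format = ReadStringFormat(#File)       ; call it right after opening the file

  Select Format
    Case #PB_Ascii
      Debug "No BOM found"
    Case #PB_UTF8
      Debug "UTF-8 BOM"
    Case #PB_Unicode
      Debug "UTF-16 LE BOM"
    Case #PB_UTF16BE
      Debug "UTF-16 BE BOM"
    Case #PB_UTF32
      Debug "UTF-32 LE BOM"
    Case #PB_UTF32BE
      Debug "UTF-32 BE BOM"
  EndSelect

  ; Assumption: ReadString() only accepts these three formats directly.
  If Format = #PB_Ascii Or Format = #PB_UTF8 Or Format = #PB_Unicode
    While Eof(#File) = 0
      Debug ReadString(#File, Format)
    Wend
  EndIf

  CloseFile(#File)
Else
  Debug "Could not open the file"
EndIf
```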

Re: What should ReadStringFormat(#File) return?

Post by simkot »

Thank you! Now it's clear. But how do you determine UTF8 without BOM? Can PureBasic do this?

Re: What should ReadStringFormat(#File) return?

Post by STARGÅTE »

simkot wrote: Tue Feb 25, 2025 7:31 pm But how do you determine UTF8 without BOM? Can PureBasic do this?
Without a BOM, ReadStringFormat() returns #PB_Ascii.
There is no safe method to distinguish between ASCII and UTF8 without reading the whole file and checking whether all characters are validly encoded UTF8. Even then, it could still be ordinary ASCII.

Please note: if you omit the optional flag in ReadFile(#File, Filename$ [, Flags]), which specifies the behaviour of ReadString() etc., PureBasic always tries to read the file as UTF8, even if the file is not valid UTF8.
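The "read the whole file and check" idea could be sketched roughly as follows. LooksLikeUTF8() is a hypothetical helper, not a PureBasic function, and example.txt is a placeholder name. The check is simplified: it validates lead and continuation bytes only and does not reject every overlong form or surrogate, so a #True result means "structurally plausible UTF8", nothing more.

```purebasic
; Hypothetical helper (not part of PureBasic): returns #True if the
; buffer contains only structurally valid UTF-8 byte sequences.
Procedure.i LooksLikeUTF8(*Buffer, Length.i)
  Protected i.i = 0, follow.i, b.a

  While i < Length
    b = PeekA(*Buffer + i)
    If b < $80
      follow = 0                  ; plain ASCII byte
    ElseIf b >= $C2 And b <= $DF
      follow = 1                  ; lead byte of a 2-byte sequence
    ElseIf b >= $E0 And b <= $EF
      follow = 2                  ; lead byte of a 3-byte sequence
    ElseIf b >= $F0 And b <= $F4
      follow = 3                  ; lead byte of a 4-byte sequence
    Else
      ProcedureReturn #False      ; stray continuation or invalid lead byte
    EndIf

    If i + follow >= Length
      ProcedureReturn #False      ; sequence is cut off at the end of the buffer
    EndIf

    While follow > 0
      i = i + 1
      b = PeekA(*Buffer + i)
      If b < $80 Or b > $BF
        ProcedureReturn #False    ; expected a continuation byte ($80..$BF)
      EndIf
      follow = follow - 1
    Wend
    i = i + 1
  Wend

  ProcedureReturn #True
EndProcedure

#File = 0

If ReadFile(#File, "example.txt")            ; hypothetical file name
  If ReadStringFormat(#File) = #PB_Ascii     ; no BOM, so fall back to guessing
    Length = Lof(#File) - Loc(#File)
    If Length > 0
      *Buffer = AllocateMemory(Length)
      If *Buffer
        BytesRead = ReadData(#File, *Buffer, Length)
        If LooksLikeUTF8(*Buffer, BytesRead)
          Debug "Structurally valid UTF-8 (or plain ASCII)"
        Else
          Debug "Not valid UTF-8, probably an 8-bit code page"
        EndIf
        FreeMemory(*Buffer)
      EndIf
    EndIf
  EndIf
  CloseFile(#File)
EndIf
```

A file that contains only bytes below $80 passes this check too, which is exactly the ambiguity described above: such a file is valid as both ASCII and UTF8.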

Re: What should ReadStringFormat(#File) return?

Post by AZJIO »

simkot wrote: Tue Feb 25, 2025 7:31 pm But how do you determine UTF8 without BOM?
https://www.purebasic.fr/english/viewto ... 06#p636306
simkot wrote: Tue Feb 25, 2025 7:14 pm It doesn't define the most common UTF8 without BOM
How can it identify the encoding if the BOM is missing? The detection relies entirely on the BOM.
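To illustrate what "stored in the BOM" means, here is a rough sketch that inspects the first bytes of a file by hand. It is not how ReadStringFormat() is implemented (that is not documented here); it only shows the standard Unicode BOM byte patterns. The file name is a hypothetical placeholder.

```purebasic
#File = 0
Define.a b0, b1, b2, b3

If ReadFile(#File, "example.txt")          ; hypothetical file name
  *Header = AllocateMemory(4)              ; newly allocated memory is zero-filled
  If *Header
    Count = ReadData(#File, *Header, 4)    ; may be fewer than 4 for tiny files
    b0 = PeekA(*Header)
    b1 = PeekA(*Header + 1)
    b2 = PeekA(*Header + 2)
    b3 = PeekA(*Header + 3)

    ; The UTF-32 LE pattern starts like the UTF-16 LE one, so test it first.
    If Count >= 3 And b0 = $EF And b1 = $BB And b2 = $BF
      Debug "UTF-8 BOM (EF BB BF)"
    ElseIf Count >= 4 And b0 = $FF And b1 = $FE And b2 = 0 And b3 = 0
      Debug "UTF-32 LE BOM (FF FE 00 00)"
    ElseIf Count >= 4 And b0 = 0 And b1 = 0 And b2 = $FE And b3 = $FF
      Debug "UTF-32 BE BOM (00 00 FE FF)"
    ElseIf Count >= 2 And b0 = $FF And b1 = $FE
      Debug "UTF-16 LE BOM (FF FE)"
    ElseIf Count >= 2 And b0 = $FE And b1 = $FF
      Debug "UTF-16 BE BOM (FE FF)"
    Else
      Debug "No BOM: the encoding cannot be read from the header"
    EndIf

    FreeMemory(*Header)
  EndIf
  CloseFile(#File)
EndIf
```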

Re: What should ReadStringFormat(#File) return?

Post by Piero »

simkot wrote: Tue Feb 25, 2025 7:14 pm How to find out the values of other constants?
This is a job for the  RegExx0r!!!  :lol:

Re: What should ReadStringFormat(#File) return?

Post by HeX0R »

simkot wrote: Tue Feb 25, 2025 7:14 pm How do I find out the values of the other constants?
This function is kind of strange. It apparently detects the encoding, but not completely. It doesn't detect the most common case, UTF8 without BOM. So what's the point of it? In other languages, such obvious, everyday functions are far more capable and work more clearly.
Interesting. Which other language can detect a UTF8 text file without BOM by a quick look at it?
I guess you didn't understand that function.
If it had been named "ReadFileBOM()" (which is what it does), it might have been clearer.
Piero wrote: Tue Feb 25, 2025 10:41 pm This is a job for the  RegExx0r!!!  :lol:
Dunno why, but somehow I'm feeling triggered :mrgreen:

Re: What should ReadStringFormat(#File) return?

Post by Piero »

HeX0R wrote: Tue Feb 25, 2025 11:17 pm
Piero wrote: Tue Feb 25, 2025 10:41 pm This is a job for the  RegExx0r!!!  :lol:
Dunno why, but somehow I'm feeling triggered :mrgreen:
Please don't: it's just for fun!

PS: I just "updated" it for n3wbi3z :mrgreen:

Re: What should ReadStringFormat(#File) return?

Post by Demivec »

simkot wrote: Tue Feb 25, 2025 7:31 pm Thank you! Now it's clear. But how do you determine UTF8 without BOM? Can PureBasic do this?
https://www.purebasic.fr/english/viewtopic.php?t=64385

Re: What should ReadStringFormat(#File) return?

Post by simkot »

STARGÅTE wrote: Tue Feb 25, 2025 8:00 pm There is no safe method to distinguish between ASCII and UTF8
But it still needs to be detected.
HeX0R wrote: Tue Feb 25, 2025 11:17 pm Interesting, which other language can detect a UTF8 text file without BOM by a quick look at it?
For example, in AutoIt it is solved with one line:

Code:

FileGetEncoding ( "filehandle/filename" [, mode = 1] )
In this case, ANSI is detected as well as UTF8 without BOM.
STARGÅTE wrote: Tue Feb 25, 2025 8:00 pm PureBasic always tries to read the file as UTF8
PureBasic behaves somewhat inconsistently here. It saves its own .pb files as UTF8 with BOM, but reads files as UTF8 even without a BOM. The two should be handled the same way.

Re: What should ReadStringFormat(#File) return?

Post by Piero »

Well… don't you have well-made text editors on Windows?
TextMate (Mac), when in doubt, offers to let you select the encoding, with a preview of the resulting text, and can optionally save your choice to "better guess next time"…
I mean, you could pop up a red message box saying: "wtf is this? please resave it with a BOM using [freeware link]!"

PS: Dear Wise Admins, I know; I'm getting repetitive about TextMate :oops:

Re: What should ReadStringFormat(#File) return?

Post by simkot »

Writing a text-conversion program that tells the user to convert the text in another program... I don't think that is a serious approach.

Re: What should ReadStringFormat(#File) return?

Post by Piero »

simkot wrote: Wed Feb 26, 2025 12:22 pm Writing a text-conversion program that tells the user to convert the text in another program... I don't think that is a serious approach.
I understand you, but I was speaking in general (obviously it would be shameful if your program is a text converter).
Anyway, the best tradition has always been to do better by "shamelessly" using others' great (possibly open-source) stuff.