How to find out if a UTF-8 file is with BOM or not.
Posted: Sat Mar 11, 2023 6:17 pm
by Janni
Hi,
I want to check whether a file has a BOM or not.
If it has a BOM, the first three bytes will be EF BB BF (hex).
How can I check that in PB? I found a way to read the first byte, but what about the second and third?
Code:
; ReadFile() opens read-only; OpenFile() would create the file if it is missing
file.i = ReadFile(#PB_Any, "DemoWithBOM.txt")
If file
  signature.b = ReadByte(file)
  CloseFile(file)
  Debug Hex(signature, #PB_Byte) ; will display EF
EndIf
Thanks in advance
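
For reference, the manual check asked about here could be sketched as follows: read the first three bytes and compare them with EF BB BF. This is only an illustration under stated assumptions. `HasUtf8Bom()` is a hypothetical helper name, not a PureBasic built-in, and the file name is the one from the snippet above.

```purebasic
; Sketch: check the first three bytes of a file for the UTF-8 BOM (EF BB BF).
; HasUtf8Bom() is a hypothetical helper name, not a PureBasic built-in.
Procedure.i HasUtf8Bom(FileName.s)
  Protected file.i, result.i = #False
  Protected b1.a, b2.a, b3.a            ; .a = unsigned byte (0..255)

  file = ReadFile(#PB_Any, FileName)    ; read-only access is enough
  If file
    If Lof(file) >= 3                   ; a shorter file cannot hold the BOM
      b1 = ReadAsciiCharacter(file)     ; each call advances the file pointer
      b2 = ReadAsciiCharacter(file)
      b3 = ReadAsciiCharacter(file)
      If b1 = $EF And b2 = $BB And b3 = $BF
        result = #True
      EndIf
    EndIf
    CloseFile(file)
  EndIf

  ProcedureReturn result
EndProcedure

Debug HasUtf8Bom("DemoWithBOM.txt")
```

Reading into unsigned `.a` variables avoids the sign issue of `.b`, where $EF would otherwise be stored as a negative number and a plain comparison with `$EF` would fail.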
Re: How to find out if a UTF-8 file is with BOM or not.
Posted: Sat Mar 11, 2023 6:27 pm
by HeX0R
Re: How to find out if a UTF-8 file is with BOM or not.
Posted: Sat Mar 11, 2023 6:57 pm
by Janni
Ohh, there is a built-in function.
I totally missed that...
Hmm, how do I interpret the return values? I got values like 2 (BOM) and 24 (no BOM).
Re: How to find out if a UTF-8 file is with BOM or not.
Posted: Sat Mar 11, 2023 7:51 pm
by Demivec
Janni wrote: Sat Mar 11, 2023 6:57 pm
Ohh, there is a built-in function.
I totally missed that...
Hmm, how do I interpret the return values? I got values like 2 (BOM) and 24 (no BOM).
Compare the return value with the constants listed under the heading 'Return Value' in the documentation linked above.
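
A minimal sketch of that comparison, assuming the constants named in the ReadStringFormat() documentation and the file name from the first post. The two Debug lines at the top print the numeric values of the constants, which should correspond to the 2 and 24 reported above.

```purebasic
; Print the numeric values behind two of the constants for comparison
Debug #PB_UTF8
Debug #PB_Ascii

file.i = ReadFile(#PB_Any, "DemoWithBOM.txt")
If file
  ; ReadStringFormat() looks for a BOM at the current file position
  Select ReadStringFormat(file)
    Case #PB_Ascii
      Debug "No BOM found"
    Case #PB_UTF8
      Debug "UTF-8 BOM"
    Case #PB_Unicode
      Debug "UTF-16 (little endian) BOM"
    Case #PB_UTF16BE
      Debug "UTF-16 (big endian) BOM"
    Case #PB_UTF32
      Debug "UTF-32 (little endian) BOM"
    Case #PB_UTF32BE
      Debug "UTF-32 (big endian) BOM"
  EndSelect
  CloseFile(file)
EndIf
```

Comparing against the named constants rather than the raw numbers keeps the code readable and independent of the actual values.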

Re: How to find out if a UTF-8 file is with BOM or not.
Posted: Fri Jan 19, 2024 2:46 pm
by dige
Hi folks,
I have just noticed that UTF-8 files without BOM are recognised as ASCII.
Code:
FileID = ReadFile(#PB_Any, file, #PB_File_SharedRead)
If FileID
  FF = ReadStringFormat(FileID)
  Debug FF
  CloseFile(FileID)
EndIf
Notepad++ or TotalCommander, for example, recognise the file type correctly. This means that there must be a way to recognise UTF-8 even without a BOM.
But how? Or is this a bug?
To test this, simply create a file with Notepad++ and save it as ASCII, UTF-8 and UTF-8 BOM.
ReadStringFormat() will report both the ASCII file and the UTF-8 file without BOM as #PB_Ascii, and only the UTF-8 BOM file as #PB_UTF8.
Or just take the files from here:
https://u.pcloud.link/publink/show?code ... VItYmmPMry
Kind regards
Dige
Re: How to find out if a UTF-8 file is with BOM or not.
Posted: Fri Jan 19, 2024 5:43 pm
by Little John
This means that there must be a way to recognise UTF-8 even without a BOM.
AFAIR Demivec posted a procedure here that does do so.
Re: How to find out if a UTF-8 file is with BOM or not.
Posted: Sat Jan 20, 2024 12:02 am
by Demivec
Little John wrote: Fri Jan 19, 2024 5:43 pm
This means that there must be a way to recognise UTF-8 even without a BOM.
AFAIR Demivec posted a procedure here that does do so.
Here's the link:
https://www.purebasic.fr/english/viewtopic.php?p=479049&hilit=Text+file#p479049
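
As background to the question above: ReadStringFormat() only looks for a BOM, so a file without one is reported as #PB_Ascii regardless of its content. Recognising UTF-8 without a BOM therefore means inspecting the bytes themselves. The following is a rough sketch of that idea, not the procedure from the link. `LooksLikeUtf8()` and the file name "DemoNoBOM.txt" are hypothetical, and the check is simplified: it validates lead and continuation bytes but does not reject every overlong or surrogate encoding.

```purebasic
; Sketch of a BOM-less UTF-8 heuristic. LooksLikeUtf8() is a hypothetical
; helper, not a PureBasic built-in and not the procedure linked above.
; Returns #True if the buffer is well-formed UTF-8 AND contains at least one
; multi-byte sequence; pure 7-bit ASCII returns #False (it is valid as both).
Procedure.i LooksLikeUtf8(*buffer, length.i)
  Protected i.i = 0, follow.i, multibyte.i = #False
  Protected b.a

  While i < length
    b = PeekA(*buffer + i)
    i + 1
    If b < $80
      follow = 0                      ; 0xxxxxxx: plain ASCII
    ElseIf b >= $C2 And b <= $DF
      follow = 1                      ; lead byte of a 2-byte sequence
    ElseIf b >= $E0 And b <= $EF
      follow = 2                      ; lead byte of a 3-byte sequence
    ElseIf b >= $F0 And b <= $F4
      follow = 3                      ; lead byte of a 4-byte sequence
    Else
      ProcedureReturn #False          ; stray continuation or invalid lead byte
    EndIf

    If follow > 0
      If i + follow > length
        ProcedureReturn #False        ; sequence truncated at end of data
      EndIf
      While follow > 0
        b = PeekA(*buffer + i)
        If b < $80 Or b > $BF
          ProcedureReturn #False      ; continuation bytes must be 10xxxxxx
        EndIf
        i + 1
        follow - 1
      Wend
      multibyte = #True
    EndIf
  Wend

  ProcedureReturn multibyte
EndProcedure

; Usage: fall back to the heuristic only when no BOM was found
file.i = ReadFile(#PB_Any, "DemoNoBOM.txt")   ; hypothetical file name
If file
  format.i = ReadStringFormat(file)
  length.i = Lof(file) - Loc(file)            ; bytes left after any BOM
  If format = #PB_Ascii And length > 0
    *mem = AllocateMemory(length)
    If *mem
      length = ReadData(file, *mem, length)   ; number of bytes actually read
      If LooksLikeUtf8(*mem, length)
        Debug "Probably UTF-8 without BOM"
      Else
        Debug "ASCII or another 8-bit encoding"
      EndIf
      FreeMemory(*mem)
    EndIf
  Else
    Debug format
  EndIf
  CloseFile(file)
EndIf
```

A file containing only 7-bit characters is valid as both ASCII and UTF-8, so the sketch reports it as not UTF-8; whether that is the right default depends on the application. Like any content-based detection, the result is a guess, not a guarantee.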