Gest string length problem
Posted: Sun Mar 10, 2024 12:45 pm
by le_magn
Hi all, I have a small problem getting the correct string length. I open a file with ReadFile() and the #PB_UTF8 flag and read a string from it with ReadString(). If I get the string length with Len(), it gives me the wrong result, because it skips the special characters in the string. Also, if I copy the string to the clipboard, the special characters are not copied.
But if I write the string to another file with WriteString(), it is written correctly and with the correct length. In the example below the real length of the string is 100, but Len() returns 96. Please help me understand why.
Code:
TFile.s = "filetest.txt"
If ReadFile(0, TFile, #PB_UTF8)
  String.s = ReadString(0)
  Debug Len(String)
  SetClipboardText(String)
  CloseFile(0)
Else
  Debug "no file"
EndIf
The file contains this string:

Re: Gest string length problem
Posted: Sun Mar 10, 2024 12:51 pm
by AZJIO
Code:
Format = ReadStringFormat(id_file)
Be sure to use ReadStringFormat() to move the file pointer past the BOM.
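A minimal sketch of how the value returned by ReadStringFormat() can be interpreted, assuming PureBasic 5.50 or later. The procedure name BOMFormatName() is hypothetical, and "filetest.txt" is the file name from the first post. Note that when no BOM is present, ReadStringFormat() reports #PB_Ascii, which does not prove the file is really ASCII:

Code:
; Hypothetical helper: turn the result of ReadStringFormat() into a readable label.
Procedure.s BOMFormatName(Format)
  Select Format
    Case #PB_UTF8
      ProcedureReturn "UTF-8 BOM"
    Case #PB_Unicode
      ProcedureReturn "UTF-16 LE BOM"
    Case #PB_UTF16BE
      ProcedureReturn "UTF-16 BE BOM"
    Case #PB_UTF32
      ProcedureReturn "UTF-32 LE BOM"
    Case #PB_UTF32BE
      ProcedureReturn "UTF-32 BE BOM"
    Default
      ; #PB_Ascii: no BOM was found, the file may still contain UTF-8 text
      ProcedureReturn "no BOM"
  EndSelect
EndProcedure

Define Format
If ReadFile(0, "filetest.txt")
  Format = ReadStringFormat(0)  ; also moves the file pointer past any BOM
  Debug BOMFormatName(Format)
  CloseFile(0)
EndIf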
Re: Gest string length problem
Posted: Sun Mar 10, 2024 1:10 pm
by le_magn
AZJIO wrote: Sun Mar 10, 2024 12:51 pm
Code:
Format = ReadStringFormat(id_file)
Be sure to use ReadStringformat() to move the pointer to the end of the BOM mark.
Thanks for the reply. If I use ReadStringFormat() it returns 24, i.e. #PB_Ascii. If I read with #PB_Ascii, Len() returns the correct size, but if I then write this string with WriteString() it writes the wrong string:
The correct string is only written with the #PB_UTF8 format...
Re: Gest string length problem
Posted: Sun Mar 10, 2024 1:12 pm
by AZJIO
Code:
WriteStringFormat(#File, #PB_UTF8)
Re: Gest string length problem
Posted: Sun Mar 10, 2024 1:27 pm
by le_magn
AZJIO wrote: Sun Mar 10, 2024 1:12 pm
Code:
WriteStringFormat(#File, #PB_UTF8)
I tried various combinations, but whenever I get the right length, it writes a bad string to the file. Please test it yourself: write this string to a file, using the string read from a file, not a string literal:
Code:
Three figures stand behind it –scientists, presumably– but you cannot make out their faces.</nr>
After you create the file, please check with a hex editor whether the string is correct:
Code:
Three figures stand behind it –scientists, presumably– but you cannot make out their faces.</nr>
ee figures stand behind it –scientists, presumably– but you cannot make out their faces.</nr>
ee figures stand behind it –scientists, presumably– but you cannot make out their faces.</nr>
1. String read as UTF-8 and written as UTF-8, but Len() returns the wrong size.
2. String read as ASCII and written as UTF-8: Len() returns the correct length, but the written string is bad.
3. String read and written as ASCII: also wrong.
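Case 2 above can be reproduced without a file. A sketch, assuming PureBasic 5.50 or later (Unicode strings and the UTF8() function, which returns a null-terminated memory buffer) and a system whose ASCII code page maps each byte to one character: the three UTF-8 bytes of one en dash, read back as ASCII, become three separate characters. Len() then counts bytes rather than characters, and writing that string as UTF-8 re-encodes each of those characters, which produces bytes different from the original E2 80 93.

Code:
; Encode one en dash (U+2013) as UTF-8: bytes E2 80 93 plus a null terminator.
Define *Bytes = UTF8(Chr($2013))

; Correct: decode the bytes as UTF-8, up to the null terminator -> 1 character.
Define Good$ = PeekS(*Bytes, -1, #PB_UTF8)

; Wrong: decode the same bytes as ASCII -> one character per byte.
Define Bad$ = PeekS(*Bytes, -1, #PB_Ascii)

Debug Len(Good$)  ; 1
Debug Len(Bad$)   ; 3, Len() is now effectively counting bytes
FreeMemory(*Bytes)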

Re: Gest string length problem
Posted: Sun Mar 10, 2024 1:44 pm
by AZJIO
Code:
Procedure.s ReadFileToVar(Path$)
  Protected id_file, Format, Text$
  id_file = ReadFile(#PB_Any, Path$)
  If id_file
    Format = ReadStringFormat(id_file)
    Text$ = ReadString(id_file, Format | #PB_File_IgnoreEOL)
    ; Text$ = ReadString(id_file, #PB_UTF8 | #PB_File_IgnoreEOL)
    CloseFile(id_file)
  EndIf
  ProcedureReturn Text$
EndProcedure
Procedure SaveVarToFile(Path$, Text$, Format = #PB_UTF8)
  Protected Result = #False
  Protected id_file = CreateFile(#PB_Any, Path$, Format)
  If id_file
    If WriteStringFormat(id_file, Format) And WriteString(id_file, Text$, Format)
      Result = #True
    EndIf
    CloseFile(id_file)
  EndIf
  ProcedureReturn Result
EndProcedure
Path$ = GetTemporaryDirectory()
Text$ = ReadFileToVar(Path$ + "file.txt") ; The file should be UTF8 + BOM
Debug SaveVarToFile(Path$ + "NewFile.txt", Text$)
RunProgram(Path$ + "file.txt")
RunProgram(Path$ + "NewFile.txt")
Re: Gest string length problem
Posted: Sun Mar 10, 2024 1:54 pm
by le_magn
AZJIO wrote: Sun Mar 10, 2024 1:44 pm
I tried your code: same result, the written string is different from the original.

Re: Gest string length problem
Posted: Sun Mar 10, 2024 1:56 pm
by AZJIO
If there is no BOM, you need to analyze the contents of the file to determine the encoding; otherwise it is read as ASCII. If UTF-8 text is read as ASCII, you get the wrong data.
https://www.purebasic.fr/english/viewtopic.php?p=478874
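As a rough illustration of what "analyzing the contents" can mean, here is a minimal sketch (not the code from the linked topic) assuming PureBasic 5.50 or later. The hypothetical LooksLikeUTF8() procedure reports #True when a buffer is well-formed UTF-8 that contains at least one multi-byte sequence. It is only a heuristic: it does not reject every invalid sequence (for example overlong three-byte forms or encoded surrogates), and pure 7-bit text is reported as #False because ASCII and UTF-8 decode it identically.

Code:
; Hypothetical helper, a heuristic only: #True if the buffer is well-formed
; UTF-8 and contains at least one multi-byte sequence.
Procedure LooksLikeUTF8(*Buffer, Size)
  Protected i, k, Lead, Extra, MultiByte = #False
  While i < Size
    Lead = PeekA(*Buffer + i)
    If Lead < $80
      Extra = 0                 ; plain 7-bit ASCII byte
    ElseIf Lead >= $C2 And Lead <= $DF
      Extra = 1                 ; 2-byte sequence
    ElseIf Lead >= $E0 And Lead <= $EF
      Extra = 2                 ; 3-byte sequence, e.g. E2 80 93 (en dash)
    ElseIf Lead >= $F0 And Lead <= $F4
      Extra = 3                 ; 4-byte sequence
    Else
      ProcedureReturn #False    ; byte that can never start a UTF-8 sequence
    EndIf
    If i + Extra >= Size
      ProcedureReturn #False    ; sequence cut off at the end of the buffer
    EndIf
    If Extra > 0
      For k = 1 To Extra
        If (PeekA(*Buffer + i + k) & $C0) <> $80
          ProcedureReturn #False  ; missing continuation byte (10xxxxxx)
        EndIf
      Next
      MultiByte = #True
    EndIf
    i = i + Extra + 1
  Wend
  ProcedureReturn MultiByte
EndProcedure

; Usage: a UTF-8 buffer containing an en dash is detected as UTF-8.
Define Text$ = "behind it " + Chr($2013) + "scientists"
Define *Text = UTF8(Text$)
Debug LooksLikeUTF8(*Text, StringByteLength(Text$, #PB_UTF8))  ; 1
FreeMemory(*Text)

If the check succeeds, the buffer can be decoded with PeekS() and #PB_UTF8; otherwise it can be treated as ASCII.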
Re: Gest string length problem
Posted: Sun Mar 10, 2024 1:58 pm
by infratec
This works for me:
Code:
TFile.s = "filetest.txt"
If ReadFile(0, TFile, #PB_UTF8)
  String.s = ReadString(0)
  Debug Len(String)
  Debug String
  SetClipboardText(String)
  CloseFile(0)
  If CreateFile(0, "filetest2.txt", #PB_UTF8)
    WriteStringN(0, String)
    CloseFile(0)
  EndIf
Else
  Debug "no file"
EndIf
Re: Gest string length problem
Posted: Sun Mar 10, 2024 1:59 pm
by AZJIO
le_magn wrote: Sun Mar 10, 2024 1:54 pm
tried your code, same result, written string are different from original
Don't you see that your first file has no BOM? Your first file is wrong.
Re: Gest string length problem
Posted: Sun Mar 10, 2024 2:00 pm
by Sergey
I tested it in UltraEdit: the string is 100 bytes but 96 characters long, because each en dash (E2 80 93) takes 3 bytes in UTF-8, and the string contains two of them.
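Sergey's numbers can be checked directly in PureBasic. A sketch, assuming PureBasic 5.50 or later, where Len() counts characters and StringByteLength() with #PB_UTF8 returns the UTF-8 size in bytes without the null terminator. The en dash is built with Chr($2013) so the encoding of the source file does not matter:

Code:
Define Dash$ = Chr($2013)   ; en dash, encoded as E2 80 93 in UTF-8
Define Text$ = "Three figures stand behind it " + Dash$ + "scientists, presumably" + Dash$ + " but you cannot make out their faces.</nr>"

Debug Len(Text$)                         ; 96 characters
Debug StringByteLength(Text$, #PB_UTF8)  ; 100 bytes: 94 ASCII bytes + 2 x 3 bytes

So Len() returning 96 for the string read with #PB_UTF8 is correct; 100 is the byte count that a hex editor shows.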

Re: Gest string length problem
Posted: Sun Mar 10, 2024 2:01 pm
by AZJIO
Code:
Debug SaveVarToFile(Path$ + "NewFile.txt", Text$, #PB_Ascii)
Or use a more thorough approach that detects the encoding:
Code:
EnableExplicit
Global g_Format
; https://www.purebasic.fr/english/viewtopic.php?p=478874
XIncludeFile "AutoDetectTextEncoding_Trim.pbi"
; Read a file into a gadget
Procedure.s OpenFileToGadget(FilePath$)
  Protected length, oFile, bytes, *mem, Text$
  oFile = ReadFile(#PB_Any, FilePath$)
  If oFile
    g_Format = ReadStringFormat(oFile)
    length = Lof(oFile)
    ; 4 extra zero bytes so PeekS() always finds a null terminator
    *mem = AllocateMemory(length + 4)
    If *mem
      bytes = ReadData(oFile, *mem, length)
      If bytes
        If g_Format = #PB_Ascii
          g_Format = dte::detectTextEncodingInBuffer(*mem, bytes, 0)
          If g_Format = #PB_Ascii
            Text$ = PeekS(*mem, bytes, #PB_Ascii)
          Else
            Text$ = PeekS(*mem, bytes, #PB_UTF8) ; if UTF-8 without BOM
          EndIf
        Else
          ; Not sure about this branch: PeekS() supports #PB_Unicode,
          ; but ReadStringFormat() can also return #PB_UTF16BE, #PB_UTF32 or #PB_UTF32BE.
          ; Those formats are rare and unlikely to appear, so they should probably be ignored.
          Text$ = PeekS(*mem, bytes, g_Format)
        EndIf
      EndIf
      FreeMemory(*mem)
    EndIf
    CloseFile(oFile)
  EndIf
  ProcedureReturn Text$
EndProcedure
Define FilePath$ = "C:\2023.01\1.txt"
Debug OpenFileToGadget(FilePath$)
Re: Gest string length problem
Posted: Sun Mar 10, 2024 2:05 pm
by le_magn
Yes!! You are right, thank you very much...