Gest string length problem

Posted: Sun Mar 10, 2024 12:45 pm
by le_magn
Hi all, I have a small problem getting the correct string length. I open a file with ReadFile and the #PB_UTF8 flag, then read a string from it with ReadString. If I get the string length with Len(), it gives me the wrong result, because it seems to skip the special characters in the string. Also, if I copy the string to the clipboard, the special characters are not copied. But if I write the string to another file with WriteString, it is written correctly and with the correct length. In the example below the real length of the string is 100, but Len() gives me 96. Please help me understand why.

Code: Select all

TFile.s="filetest.txt"
If ReadFile(0,tfile,#PB_UTF8)
  String.s=ReadString(0)
  Debug Len(String)
  SetClipboardText(string)
Else
  Debug "no file"
EndIf
The file contains this string:
Image

Re: Gest string length problem

Posted: Sun Mar 10, 2024 12:51 pm
by AZJIO

Code: Select all

Format = ReadStringFormat(id_file)
Be sure to call ReadStringFormat() first: it moves the file pointer past the BOM.

Re: Gest string length problem

Posted: Sun Mar 10, 2024 1:10 pm
by le_magn
AZJIO wrote: Sun Mar 10, 2024 12:51 pm

Code: Select all

Format = ReadStringFormat(id_file)
Be sure to call ReadStringFormat() first: it moves the file pointer past the BOM.
Thanks for the reply. If I use ReadStringFormat() it gives me 24 (#PB_Ascii). If I read with #PB_Ascii, Len() returns the correct size, but if I then write this string with WriteString() it writes the wrong string:
Image

The string is written correctly with the #PB_UTF8 format...

Re: Gest string length problem

Posted: Sun Mar 10, 2024 1:12 pm
by AZJIO

Code: Select all

WriteStringFormat(#File, #PB_UTF8)

Re: Gest string length problem

Posted: Sun Mar 10, 2024 1:27 pm
by le_magn
AZJIO wrote: Sun Mar 10, 2024 1:12 pm

Code: Select all

WriteStringFormat(#File, #PB_UTF8)
I tried various combinations, but whenever I get the right length, it writes a bad string to the file. Please test it yourself: write this string to a file, with the string read from a file rather than assigned directly in the code:

Code: Select all

Three figures stand behind it –scientists, presumably– but you cannot make out their faces.</nr>
After you create the file, please check with a hex editor whether the string is correct.

Code: Select all

Three figures stand behind it –scientists, presumably– but you cannot make out their faces.</nr>
ee figures stand behind it –scientists, presumably– but you cannot make out their faces.</nr>
ee figures stand behind it –scientists, presumably– but you cannot make out their faces.</nr>
1. String read as UTF-8 and written as UTF-8, but Len() gives the wrong size.
2. String read as ASCII and written as UTF-8: the length is correct, but the written string is bad.
3. String read and written as ASCII: also wrong :(

Re: Gest string length problem

Posted: Sun Mar 10, 2024 1:44 pm
by AZJIO

Code: Select all

Procedure.s ReadFileToVar(Path$)
	Protected id_file, Format, Text$

	id_file = ReadFile(#PB_Any, Path$)
	If id_file
		Format = ReadStringFormat(id_file)
		Text$ = ReadString(id_file, Format | #PB_File_IgnoreEOL)
		; 	Text$ = ReadString(id_file, #PB_UTF8 | #PB_File_IgnoreEOL)
		CloseFile(id_file)
	EndIf

	ProcedureReturn Text$
EndProcedure

Procedure SaveVarToFile(Path$, Text$, Format = #PB_UTF8)
	Protected Result = #False
	Protected id_file = CreateFile(#PB_Any, Path$, Format)
	If id_file
		If WriteStringFormat(id_file, Format) And WriteString(id_file, Text$, Format)
			Result = #True
		EndIf
		CloseFile(id_file)
	EndIf
	ProcedureReturn Result
EndProcedure

Path$ = GetTemporaryDirectory()
Text$ = ReadFileToVar(Path$ + "file.txt") ; The file should be UTF8 + BOM
Debug SaveVarToFile(Path$ + "NewFile.txt", Text$)
RunProgram(Path$ + "file.txt")
RunProgram(Path$ + "NewFile.txt")

Re: Gest string length problem

Posted: Sun Mar 10, 2024 1:54 pm
by le_magn
AZJIO wrote: Sun Mar 10, 2024 1:44 pm

Tried your code, same result: the written string is different from the original.

Image

Re: Gest string length problem

Posted: Sun Mar 10, 2024 1:56 pm
by AZJIO
If there is no BOM, you need to analyze the contents of the file to determine its format; otherwise it is read as ASCII. If UTF-8 is read as ASCII, you will get wrong data.
https://www.purebasic.fr/english/viewtopic.php?p=478874
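
As a rough illustration of the point above, here is a minimal sketch that reads the first line of a file and falls back to UTF-8 when ReadStringFormat() finds no BOM. The procedure name ReadFirstLine is hypothetical, and the fallback is only an assumption that BOM-less files are UTF-8, not real detection; the include linked above analyzes the contents instead.

```purebasic
; Sketch only: read the first line of a text file.
; ASSUMPTION: a file without a BOM is UTF-8. This is a guess, not a detection.
Procedure.s ReadFirstLine(Path$)
  Protected id_file, Format, Text$

  id_file = ReadFile(#PB_Any, Path$)
  If id_file
    ; Moves the file pointer past the BOM if there is one.
    ; Returns #PB_Ascii when no BOM is found.
    Format = ReadStringFormat(id_file)
    If Format = #PB_Ascii
      Format = #PB_UTF8 ; assumed fallback for BOM-less files
    EndIf
    ; Other results such as #PB_UTF16BE or #PB_UTF32 are not handled in this sketch.
    Text$ = ReadString(id_file, Format)
    CloseFile(id_file)
  EndIf

  ProcedureReturn Text$
EndProcedure

Debug Len(ReadFirstLine("filetest.txt")) ; character count, not byte count
```

Forcing UTF-8 is simpler than content analysis, but it will mis-read a file that really is in a legacy single-byte encoding.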

Re: Gest string length problem

Posted: Sun Mar 10, 2024 1:58 pm
by infratec
This works for me:

Code: Select all

TFile.s="filetest.txt"
If ReadFile(0, tfile, #PB_UTF8)
  String.s = ReadString(0)
  Debug Len(String)
  Debug String
  SetClipboardText(string)
  CloseFile(0)
  
  If CreateFile(0, "filetest2.txt", #PB_UTF8)
    WriteStringN(0, String)
    CloseFile(0)
  EndIf
Else
  Debug "no file"
EndIf

Re: Gest string length problem

Posted: Sun Mar 10, 2024 1:59 pm
by AZJIO
le_magn wrote: Sun Mar 10, 2024 1:54 pm Tried your code, same result: the written string is different from the original.
Don't you see that in the first file you don't have a BOM mark? Your first file is wrong.

Re: Gest string length problem

Posted: Sun Mar 10, 2024 2:00 pm
by Sergey
I tested in UltraEdit: the string has 100 bytes and 96 characters, because one character (E2 80 93, the en dash) is 3 bytes long in UTF-8 and it appears twice, which accounts for the 4 extra bytes.
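
The same comparison can be sketched in PureBasic itself. This is a minimal example, not code from any poster: it assumes a Unicode build of PureBasic (5.50 or later), rebuilds the string from the earlier post, and creates the en dash with Chr($2013) so the result does not depend on how the source file is saved.

```purebasic
; Sketch: character count vs. UTF-8 byte count for the string from this thread.
Dash$ = Chr($2013) ; en dash, stored in UTF-8 as the three bytes E2 80 93
Text$ = "Three figures stand behind it " + Dash$ + "scientists, presumably" + Dash$ + " but you cannot make out their faces.</nr>"

Debug Len(Text$)                                     ; number of characters
Debug StringByteLength(Text$, #PB_UTF8)              ; number of bytes when encoded as UTF-8
Debug StringByteLength(Text$, #PB_UTF8) - Len(Text$) ; extra bytes taken by the two dashes
```

Len() counts characters, so 96 is the correct answer for the string; 100 is the size of its UTF-8 encoding on disk.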

Image

Re: Gest string length problem

Posted: Sun Mar 10, 2024 2:01 pm
by AZJIO

Code: Select all

Debug SaveVarToFile(Path$ + "NewFile.txt", Text$, #PB_Ascii)
Or use more complex code that detects the encoding:

Code: Select all

EnableExplicit
 
Global g_Format
 
; https://www.purebasic.fr/english/viewtopic.php?p=478874
XIncludeFile "AutoDetectTextEncoding_Trim.pbi"
 
; Read the file into a gadget
Procedure.s OpenFileToGadget(FilePath$)
    Protected length, oFile, bytes, *mem, Text$
    oFile = ReadFile(#PB_Any, FilePath$)
    If oFile
        g_Format = ReadStringFormat(oFile)
        length = Lof(oFile)
        *mem = AllocateMemory(length)
        If *mem
            bytes = ReadData(oFile, *mem, length)
            If bytes
                If g_Format = #PB_Ascii
                    g_Format = dte::detectTextEncodingInBuffer(*mem, bytes, 0)
                    If g_Format = #PB_Ascii
                        Text$ = PeekS(*mem, bytes, #PB_Ascii)
                    Else
                        Text$ = PeekS(*mem, bytes, #PB_UTF8) ; if UTF-8 without BOM
                    EndIf
                Else
                    ; not sure about this part: PeekS() supports #PB_Unicode,
                    ; but ReadStringFormat() can return #PB_UTF16BE, #PB_UTF32, #PB_UTF32BE;
                    ; those formats are uncommon and unlikely to be encountered, so they should be ignored
                    Text$ = PeekS(*mem, bytes, g_Format)
                EndIf
            EndIf
            FreeMemory(*mem)
        EndIf
        CloseFile(oFile)
    EndIf
    ProcedureReturn Text$
EndProcedure
 
Define FilePath$ = "C:\2023.01\1.txt"
Debug OpenFileToGadget(FilePath$)

Re: Gest string length problem

Posted: Sun Mar 10, 2024 2:05 pm
by le_magn
Yes!! You are right, thank you very much...