Gest string length problem

le_magn
Enthusiast
Posts: 279
Joined: Wed Aug 24, 2005 12:11 pm
Location: Italia

Gest string length problem

Post by le_magn »

Hi all, I have a small problem getting the correct string length. I open a file with ReadFile() and the #PB_UTF8 flag and read a line from it with ReadString(). If I then get the string length with Len(), it gives me the wrong result, because it seems to skip the special characters in the string. Likewise, if I copy the string to the clipboard, the special characters are not copied. But if I write the string to another file with WriteString(), it is written correctly and with the correct length. In the example below the real length of the string is 100, but Len() gives me 96. Please help me understand why.

Code: Select all

TFile.s = "filetest.txt"
If ReadFile(0, TFile, #PB_UTF8)
  String.s = ReadString(0)   ; read the first line of the file
  Debug Len(String)
  SetClipboardText(String)
  CloseFile(0)               ; release the file once it has been read
Else
  Debug "no file"
EndIf
The file contains this string:
Image
AZJIO
Addict
Posts: 2154
Joined: Sun May 14, 2017 1:48 am

Re: Gest string length problem

Post by AZJIO »

Code: Select all

Format = ReadStringFormat(id_file)
Be sure to call ReadStringFormat() to move the file pointer past the BOM.
le_magn
Enthusiast
Posts: 279
Joined: Wed Aug 24, 2005 12:11 pm
Location: Italia

Re: Gest string length problem

Post by le_magn »

AZJIO wrote: Sun Mar 10, 2024 12:51 pm

Code: Select all

Format = ReadStringFormat(id_file)
Be sure to call ReadStringFormat() to move the file pointer past the BOM.
Thanks for the reply. If I use ReadStringFormat() it gives me 24, i.e. #PB_Ascii. If I set #PB_Ascii, Len() returns the correct size, but if I then write this string with WriteString(), it writes the wrong string:
Image

The correct string is written with the #PB_UTF8 format...
Last edited by le_magn on Sun Mar 10, 2024 1:14 pm, edited 1 time in total.
AZJIO
Addict
Posts: 2154
Joined: Sun May 14, 2017 1:48 am

Re: Gest string length problem

Post by AZJIO »

Code: Select all

WriteStringFormat(#File, #PB_UTF8)
le_magn
Enthusiast
Posts: 279
Joined: Wed Aug 24, 2005 12:11 pm
Location: Italia

Re: Gest string length problem

Post by le_magn »

AZJIO wrote: Sun Mar 10, 2024 1:12 pm

Code: Select all

WriteStringFormat(#File, #PB_UTF8)
I tried various combinations, but whenever I get the right length it writes a bad string to the file. Please test it yourself: write this string to a file, with the string read from a file rather than taken directly from a string literal:

Code: Select all

Three figures stand behind it –scientists, presumably– but you cannot make out their faces.</nr>
After you create the file, please check with a hex editor whether the string is correct.

Code: Select all

Three figures stand behind it –scientists, presumably– but you cannot make out their faces.</nr>
ee figures stand behind it –scientists, presumably– but you cannot make out their faces.</nr>
ee figures stand behind it –scientists, presumably– but you cannot make out their faces.</nr>
1. String read as UTF-8 and written as UTF-8, but Len() gives the wrong size.
2. String read as ASCII and written as UTF-8: it reads the correct length but writes a bad string.
3. String read and written as ASCII, which is also wrong :(
AZJIO
Addict
Posts: 2154
Joined: Sun May 14, 2017 1:48 am

Re: Gest string length problem

Post by AZJIO »

Code: Select all

Procedure.s ReadFileToVar(Path$)
	Protected id_file, Format, Text$

	id_file = ReadFile(#PB_Any, Path$)
	If id_file
		Format = ReadStringFormat(id_file)
		Text$ = ReadString(id_file, Format | #PB_File_IgnoreEOL)
		; 	Text$ = ReadString(id_file, #PB_UTF8 | #PB_File_IgnoreEOL)
		CloseFile(id_file)
	EndIf

	ProcedureReturn Text$
EndProcedure

Procedure SaveVarToFile(Path$, Text$, Format = #PB_UTF8)
	Protected Result = #False
	Protected id_file = CreateFile(#PB_Any, Path$, Format)
	If id_file
		If WriteStringFormat(id_file, Format) And WriteString(id_file, Text$, Format)
			Result = #True
		EndIf
		CloseFile(id_file)
	EndIf
	ProcedureReturn Result
EndProcedure

Path$ = GetTemporaryDirectory()
Text$ = ReadFileToVar(Path$ + "file.txt") ; The file should be UTF8 + BOM
Debug SaveVarToFile(Path$ + "NewFile.txt", Text$)
RunProgram(Path$ + "file.txt")
RunProgram(Path$ + "NewFile.txt")
le_magn
Enthusiast
Posts: 279
Joined: Wed Aug 24, 2005 12:11 pm
Location: Italia

Re: Gest string length problem

Post by le_magn »

AZJIO wrote: Sun Mar 10, 2024 1:44 pm

Tried your code, same result: the written string is different from the original.

Image
AZJIO
Addict
Posts: 2154
Joined: Sun May 14, 2017 1:48 am

Re: Gest string length problem

Post by AZJIO »

If there is no BOM, you need to analyze the contents of the file to determine the format; otherwise it is read as ASCII. If UTF-8 is read as ASCII, you will get the wrong data.
https://www.purebasic.fr/english/viewtopic.php?p=478874
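
A rough sketch of that point, not AZJIO's code: ReadStringFormat() reports #PB_Ascii when it finds no BOM, so the reader has to pick a format itself in that case. The procedure name ReadFirstLine and the choice of UTF-8 as the fallback are assumptions made for illustration. A file in some other encoding would still need real content analysis, such as the detector linked above.

```purebasic
; Hypothetical helper: reads the first line of a text file.
; Assumption: a file without a BOM is UTF-8 (the case discussed in this thread).
Procedure.s ReadFirstLine(Path$)
  Protected File, Format, Text$

  File = ReadFile(#PB_Any, Path$)
  If File
    ; ReadStringFormat() skips the BOM if there is one and reports the format.
    ; Without a BOM it reports #PB_Ascii.
    Format = ReadStringFormat(File)
    If Format = #PB_Ascii
      Format = #PB_UTF8 ; assumed fallback, not a detected value
    EndIf
    Text$ = ReadString(File, Format)
    CloseFile(File)
  EndIf

  ProcedureReturn Text$
EndProcedure

Define FirstLine$ = ReadFirstLine("filetest.txt")
Debug Len(FirstLine$) ; counts characters, not bytes
```

Plain ASCII text is also valid UTF-8, so the fallback does no harm for pure ASCII files. It does not cover the UTF-16BE or UTF-32 values that ReadStringFormat() can also report.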
infratec
Always Here
Posts: 7598
Joined: Sun Sep 07, 2008 12:45 pm
Location: Germany

Re: Gest string length problem

Post by infratec »

This works for me:

Code: Select all

TFile.s="filetest.txt"
If ReadFile(0, tfile, #PB_UTF8)
  String.s = ReadString(0)
  Debug Len(String)
  Debug String
  SetClipboardText(string)
  CloseFile(0)
  
  If CreateFile(0, "filetest2.txt", #PB_UTF8)
    WriteStringN(0, String)
    CloseFile(0)
  EndIf
Else
  Debug "no file"
EndIf
AZJIO
Addict
Posts: 2154
Joined: Sun May 14, 2017 1:48 am

Re: Gest string length problem

Post by AZJIO »

le_magn wrote: Sun Mar 10, 2024 1:54 pm Tried your code, same result: the written string is different from the original.
Don't you see that your first file has no BOM? Your first file is wrong.
Sergey
User
Posts: 54
Joined: Wed Jan 12, 2022 2:41 pm

Re: Gest string length problem

Post by Sergey »

I tested in UltraEdit: the string has 100 bytes and 96 characters, because one character, the en dash (E2 80 93), is 3 bytes long in UTF-8, and it occurs twice in the string.
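
The two counts can be checked directly in PureBasic. This is a minimal sketch, assuming a Unicode build of PureBasic; the variable names are made up for illustration. Len() counts characters, while StringByteLength() with #PB_UTF8 counts the bytes the string would occupy in a UTF-8 file.

```purebasic
; Chr($2013) is the en dash. Building it this way keeps the example
; independent of the encoding of the source file.
Define Dash$ = Chr($2013)
Define Text$ = "Three figures stand behind it " + Dash$ + "scientists, presumably" + Dash$ + " but you cannot make out their faces.</nr>"

Debug Len(Text$)                        ; characters: 96
Debug StringByteLength(Text$, #PB_UTF8) ; bytes as UTF-8: 100
```

The 94 ASCII characters take one byte each and the two en dashes take three bytes each, which gives 94 + 6 = 100 bytes for 96 characters.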

Image
AZJIO
Addict
Posts: 2154
Joined: Sun May 14, 2017 1:48 am

Re: Gest string length problem

Post by AZJIO »

Code: Select all

Debug SaveVarToFile(Path$ + "NewFile.txt", Text$, #PB_Ascii)
Or use more complex code that detects the encoding:

Code: Select all

EnableExplicit
 
Global g_Format
 
; https://www.purebasic.fr/english/viewtopic.php?p=478874
XIncludeFile "AutoDetectTextEncoding_Trim.pbi"
 
; Read a file into a gadget
Procedure.s OpenFileToGadget(FilePath$)
    Protected length, oFile, bytes, *mem, Text$
    oFile = ReadFile(#PB_Any, FilePath$)
    If oFile
        g_Format = ReadStringFormat(oFile)
        length = Lof(oFile)
        *mem = AllocateMemory(length)
        If *mem
            bytes = ReadData(oFile, *mem, length)
            If bytes
                If g_Format = #PB_Ascii
                    g_Format = dte::detectTextEncodingInBuffer(*mem, bytes, 0)
                    If g_Format = #PB_Ascii
                        Text$ = PeekS(*mem, bytes, #PB_Ascii)
                    Else
                        Text$ = PeekS(*mem, bytes, #PB_UTF8) ; if UTF-8 without a BOM
                    EndIf
                Else
                    ; Not sure about this part: PeekS() supports #PB_Unicode,
                    ; but ReadStringFormat() can return #PB_UTF16BE, #PB_UTF32, #PB_UTF32BE.
                    ; Those formats are not popular and are unlikely to turn up, so they should be ignored.
                    Text$ = PeekS(*mem, bytes, g_Format)
                EndIf
            EndIf
            FreeMemory(*mem)
        EndIf
        CloseFile(oFile)
    EndIf
    ProcedureReturn Text$
EndProcedure
 
Define FilePath$ = "C:\2023.01\1.txt"
Debug OpenFileToGadget(FilePath$)
le_magn
Enthusiast
Posts: 279
Joined: Wed Aug 24, 2005 12:11 pm
Location: Italia

Re: Gest string length problem

Post by le_magn »

Yes!! you are right, thank you very much...