[Done] Problem with lenght size and PeekS

Posted: Sat May 18, 2024 5:50 pm
by johnstock
Hi, I'm encountering a strange problem with a specific length and the PeekS command.
I'm trying to read a file in chunks (a Mozilla Thunderbird container file of ~330 MB), but when the value of the variable "Length" is exactly 13488096 I get this error:
[18:30:39] [ERROR] testmemory.pb (Line: 54)
[18:30:39] [ERROR] Invalid memory access. (read error at address 71413760)
I tested it with many files, smaller and bigger, with no difference.
So I created this little piece of code to test with a difference of -1 and +1 from this length, and there is no error in those cases.

I noticed that:
- with PureBasic 6.10 LTS 32-bit I get the error.
- with PureBasic 6.10 LTS 64-bit the error does not appear.

Code: Select all

Enumeration
  #file0  
EndEnumeration

Declare i693()
Declare i694()
Declare i695()

Global Path$ = "C:\Users\testuser\AppData\Roaming\Thunderbird\Profiles\niyfo4hw.default-release\Mail\Local Folders\"
Global File$ = "Inbox"

Procedure i693()
  LocIniziale = 0
  LocFinale = 13488095
  
  If ReadFile(#file0, Path$ + File$, #PB_File_SharedRead)
    Length = LocFinale - LocIniziale
    
    If Length
      *MemoryID = AllocateMemory(Length)
      If *MemoryID
        FileSeek(#file0, LocIniziale) ;POSIZIONE CURSORE
        ReadData(#file0, *MemoryID, Length)
        
        Debug Length
        Debug PeekS(*MemoryID, Length, #PB_UTF8)
        
      EndIf
    EndIf
    
    CloseFile(#file0)
    FreeMemory(*MemoryID)
  EndIf
  
EndProcedure

Procedure i694()
  LocIniziale = 0
  LocFinale = 13488096
  
  
  If ReadFile(#file0, Path$ + File$, #PB_File_SharedRead)
    Length = LocFinale - LocIniziale
    
    If Length
      *MemoryID = AllocateMemory(Length)
      If *MemoryID
        FileSeek(#file0, LocIniziale) ;POSIZIONE CURSORE
        ReadData(#file0, *MemoryID, Length)
        
        Debug Length
        Debug PeekS(*MemoryID, Length, #PB_UTF8)
        
      EndIf
    EndIf
    
    CloseFile(#file0)
    FreeMemory(*MemoryID)
  EndIf
  
EndProcedure

Procedure i695()
  LocIniziale = 0
  LocFinale = 13488097
  
  
  If ReadFile(#file0, Path$ + File$, #PB_File_SharedRead)
    Length = LocFinale - LocIniziale
    
    If Length
      *MemoryID = AllocateMemory(Length)
      If *MemoryID
        FileSeek(#file0, LocIniziale) ;POSIZIONE CURSORE
        ReadData(#file0, *MemoryID, Length)
        
        Debug Length
        Debug PeekS(*MemoryID, Length, #PB_UTF8)
        
      EndIf
    EndIf
    
    CloseFile(#file0)
    FreeMemory(*MemoryID)
  EndIf
  
EndProcedure

i693()
i694()
i695()
Am I doing something wrong? I can't see anything wrong in the code :confused:.
Greetings

Re: Problem with lenght size and PeekS

Posted: Sat May 18, 2024 5:56 pm
by Tenaja
"Length" is in bytes, but UTF-8 can be multi-byte (unless it's all ASCII), right? That could give you an overflow.

Try checking the string length with this
https://www.purebasic.com/documentation ... ength.html

Re: Problem with lenght size and PeekS

Posted: Sat May 18, 2024 6:18 pm
by STARGÅTE
PeekS() expects a length in characters (and a character can be one byte (ASCII), two bytes (Unicode) or multiple bytes (UTF-8)).
If you want to pass the length in bytes, you have to use the flag #PB_ByteLength:

Code: Select all

PeekS(*Buffer, LengthInBytes, #PB_UTF8|#PB_ByteLength)
johnstock wrote: Sat May 18, 2024 5:50 pm Am I doing something wrong?
You posted a coding question in the bug report section :evil: .
Sorry, but this seems to be in vogue, currently.
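
To illustrate the difference between a length in characters and a length in bytes, here is a minimal sketch. It assumes a PureBasic version that provides UTF8(), which is used here only to build a small test buffer; the buffer contents and the name *Mem are made up for the example and are not from the original code.

```purebasic
; Build a small UTF-8 buffer: "a" (1 byte), "é" (2 bytes), "€" (3 bytes),
; followed by a null terminator. Chr() is used so that the example does not
; depend on the encoding of the source file.
*Mem = UTF8("a" + Chr($E9) + Chr($20AC))
If *Mem
  Debug MemoryStringLength(*Mem, #PB_UTF8)                  ; length in characters
  Debug MemoryStringLength(*Mem, #PB_UTF8 | #PB_ByteLength) ; length in bytes
  ; Read only the first 3 bytes ("a" + "é"), with the length given in bytes.
  Debug PeekS(*Mem, 3, #PB_UTF8 | #PB_ByteLength)
  FreeMemory(*Mem)
EndIf
```

With non-ASCII content the two lengths differ, which is why a byte count taken from ReadData() should not be passed to PeekS() as a character count.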

Re: Problem with lenght size and PeekS

Posted: Sat May 18, 2024 7:24 pm
by johnstock
STARGÅTE wrote: Sat May 18, 2024 6:18 pm PeekS() expects a length in characters (and a character can be one byte (ASCII), two bytes (Unicode) or multiple bytes (UTF-8)).
If you want to pass the length in bytes, you have to use the flag #PB_ByteLength:

Code: Select all

PeekS(*Buffer, LengthInBytes, #PB_UTF8|#PB_ByteLength)
johnstock wrote: Sat May 18, 2024 5:50 pm Am I doing something wrong?
You posted a coding question in the bug report section :evil: .
Sorry, but this seems to be in vogue, currently.
Thanks for the explanation. I tried it like that, but the error is the same, in the same place.

I posted here in "Bugs" because the exact same code works perfectly on the 64-bit version and I can't see anything obviously related to the bitness in the code.
I'm sorry if I posted in the wrong section of the forum; I did so in good faith.

Re: Problem with lenght size and PeekS

Posted: Sat May 18, 2024 7:35 pm
by STARGÅTE
johnstock wrote: Sat May 18, 2024 7:24 pm Thanks for the explanation. I tried it like that, but the error is the same, in the same place.
Can you split the respective line into two lines:

Code: Select all

Define TempString.s = PeekS(*MemoryID, Length, #PB_UTF8|#PB_ByteLength)
Debug TempString
On which line does the error occur now?

Re: Problem with lenght size and PeekS

Posted: Sat May 18, 2024 7:39 pm
by johnstock
Can you split the respective line into two lines:

Code: Select all

Define TempString.s = PeekS(*MemoryID, Length, #PB_UTF8|#PB_ByteLength)  <-- here
Debug TempString
On which line does the error occur now?

Re: Problem with lenght size and PeekS

Posted: Sat May 18, 2024 7:54 pm
by STARGÅTE
OK. The question now is how we (the community) can reproduce this issue.

Of course, there might be a bug in PeekS() when reading a malformed UTF-8 string.
But it would be good to reproduce it in a small snippet.
It seems that PeekS() reads more than the given Length.

Re: Problem with lenght size and PeekS

Posted: Sat May 18, 2024 8:07 pm
by DarkDragon
Uhm, I guess the issue is that you cannot know whether you accidentally split a character or not. You need a PeekS with byte length. I'm not sure why PeekS wants the character length, to be honest. You usually have a memory block with a size in bytes and don't know the number of characters when reading data, do you?

For UTF-8 you can check from the end. If the last byte starts with binary 10 (a continuation byte), go back until you reach a byte starting with 11 (the lead byte of the sequence).
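
The check from the end described above could be sketched roughly as follows. Utf8SafeLength is a hypothetical helper name introduced for this illustration (it is not a PureBasic built-in and not code from this thread), and UTF8() is assumed to be available to build the test buffer.

```purebasic
; Hypothetical helper: returns the largest byte count <= Length that does not
; end in the middle of a multi-byte UTF-8 sequence.
Procedure.i Utf8SafeLength(*Buffer, Length.i)
  Protected i.i, Lead.a, Need.i
  If *Buffer = 0 Or Length <= 0
    ProcedureReturn 0
  EndIf
  ; Step back over continuation bytes (bit pattern 10xxxxxx), at most 3 of them.
  i = Length - 1
  While i > 0 And Length - i < 4 And (PeekA(*Buffer + i) & $C0) = $80
    i - 1
  Wend
  Lead = PeekA(*Buffer + i)
  ; Work out how many bytes the sequence starting at offset i should have.
  If (Lead & $80) = 0
    Need = 1
  ElseIf (Lead & $E0) = $C0
    Need = 2
  ElseIf (Lead & $F0) = $E0
    Need = 3
  ElseIf (Lead & $F8) = $F0
    Need = 4
  Else
    Need = 1 ; stray continuation or invalid byte: leave the length unchanged
  EndIf
  If i + Need > Length
    ProcedureReturn i ; last character is incomplete, cut before its lead byte
  EndIf
  ProcedureReturn Length
EndProcedure

; "a" followed by "é" gives the bytes 61 C3 A9 (plus a null terminator).
*Mem = UTF8("a" + Chr($E9))
Debug Utf8SafeLength(*Mem, 2) ; 1: two bytes would cut "é" in half
Debug Utf8SafeLength(*Mem, 3) ; 3: ends on a character boundary
FreeMemory(*Mem)
```

When reading a file in chunks, the bytes cut off this way would have to be carried over to the start of the next chunk. The helper only inspects the tail of the buffer and does not validate the rest of the data.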

[Bug]Re: Problem with lenght size and PeekS

Posted: Sun May 19, 2024 1:35 pm
by juergenkulow

Code: Select all

; x86 PeekS IMA with magic length $80FE0, $81FE0, $82FE0, ... , $CDCFE0=13488096, ...  
For Length=$81000 To 1 Step -16
  Debug Hex(Length) 
  *MemoryID = AllocateMemory(Length)
  For *p.Ascii=*MemoryID To *MemoryID+Length-1
    *p\a='A'
  Next 
  s.s=PeekS(*MemoryID, Length, #PB_UTF8)
  FreeMemory(*MemoryID)
Next 
; 81000
; 80FF0
; 80FE0
; [14:22:43] [ERROR] peeks.pb (Line: 8)
; [14:22:43] [ERROR] Invalid memory access. (read error at address 5509120)
;  Exception code:	c0000005
;  Exception offset:	00024b86

Re: Problem with lenght size and PeekS

Posted: Sun May 19, 2024 1:59 pm
by STARGÅTE
juergenkulow wrote: Sun May 19, 2024 1:35 pm

Code: Select all

; x86 PeekS IMA with magic length $80FE0, $81FE0, $82FE0, ... , $CDCFE0=13488096, ... 
Good root cause research, juergenkulow, thank you.
So then I have to apologize to johnstock :oops: .
Now it really does seem to be a bug in PureBasic, in PeekS() on x86.

A moderator should move it back to "Bug Reports", please.

Re: Problem with lenght size and PeekS

Posted: Sun May 19, 2024 2:07 pm
by mk-soft
I don't know why you would load such a large file into a single string variable.
That makes no sense to me. If it is a text file, it is always better to put it in a LinkedList. This makes it much easier to search through the individual lines and to save them again after changing the line data.

Re: Problem with lenght size and PeekS

Posted: Sun May 19, 2024 3:24 pm
by johnstock
mk-soft wrote: Sun May 19, 2024 2:07 pm I don't know why you would load such a large file into a single string variable.
That makes no sense to me. If it is a text file, it is always better to put it in a LinkedList. This makes it much easier to search through the individual lines and to save them again after changing the line data.
Hi. I'm trying to create an mbox to eml converter.
So I have to read those big files (they can be several GB), try to guess the start and end of every email (eml format) with a regex, save that piece (which can be a few KB or several MB) to a local file, and then send this file to an IMAP server.
Reading those pieces of the file line by line could be a bottleneck, couldn't it?
Changing the line data could be problematic here (I think).
I'm not an expert and I like to stay open to learning something every day.

Re: Problem with lenght size and PeekS

Posted: Sun May 19, 2024 4:06 pm
by mk-soft
Since the data is probably stored as ASCII (8-bit), it is better to view and process it as raw data. PB strings are Unicode, so the data would first have to be converted from ASCII to Unicode and then converted back to ASCII when it is stored.
Here it is better not to work with PB strings but with pointers to the data and pointers to an array of ASCII bytes.
It is a bit more complex to program, but PureBasic handles raw data very well.
For quick searches in raw data there are some examples here in the forum.

Simple ASCII find string:

Code: Select all


CompilerIf #PB_Compiler_OS = #PB_OS_Windows
  Import "shlwapi.lib"
    strstr(*string, *findstring) As "StrStrA"
    strcasestr(*string, *findstring) As "StrStrIA"
  EndImport
CompilerElse
  ImportC ""
    strstr(*string1, *string2)
    strcasestr(*string1, *string2)
  EndImport
CompilerEndIf

*data = Ascii("Hello World")
*word = Ascii("world")

*pos = strstr(*data, *word)
If *pos
  Debug "Offset = " + Str(*pos - *data)
  Debug PeekS(*pos, -1, #PB_Ascii)
Else
  Debug "Not found (case sensitive)"
  
EndIf

*pos = strcasestr(*data, *word)
If *pos
  Debug "Offset = " + Str(*pos - *data)
  Debug PeekS(*pos, -1, #PB_Ascii)
Else
  Debug "Not found"
  
EndIf

FreeMemory(*data)
FreeMemory(*word)


Re: [Done] Problem with lenght size and PeekS

Posted: Thu Apr 03, 2025 4:38 pm
by Fred
Fixed.