[Done] Problem with length size and PeekS

Just starting out? Need help? Post your questions and find answers here.
johnstock
New User
Posts: 8
Joined: Sat Aug 04, 2012 4:32 pm
Location: Italy

[Done] Problem with length size and PeekS

Post by johnstock »

Hi, I'm encountering a strange problem with a specific length and the PeekS command.
I'm trying to read a file in chunks (a Mozilla Thunderbird container file of ~330 MB), but when the value of the variable "Length" is exactly 13488096 I get this error:
[18:30:39] [ERROR] testmemory.pb (Line: 54)
[18:30:39] [ERROR] Invalid memory access. (read error at address 71413760)
I tested it with many files, smaller and bigger, with no difference.
So I created this little piece of code to test this length together with the lengths one below (-1) and one above (+1) it; those two produce no error.

I noticed that:
- with PureBasic 6.10 LTS 32-bit I get the error.
- with PureBasic 6.10 LTS 64-bit the error does not appear.

Code: Select all

Enumeration
  #file0  
EndEnumeration

Declare i693()
Declare i694()
Declare i695()

Global Path$ = "C:\Users\testuser\AppData\Roaming\Thunderbird\Profiles\niyfo4hw.default-release\Mail\Local Folders\"
Global File$ = "Inbox"

Procedure i693()
  LocIniziale = 0
  LocFinale = 13488095
  
  If ReadFile(#file0, Path$ + File$, #PB_File_SharedRead)
    Length = LocFinale - LocIniziale
    
    If Length
      *MemoryID = AllocateMemory(Length)
      If *MemoryID
        FileSeek(#file0, LocIniziale) ; set the file cursor position
        ReadData(#file0, *MemoryID, Length)
        
        Debug Length
        Debug PeekS(*MemoryID, Length, #PB_UTF8)
        
        FreeMemory(*MemoryID) ; free only a buffer that was actually allocated
      EndIf
    EndIf
    
    CloseFile(#file0)
  EndIf
  
EndProcedure

Procedure i694()
  LocIniziale = 0
  LocFinale = 13488096
  
  
  If ReadFile(#file0, Path$ + File$, #PB_File_SharedRead)
    Length = LocFinale - LocIniziale
    
    If Length
      *MemoryID = AllocateMemory(Length)
      If *MemoryID
        FileSeek(#file0, LocIniziale) ; set the file cursor position
        ReadData(#file0, *MemoryID, Length)
        
        Debug Length
        Debug PeekS(*MemoryID, Length, #PB_UTF8)
        
        FreeMemory(*MemoryID) ; free only a buffer that was actually allocated
      EndIf
    EndIf
    
    CloseFile(#file0)
  EndIf
  
EndProcedure

Procedure i695()
  LocIniziale = 0
  LocFinale = 13488097
  
  
  If ReadFile(#file0, Path$ + File$, #PB_File_SharedRead)
    Length = LocFinale - LocIniziale
    
    If Length
      *MemoryID = AllocateMemory(Length)
      If *MemoryID
        FileSeek(#file0, LocIniziale) ; set the file cursor position
        ReadData(#file0, *MemoryID, Length)
        
        Debug Length
        Debug PeekS(*MemoryID, Length, #PB_UTF8)
        
        FreeMemory(*MemoryID) ; free only a buffer that was actually allocated
      EndIf
    EndIf
    
    CloseFile(#file0)
  EndIf
  
EndProcedure

i693()
i694()
i695()
Am I doing something wrong? I can't see anything obviously wrong in the code :confused:.
Greetings
Tenaja
Addict
Posts: 1959
Joined: Tue Nov 09, 2010 10:15 pm

Re: Problem with length size and PeekS

Post by Tenaja »

Is "Length" in bytes? UTF-8 can be multi-byte (unless the text is all ASCII), so that could give you an overflow.

Try checking the string length with this:
https://www.purebasic.com/documentation ... ength.html
STARGÅTE
Addict
Posts: 2226
Joined: Thu Jan 10, 2008 1:30 pm
Location: Germany, Glienicke

Re: Problem with length size and PeekS

Post by STARGÅTE »

PeekS() expects a length in characters (and a character can be one byte (ASCII), two bytes (UNICODE) or multiple bytes (UTF8)).
If you want to pass the length in bytes, you have to use the flag #PB_ByteLength:

Code: Select all

PeekS(*Buffer, LengthInBytes, #PB_UTF8|#PB_ByteLength)
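To make the difference between the two kinds of length concrete, here is a minimal sketch. It assumes a PureBasic version that has UTF8() and the #PB_ByteLength flag for MemoryStringLength(); the sample text and the variable names are made up for illustration only.

```purebasic
; Sketch: the same UTF-8 buffer measured in characters and in bytes.
Define Text$ = "Gr" + Chr($FC) + Chr($DF) + "e"   ; 5 characters; the two middle ones take 2 bytes each in UTF-8
Define *Buffer = UTF8(Text$)                      ; null-terminated UTF-8 copy, must be freed with FreeMemory()

If *Buffer
  Define CharLen = MemoryStringLength(*Buffer, #PB_UTF8)                   ; length in characters
  Define ByteLen = MemoryStringLength(*Buffer, #PB_UTF8 | #PB_ByteLength)  ; length in bytes
  Debug CharLen
  Debug ByteLen
  
  ; With a byte count (as returned by ReadData, for example) pass #PB_ByteLength as well:
  Debug PeekS(*Buffer, ByteLen, #PB_UTF8 | #PB_ByteLength)
  FreeMemory(*Buffer)
EndIf
```

Without #PB_ByteLength, PeekS() would treat the byte count as a character count, which is larger than the number of characters actually in the buffer whenever the text is not pure ASCII.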
johnstock wrote: Sat May 18, 2024 5:50 pm I'm doing something wrong?
You posted a coding question in the bug report section :evil: .
Sorry, but this seems to be in vogue currently.
PB 6.01 ― Win 10, 21H2 ― Ryzen 9 3900X, 32 GB ― NVIDIA GeForce RTX 3080 ― Vivaldi 6.0 ― www.unionbytes.de
Lizard - Script language for symbolic calculations and moreTypeface - Sprite-based font include/module
johnstock
New User
Posts: 8
Joined: Sat Aug 04, 2012 4:32 pm
Location: Italy

Re: Problem with length size and PeekS

Post by johnstock »

STARGÅTE wrote: Sat May 18, 2024 6:18 pm PeekS() expects a length in characters (and a character can be one byte (ASCII), two bytes (UNICODE) or multiple bytes (UTF8)).
If you want to pass the length in bytes, you have to use the flag #PB_ByteLength:

Code: Select all

PeekS(*Buffer, LengthInBytes, #PB_UTF8|#PB_ByteLength)
johnstock wrote: Sat May 18, 2024 5:50 pm I'm doing something wrong?
You posted a coding question in the bug report section :evil: .
Sorry, but this seems to be in vogue currently.
Thanks for the explanation. I'm trying it like that, but the error is the same, in the same place.

I posted here in "Bugs" because the exact same code works perfectly on the 64-bit version, and I can't see anything in the code that obviously depends on bitness.
I'm sorry if I posted in the wrong section of the forum; I did so in good faith.
Last edited by johnstock on Sat May 18, 2024 10:06 pm, edited 1 time in total.
STARGÅTE
Addict
Posts: 2226
Joined: Thu Jan 10, 2008 1:30 pm
Location: Germany, Glienicke

Re: Problem with length size and PeekS

Post by STARGÅTE »

johnstock wrote: Sat May 18, 2024 7:24 pm Thanks for the explanation. I'm trying it like that, but the error is the same, in the same place.
Can you split the respective line into two lines:

Code: Select all

Define TempString.s = PeekS(*MemoryID, Length, #PB_UTF8|#PB_ByteLength)
Debug TempString
On which line does the error occur now?
johnstock
New User
Posts: 8
Joined: Sat Aug 04, 2012 4:32 pm
Location: Italy

Re: Problem with length size and PeekS

Post by johnstock »

STARGÅTE wrote: Can you split the respective line into two lines:

Code: Select all

Define TempString.s = PeekS(*MemoryID, Length, #PB_UTF8|#PB_ByteLength)  ; <-- here
Debug TempString
On which line does the error occur now?
The error occurs on the PeekS line (marked above).
STARGÅTE
Addict
Posts: 2226
Joined: Thu Jan 10, 2008 1:30 pm
Location: Germany, Glienicke

Re: Problem with length size and PeekS

Post by STARGÅTE »

OK. The problem now is: how should we (the community) reproduce this issue?

Of course, there might be a bug in PeekS() when reading a UTF-8 string that is not well-formed.
But it would be good to reproduce it in a small snippet.
It seems that PeekS() reads beyond the given Length.
DarkDragon
Addict
Posts: 2344
Joined: Mon Jun 02, 2003 9:16 am
Location: Germany

Re: Problem with length size and PeekS

Post by DarkDragon »

Hmm, I guess the issue is that you cannot know whether you accidentally split a character or not. You need a PeekS with a byte length. To be honest, I'm not sure why PeekS wants the length in characters. When reading data you usually have a memory block of a known byte size and don't know the number of characters, don't you?

For UTF-8 you can check from the end: if the last byte starts with binary 10, go back until you reach a byte that starts with 11.
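A rough sketch of that backward check, for illustration only. Utf8SafeLength is a hypothetical helper name, not a PureBasic built-in, and the sketch assumes the buffer holds otherwise well-formed UTF-8. It also covers the case where the chunk ends directly on a lead byte (11xxxxxx), which is a split character too. It only deals with chunk boundaries and says nothing about what PeekS() does internally.

```purebasic
; Sketch: largest byte count <= Length that does not end inside a UTF-8 sequence.
; Utf8SafeLength is a hypothetical helper name (an assumption for this example).
Procedure.i Utf8SafeLength(*Buffer, Length.i)
  Protected i.i, Steps.i, Lead.a, Expected.i = 1
  
  If *Buffer = 0 Or Length <= 0
    ProcedureReturn 0
  EndIf
  
  ; Walk back over continuation bytes (binary 10xxxxxx), at most three of them.
  i = Length - 1
  While i > 0 And Steps < 3 And (PeekA(*Buffer + i) & $C0) = $80
    i - 1
    Steps + 1
  Wend
  
  ; Work out how many bytes the sequence starting at offset i should have.
  Lead = PeekA(*Buffer + i)
  If (Lead & $E0) = $C0
    Expected = 2
  ElseIf (Lead & $F0) = $E0
    Expected = 3
  ElseIf (Lead & $F8) = $F0
    Expected = 4
  EndIf
  
  If i + Expected > Length
    ProcedureReturn i      ; last sequence is cut off: stop before its lead byte
  EndIf
  ProcedureReturn Length   ; chunk already ends on a character boundary
EndProcedure

Define *Demo = UTF8("a" + Chr($20AC))        ; bytes 61 E2 82 AC, plus a terminating 0
If *Demo
  Define SafeLen = Utf8SafeLength(*Demo, 3)  ; a 3-byte chunk would cut the euro sign
  Debug SafeLen
  Debug PeekS(*Demo, SafeLen, #PB_UTF8 | #PB_ByteLength)
  FreeMemory(*Demo)
EndIf
```

The bytes that were cut off would then have to be carried over to the start of the next chunk, for example by seeking back by Length - SafeLen before the next ReadData().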
bye,
Daniel
juergenkulow
Enthusiast
Posts: 581
Joined: Wed Sep 25, 2019 10:18 am

[Bug] Re: Problem with length size and PeekS

Post by juergenkulow »

Code: Select all

; x86 PeekS IMA with magic length $80FE0, $81FE0, $82FE0, ... , $CDCFE0=13488096, ...  
For Length=$81000 To 1 Step -16
  Debug Hex(Length) 
  *MemoryID = AllocateMemory(Length)
  For *p.Ascii=*MemoryID To *MemoryID+Length-1
    *p\a='A'
  Next 
  s.s=PeekS(*MemoryID, Length, #PB_UTF8)
  FreeMemory(*MemoryID)
Next 
; 81000
; 80FF0
; 80FE0
; [14:22:43] [ERROR] peeks.pb (Line: 8)
; [14:22:43] [ERROR] Invalid memory access. (read error at address 5509120)
;  Exception code:	c0000005
;  Exception offset:	00024b86
STARGÅTE
Addict
Posts: 2226
Joined: Thu Jan 10, 2008 1:30 pm
Location: Germany, Glienicke

Re: Problem with length size and PeekS

Post by STARGÅTE »

juergenkulow wrote: Sun May 19, 2024 1:35 pm

Code: Select all

; x86 PeekS IMA with magic length $80FE0, $81FE0, $82FE0, ... , $CDCFE0=13488096, ... 
Good root-cause research, juergenkulow, thank you.
So I have to apologize to johnstock :oops: .
Now it really seems to be a bug in PureBasic, in PeekS() on x86.

A moderator should move it back to "Bug Reports", please.
mk-soft
Always Here
Posts: 6202
Joined: Fri May 12, 2006 6:51 pm
Location: Germany

Re: Problem with length size and PeekS

Post by mk-soft »

I don't know why you are trying to load such large files into a single string variable.
That makes no sense to me. If it is a text file, it is always better to put it in a LinkedList. This makes it much easier to search through the individual lines and save them again after changing the line data.
My Projects ThreadToGUI / OOP-BaseClass / EventDesigner V3
PB v3.30 / v5.75 - OS Mac Mini OSX 10.xx - VM Window Pro / Linux Ubuntu
Downloads on my Webspace / OneDrive
johnstock
New User
Posts: 8
Joined: Sat Aug 04, 2012 4:32 pm
Location: Italy

Re: Problem with length size and PeekS

Post by johnstock »

mk-soft wrote: Sun May 19, 2024 2:07 pm I don't know why you are trying to load such large files into a single string variable.
That makes no sense to me. If it is a text file, it is always better to put it in a LinkedList. This makes it much easier to search through the individual lines and save them again after changing the line data.
Hi. I'm trying to create an mbox-to-eml converter.
So I have to read those big files (they can be several GB), try to guess the start and end of every email (eml format) with a regex, save each piece (which can be KB or MB in size) to a local file, and then send that file to an IMAP server.
Reading those pieces of the file row by row could be a bottleneck, couldn't it?
Changing the line data could be problematic here (I think).
I'm not an expert, and I like to be open to learning something every day.
mk-soft
Always Here
Posts: 6202
Joined: Fri May 12, 2006 6:51 pm
Location: Germany

Re: Problem with length size and PeekS

Post by mk-soft »

Since the data is probably stored as ASCII (8-bit), it is better to view and process it as raw data. PB strings are Unicode, so the data must first be converted from ASCII to Unicode and then converted back to ASCII when stored.
Here it is better not to work with PB strings but with pointers to the data and pointers to ArrayOfASCII.
It is a bit more complex to program, but PureBasic handles raw data very well.
For quick searches of raw data there are some examples here in the forum.

Simple ASCII find string:

Code: Select all


CompilerIf #PB_Compiler_OS = #PB_OS_Windows
  Import "shlwapi.lib"
    strstr(*string, *findstring) As "StrStrA"
    strcasestr(*string, *findstring) As "StrStrIA"
  EndImport
CompilerElse
  ImportC ""
    strstr(*string1, *string2)
    strcasestr(*string1, *string2)
  EndImport
CompilerEndIf

*data = Ascii("Hello World")  ; null-terminated ASCII copies (freed with FreeMemory below)
*word = Ascii("world")

; Case-sensitive search: "world" does not match "World"
*pos = strstr(*data, *word)
If *pos
  Debug "Offset = " + Str(*pos - *data)
  Debug PeekS(*pos, -1, #PB_Ascii)
Else
  Debug "Not found (case-sensitive)"
EndIf

; Case-insensitive search
*pos = strcasestr(*data, *word)
If *pos
  Debug "Offset = " + Str(*pos - *data)
  Debug PeekS(*pos, -1, #PB_Ascii)
Else
  Debug "Not found"
EndIf

FreeMemory(*data)
FreeMemory(*word)
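As a further illustration of working on raw bytes with a pointer instead of PB strings, here is a small sketch that counts lines beginning with "From " in a buffer. It rests on assumptions: CountMboxSeparators is a hypothetical helper name, and treating every line that starts with "From " as a message separator is a simplification of the mbox format, not a complete parser.

```purebasic
; Sketch: count lines that start with "From " in a raw byte buffer, without PB strings.
; CountMboxSeparators is a hypothetical helper name (an assumption for this example).
Procedure.i CountMboxSeparators(*Buffer, Length.i)
  Protected *p.Ascii = *Buffer
  Protected *Last = *Buffer + Length - 5   ; last offset where 5 bytes still fit in the buffer
  Protected AtLineStart.i = #True, Count.i
  
  While *p <= *Last
    If AtLineStart And CompareMemory(*p, ?FromTag, 5)
      Count + 1
    EndIf
    AtLineStart = Bool(*p\a = 10)          ; the byte after a line feed starts a new line
    *p + 1
  Wend
  
  ProcedureReturn Count
EndProcedure

DataSection
  FromTag:
  Data.a 'F', 'r', 'o', 'm', ' '
EndDataSection

Define Demo$ = "From a@b Sat" + #LF$ + "Body" + #LF$ + "From c@d Sun" + #LF$
Define *Raw = Ascii(Demo$)
If *Raw
  Debug CountMboxSeparators(*Raw, MemoryStringLength(*Raw, #PB_Ascii))
  FreeMemory(*Raw)
EndIf
```

In a real converter the buffer would come from ReadData() on a chunk of the file, and the helper would record the offsets of the separators instead of only counting them.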

Fred
Administrator
Posts: 18161
Joined: Fri May 17, 2002 4:39 pm
Location: France

Re: [Done] Problem with length size and PeekS

Post by Fred »

Fixed.