pb6.12b3 Not sure if it is a peeks bug?

gltianya
User
Posts: 23
Joined: Wed Jul 07, 2021 12:02 pm

pb6.12b3 Not sure if it is a peeks bug?

Post by gltianya »

I'm not sure whether this is a PeekS() bug, so please check the following code. Code 1 raises an error in PeekS() when processing a file.
Thanks!
Failing line: filetext$ = PeekS(*UnpackID, Result, #PB_UTF8)
Test file: https://drive.google.com/file/d/1sLYzrA ... sp=sharing

Code: Select all

; code 1
#FixedRegularStr = "^\s*SIGNED\s*$"
#KeyWordsNotFound = "keywords not found"
Global resultstr.s = ""

Procedure.s checkstring(sstr.s , regularstr.s = #FixedRegularStr)
  Protected str1.s = #KeyWordsNotFound
  Protected str2.s = ""
  Protected stemp.s = ""

  Protected i.i , j.i = CountString(sstr,Chr(13))
  If j > 0
    For i = 1 To j
      stemp = StringField(sstr,i,Chr(13))
      If Len(stemp) = 0
        Continue
      Else
        If CreateRegularExpression(0, regularstr)
          
          If ExamineRegularExpression(0, stemp)
            While NextRegularExpressionMatch(0)
              str2 = RegularExpressionMatchString(0)
              Break
            Wend  
          EndIf
        EndIf
        Break
      EndIf
    Next
  Else
    If CreateRegularExpression(0, regularstr)
      
      If ExamineRegularExpression(0, sstr)
        While NextRegularExpressionMatch(0)
          str2 = RegularExpressionMatchString(0)
          Break
        Wend  
      EndIf
    EndIf
  EndIf
  If str2 <> ""                                                                ; If the match is successful, str2 is not empty
    str1 = str2
  EndIf 
  
  ProcedureReturn str1
EndProcedure

Procedure.s parsedocxxml(*xmlstr)
  Protected firstlinestr.s = ""
  ProcedureReturn firstlinestr
EndProcedure

Procedure.s checkdocxkeywords(filename.s , *pmem = #Null, filesize = 0, regularstr.s = #FixedRegularStr)
  Protected firstlinestr.s = "" , filetext$ = ""
  Protected *UnpackID
  Protected Result.i
  
  Debug "filename.s =" + filename
  If filename = "H:\mylib\GemBox.Document.Examples-master\C#\Advanced Features\Progress Reporting and Cancellation\Cancellation in WPF\LargeDocument.docx"
    Debug "break by bug"
  EndIf
  
  ;OnErrorCall(@ErrorHandler())
  resultstr = ""
  UseZipPacker()
  If filename <> ""
    result = OpenPack(11, filename, #PB_PackerPlugin_Zip)                       ; List all the entries
  ElseIf (*pmem <> #Null And filesize > 0 )
    Result = CatchPack(11, *pmem, filesize , #PB_PackerPlugin_Zip)
  EndIf  
  If Result > 0
    If ExaminePack(11)      
      While NextPackEntry(11)
        If PackEntryName(11) = "word/document.xml"
          Debug "PackEntrySize(11) =" + Str(PackEntrySize(11))
          
          *UnpackID = AllocateMemory(PackEntrySize(11))
          If *UnpackID = 0
            ClosePack(11)
            ProcedureReturn resultstr
          EndIf  
          Result = UncompressPackMemory(11, *UnpackID, PackEntrySize(11),"word/document.xml")
          Debug "Result =" + Str(Result)
          If result > 0 
            Debug "*UnpackID=" + Str(*UnpackID)
            filetext$ = PeekS(*UnpackID, Result, #PB_UTF8)
            
            
            
            If Len(firstlinestr) > 0
              resultstr = checkstring(firstlinestr , regularstr)
            EndIf
            FreeMemory(*UnpackID)                                             ; release the buffer before leaving the loop
            Break
          EndIf
          FreeMemory(*UnpackID)
        EndIf
      Wend    
    EndIf        
    ClosePack(11)  
  EndIf
  ProcedureReturn resultstr
EndProcedure

Procedure.i existkeywords_docx(filename.s , regularstr.s = #FixedRegularStr)
  Protected existkeywords.i = 1 , checkresult.s
  
  If regularstr = ""
    regularstr = #FixedRegularStr
  EndIf 
  
  checkresult = checkdocxkeywords(filename , #Null, 0 , regularstr)
  
  If checkresult = #KeyWordsNotFound
    existkeywords.i = 0
  ElseIf checkresult = ""
    existkeywords.i = -1
  EndIf
  
  ProcedureReturn existkeywords
EndProcedure

Procedure.i existkeywords_file(filename.s , regularstr.s); = #FixedRegularStr)

  Protected existkeywords.i = 1, filenameext$ = UCase(GetExtensionPart(filename))
  
  If regularstr = ""
    regularstr = #FixedRegularStr
  EndIf 
  
  Select filenameext$
      
    Case "DOCX"
      existkeywords = existkeywords_docx(filename , regularstr)
      
    Default
      existkeywords = - 1
  EndSelect
  
  ProcedureReturn existkeywords
EndProcedure

;-> main
;
Global NewList filelist.s()
Global NewList filelist_checked.s()

If OpenConsole()
  EnableGraphicalConsole(1)
  ClearConsole()
  If OpenFile(0, "AllResults-v2.txt",#PB_File_SharedRead | #PB_UTF8)
    While Not Eof(0)
      AddElement(filelist())
      filelist() = ReadString(0,#PB_UTF8)
      ;Debug filelist()
    Wend
  EndIf
  ResetList(filelist())
  Define.i filetotal = ListSize(filelist()), i = 1 , j = 0
  Define.s filenamestr , result = ""
  ForEach filelist()
    ConsoleLocate(2, 2)

    j = existkeywords_file(filelist(),"")
    If j = 1
      result = "warn 1"
    ElseIf j = 0
      result = "warn 2"
    ElseIf j = -1
      result = "warn 3"
    EndIf
    
    AddElement(filelist_checked())
    filelist_checked() = filelist() + ";" + result
    ConsoleLocate(2, 2)
    ;Print(Str(i) + Space(140))
    Print(Str(i) + Space(2) + filelist_checked() + Space(2))
    i = i + 1
  Next
  If OpenFile(1,"AllResults-v2-checked.txt",#PB_File_SharedWrite | #PB_UTF8)
    ResetList(filelist_checked())
    ForEach filelist_checked()
      WriteStringN(1,filelist_checked(),#PB_UTF8)
    Next
  EndIf
EndIf
ErrorHandler() is copied from the help file.
When I use code 2 to test the peeks function (even 5000 times), the test is completed successfully.

Code: Select all

; code 2
Procedure test_peeks()
;For j = 1 To 5000                    ;Assume there are 5000 files
  ;Debug "j = " + Str(j)
  UseZipPacker()
  result1.i = OpenPack(0,"H:\mylib\GemBox.Document.Examples-master\C#\Advanced Features\Progress Reporting and Cancellation\Cancellation in WPF\LargeDocument.docx", #PB_PackerPlugin_Zip)
  Debug "result1 = " + result1
  If result1 > 0
    If ExaminePack(0)
      While NextPackEntry(0)
        If PackEntryName(0) = "word/document.xml"
          Debug "PackEntrySize(0) = " + Str(PackEntrySize(0))
          *p = AllocateMemory(PackEntrySize(0))
          Result2.i = UncompressPackMemory(0, *p, PackEntrySize(0),"word/document.xml")
          ;result3.i = UncompressPackFile(0,"H:\20240804\document-1.xml")
          filetext$ = PeekS(*p, Result2, #PB_UTF8)
          result4.i = CatchXML(0,*p,Result2)
          FreeXML(0)
          FreeMemory(*P)
          Debug "Result2 = " + result2
          Debug "filetext$ len = " + Str(Len(filetext$))
          ;Debug "Result3 = " + result3
          Debug "Result4 = " + result4
          ;Debug filetext$
        EndIf
      Wend
    EndIf
    ClosePack(0)                         ; only close a pack that was actually opened
  EndIf
  ;Next
EndProcedure

test_peeks()
;;-------------2024/08/29--------------------
The problem can be worked around for now by modifying the code and temporarily ignoring the large file. Comment out these lines:

Code: Select all

filetext$ = PeekS(*UnpackID, Result, #PB_UTF8)
If filetext$ <> ""
   firstlinestr = parsedocxxml(@filetext$)
EndIf
Replace them with:

Code: Select all

firstlinestr = parsedocxxml("",*UnpackID, Result)
Then modify the parsedocxxml procedure:

Code: Select all

parsedocxxml("",*UnpackID, Result)
....
If XMLStatus(0) <> #PB_XML_Success
        FreeXML(0)
        ProcedureReturn firstlinestr
EndIf
;;;;------------------
I may also change the decompression scheme of the code: when PackEntrySize(11) is greater than 500000, decompress the entry directly to a temporary file.
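The temporary-file idea could look roughly like the sketch below. It is only an illustration: the procedure name LoadEntryAsXML, the constant #LargeEntryLimit and the temporary file name are made up for this example, and the 500000-byte threshold is simply the figure mentioned above. It assumes the pack is already positioned on the entry inside a NextPackEntry() loop.

```purebasic
; Sketch only: helper name, constant name and temp file name are assumptions.
#LargeEntryLimit = 500000

; Loads one pack entry into the XML object 'xml'.
; Returns #True if the data was handed to the XML library; the caller should
; still check XMLStatus(xml) and call FreeXML(xml) when done.
Procedure.i LoadEntryAsXML(pack.i, xml.i, entryname.s)
  Protected size.q = PackEntrySize(pack)
  Protected *buf, loaded.i = #False
  Protected tmpfile.s = GetTemporaryDirectory() + "docx-entry.xml"

  If size > #LargeEntryLimit
    ; Large entry: unpack to disk and let the XML library read the file
    If UncompressPackFile(pack, tmpfile, entryname) > 0
      If LoadXML(xml, tmpfile)
        loaded = #True
      EndIf
      DeleteFile(tmpfile)
    EndIf
  ElseIf size > 0
    ; Small entry: unpack to memory as before
    *buf = AllocateMemory(size)
    If *buf
      If UncompressPackMemory(pack, *buf, size, entryname) > 0
        If CatchXML(xml, *buf, size)
          loaded = #True
        EndIf
      EndIf
      FreeMemory(*buf)
    EndIf
  EndIf

  ProcedureReturn loaded
EndProcedure
```

Either way no PeekS() copy of the whole entry is created, which is the step that failed here. Whether the XML tree for a very large entry fits into a 32-bit process is a separate question that this sketch does not answer.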
Last edited by gltianya on Thu Aug 29, 2024 12:23 am, edited 3 times in total.
jacdelad
Addict
Posts: 1991
Joined: Wed Feb 03, 2021 12:46 pm
Location: Riesa

Re: pb6.12b3 Not sure if it is a peeks bug?

Post by jacdelad »

I don't have your files, so I can only guess:
Does it work with

Code: Select all

filetext$ = PeekS(*UnpackID, -1, #PB_UTF8)
?
Good morning, that's a nice tnetennba!

PureBasic 6.21/Windows 11 x64/Ryzen 7900X/32GB RAM/3TB SSD
Synology DS1821+/DX517, 130.9TB+50.8TB+2TB SSD
gltianya
User
Posts: 23
Joined: Wed Jul 07, 2021 12:02 pm

Re: pb6.12b3 Not sure if it is a peeks bug?

Post by gltianya »

jacdelad wrote: Mon Aug 26, 2024 3:41 pm I don't have your files, so I can only guess:
Does it work with

Code: Select all

filetext$ = PeekS(*UnpackID, -1, #PB_UTF8)
?
Thanks, I'll try your code right away.

Just tested, the problem still exists.
Last edited by gltianya on Thu Aug 29, 2024 12:24 am, edited 1 time in total.
jacdelad
Addict
Posts: 1991
Joined: Wed Feb 03, 2021 12:46 pm
Location: Riesa

Re: pb6.12b3 Not sure if it is a peeks bug?

Post by jacdelad »

Ok, two things:
First, we can't really reproduce this without the original files.
Second, if it's not confirmed as a bug, then post it somewhere else, not the bug section. It can be moved there later by an admin, after confirmation.
gltianya
User
Posts: 23
Joined: Wed Jul 07, 2021 12:02 pm

Re: pb6.12b3 Not sure if it is a peeks bug?

Post by gltianya »

jacdelad wrote: Mon Aug 26, 2024 4:01 pm Ok, two things:
First, we can't really reproduce this without the original files.
Second, if it's not confirmed as a bug, then post it somewhere else, not the bug section. It can be moved there later by an admin, after confirmation.
I don't know how to upload the original DOCX file to the forum. If it is not appropriate to post it in this section, please move it to the corresponding section. Thank you.

I have added a download link to the original DOCX file in the first post, but I am not sure if it was shared successfully.
Last edited by gltianya on Mon Aug 26, 2024 4:34 pm, edited 1 time in total.
Fred
Administrator
Posts: 18153
Joined: Fri May 17, 2002 4:39 pm
Location: France

Re: pb6.12b3 Not sure if it is a peeks bug?

Post by Fred »

Try with the #PB_ByteLength flag
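For context: PeekS() counts Length in characters unless #PB_ByteLength is given, while UncompressPackMemory() returns a size in bytes. A small sketch of the difference, using a made-up sample string and assuming the source file is saved as UTF-8:

```purebasic
; Sample text: 3 two-byte UTF-8 characters followed by 4 ASCII characters
Define sample.s = "äöü-abc"
Define bytes.i = StringByteLength(sample, #PB_UTF8)   ; byte count without the terminating null
Define *buf = UTF8(sample)                            ; UTF-8 copy of the string in a new buffer

If *buf
  Debug PeekS(*buf, 4, #PB_UTF8)                       ; Length = 4 characters
  Debug PeekS(*buf, 4, #PB_UTF8 | #PB_ByteLength)      ; Length = 4 bytes
  Debug PeekS(*buf, bytes, #PB_UTF8 | #PB_ByteLength)  ; whole buffer, sized in bytes
  FreeMemory(*buf)
EndIf
```

With a byte count such as the return value of UncompressPackMemory(), the third form is the one that matches the buffer size.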
gltianya
User
Posts: 23
Joined: Wed Jul 07, 2021 12:02 pm

Re: pb6.12b3 Not sure if it is a peeks bug?

Post by gltianya »

Fred wrote: Mon Aug 26, 2024 4:21 pm Try with the #PB_ByteLength flag
I changed the line to
filetext$ = PeekS(*UnpackID, Result, #PB_UTF8 | #PB_ByteLength)
but the problem was not resolved.
jacdelad
Addict
Posts: 1991
Joined: Wed Feb 03, 2021 12:46 pm
Location: Riesa

Re: pb6.12b3 Not sure if it is a peeks bug?

Post by jacdelad »

Sorry, I'm at work and can't download your file right now (restricted by my employer).
infratec
Always Here
Posts: 7576
Joined: Sun Sep 07, 2008 12:45 pm
Location: Germany

Re: pb6.12b3 Not sure if it is a peeks bug?

Post by infratec »

With this line:

Code: Select all

filetext$ = PeekS(*UnpackID, Result, #PB_UTF8|#PB_ByteLength)
It works with PB 6.11 x86 on Windows 10 x64.

Hint:
The filename LargeDocument.docx needs to be inside AllResults-v2.txt
gltianya
User
Posts: 23
Joined: Wed Jul 07, 2021 12:02 pm

Re: pb6.12b3 Not sure if it is a peeks bug?

Post by gltianya »

infratec wrote: Tue Aug 27, 2024 7:18 am With this line:

Code: Select all

filetext$ = PeekS(*UnpackID, Result, #PB_UTF8|#PB_ByteLength)
It works with PB 6.11 x86 on Windows 10 x64.

Hint:
The filename LargeDocument.docx needs to be inside AllResults-v2.txt
AllResults-v2.txt is a UTF-8 encoded list of the absolute paths of all DOCX files on my computer. Because code 1 failed on LargeDocument.docx, I took that file and wrote code 2 for it. I even ran code 2 5,000 times, but still couldn't find the cause.
Sorry, I can't upload all the DOCX files on my computer.
Thanks for your help. I have uploaded the tool that generates AllResults-v2.txt. If you have time, download and unzip it, and run secdocsdetector.exe from a CMD prompt. It will generate AllResults-v2.txt in the same directory. That file lists the absolute paths of all docx/doc/wps/pdf/ofd files on the local computer, encoded as UTF-8.
Thanks to all the guidance and help from my friends, I now feel that secdocsdetector.exe serves no purpose here and there is no need to waste time on it, so I have deleted it from Google Drive. Sorry for disturbing you!
Last edited by gltianya on Wed Aug 28, 2024 2:24 pm, edited 1 time in total.
gltianya
User
Posts: 23
Joined: Wed Jul 07, 2021 12:02 pm

Re: pb6.12b3 Not sure if it is a peeks bug?

Post by gltianya »

infratec wrote: Tue Aug 27, 2024 7:18 am With this line:

Code: Select all

filetext$ = PeekS(*UnpackID, Result, #PB_UTF8|#PB_ByteLength)
It works with PB 6.11 x86 on Windows 10 x64.

Hint:
The filename LargeDocument.docx needs to be inside AllResults-v2.txt
I tried inserting a line

Code: Select all

filetext$ = space(Result) 

before the code

Code: Select all

filetext$ = PeekS(*UnpackID, Result, #PB_UTF8|#PB_ByteLength)
and the error occurred in the newly inserted line.

Therefore, I guessed that it was caused by string management, so I ran the following test:

Code: Select all

; windows 8.1 x64
; pb6.12b3

For j=1 To 50
  
  str1$ = ""
  ; Maximum = 1070000000 , Minimum =1060000000      (X64 OK, loop 50 times)
  ; Maximum = 1080000000 , Minimum =1070000000      (X64 error)
  ; Maximum =  360000000 , Minimum = 350000000      (x86 error)
  ; Maximum =  350000000 , Minimum = 340000000      (x86 OK, loop 50 times)
  i.i = Random(1070000000,1060000000)
  Debug "j = " + Str(j)
  Debug "i = " + Str(i)
  str1$ = Space(i)
  Debug "str1$=" + str1$
  
Next
Last edited by gltianya on Thu Aug 29, 2024 12:25 am, edited 1 time in total.
mk-soft
Always Here
Posts: 6202
Joined: Fri May 12, 2006 6:51 pm
Location: Germany

Re: pb6.12b3 Not sure if it is a peeks bug?

Post by mk-soft »

Having no fixed limit on string length does not mean that a string can be infinitely large. The operating system has to provide one contiguous block of memory for it, and that can fail if not enough contiguous memory is available.
This is not a bug in PureBasic.
You should also not work with such large strings. Working with smaller strings is recommended, for example in a linked list. But you still have to check the available memory at some point.
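One way to follow the smaller-strings suggestion is to split the unpacked UTF-8 buffer into a list of chunks instead of one huge string. The sketch below is only an illustration: SplitUtf8Buffer and the chunk size are made up for the example, and it assumes valid UTF-8 without embedded null bytes.

```purebasic
; Sketch: split a UTF-8 buffer into a list of smaller strings.
; The procedure name and the chunk size are assumptions for illustration.
Procedure.i SplitUtf8Buffer(*buf, bytes.i, chunkbytes.i, List chunks.s())
  Protected offset.i = 0, length.i

  ClearList(chunks())
  If chunkbytes < 4 : chunkbytes = 4 : EndIf    ; a UTF-8 character is at most 4 bytes long

  While offset < bytes
    length = chunkbytes
    If offset + length >= bytes
      length = bytes - offset                   ; last chunk: take what is left
    Else
      ; Shorten the chunk while the next byte is a continuation byte (10xxxxxx),
      ; so that no multi-byte character is cut in half
      While length > 1 And (PeekA(*buf + offset + length) & $C0) = $80
        length = length - 1
      Wend
    EndIf
    AddElement(chunks())
    chunks() = PeekS(*buf + offset, length, #PB_UTF8 | #PB_ByteLength)
    offset = offset + length
  Wend

  ProcedureReturn ListSize(chunks())
EndProcedure

; Usage with a small made-up sample
NewList parts.s()
Define sample.s = "héllo wörld"
Define *demo = UTF8(sample)
If *demo
  SplitUtf8Buffer(*demo, StringByteLength(sample, #PB_UTF8), 4, parts())
  ForEach parts()
    Debug parts()
  Next
  FreeMemory(*demo)
EndIf
```

The total memory used is about the same as for one big string, but no single allocation has to be contiguous and larger than one chunk.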
My Projects ThreadToGUI / OOP-BaseClass / EventDesigner V3
PB v3.30 / v5.75 - OS Mac Mini OSX 10.xx - VM Window Pro / Linux Ubuntu
Downloads on my Webspace / OneDrive
HeX0R
Addict
Posts: 1187
Joined: Mon Sep 20, 2004 7:12 am
Location: Hell

Re: pb6.12b3 Not sure if it is a peeks bug?

Post by HeX0R »

That code might be nonsense, but a PB command should never end up in an IMA (invalid memory access), IMHO.
gltianya
User
Posts: 23
Joined: Wed Jul 07, 2021 12:02 pm

Re: pb6.12b3 Not sure if it is a peeks bug?

Post by gltianya »

mk-soft wrote: Wed Aug 28, 2024 11:32 am Having no fixed limit on string length does not mean that a string can be infinitely large. The operating system has to provide one contiguous block of memory for it, and that can fail if not enough contiguous memory is available.
This is not a bug in PureBasic.
You should also not work with such large strings. Working with smaller strings is recommended, for example in a linked list. But you still have to check the available memory at some point.
Thank you for your guidance. I ran into this problem earlier and couldn't find the reason for the error. The document.xml file extracted from LargeDocument.docx is 91,440,162 bytes (89,298 KB), so I wanted to see how long a string variable PB x86 can support. (On my computer, the program in my first post runs successfully when compiled as x64, but not as x86.) Anyway, thank you.
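For a rough sense of scale: assuming PureBasic's 2-byte string characters and mostly single-byte UTF-8 text in document.xml, the string copy alone needs roughly 183 MB in one contiguous block, on top of the 91 MB unpack buffer. A back-of-the-envelope sketch (the constant and variable names are made up for the example):

```purebasic
; Back-of-the-envelope estimate, not a measurement.
#EntryBytes = 91440162                                     ; size of document.xml from the post

Define.q maxchars    = #EntryBytes                         ; worst case: every UTF-8 byte is one character
Define.q stringbytes = (maxchars + 1) * SizeOf(Character)  ; +1 for the terminating null character

Debug "Unpack buffer (bytes): " + Str(#EntryBytes)
Debug "String copy, worst case (bytes): " + Str(stringbytes)
Debug "Both together (bytes): " + Str(#EntryBytes + stringbytes)
```

In a 32-bit process the address space is shared with the program itself and everything else it has allocated, so a contiguous block of that size may simply not be available.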
Quin
Addict
Posts: 1122
Joined: Thu Mar 31, 2022 7:03 pm
Location: Colorado, United States

Re: pb6.12b3 Not sure if it is a peeks bug?

Post by Quin »

HeX0R wrote: Wed Aug 28, 2024 1:09 pm That code might be nonsense, but a PB command should never end up in an IMA, IMHO.
I think an IMA was unavoidable in this case; it sounds like they were running out of 32-bit address space.