Probably bug with REGEXP

loulou25223
User
Posts: 32
Joined: Thu Jan 16, 2014 7:07 pm

Post by loulou25223 »

toto23 is a string of about 230 MB.

Code: Select all

If CreateRegularExpression(0, "(<PmtInf>).*(</PmtInf>\s{0,10})</CstmrDrctDbtInitn>", #PB_RegularExpression_DotAll | #PB_RegularExpression_MultiLine)
  Dim Resultat$(0)
  Nb = ExtractRegularExpression(0, toto23, Resultat$())
EndIf

When I execute the code, Nb returns 1, meaning that the regexp has found what I want to extract. The extracted string is probably about 228 MB long.
But when I look at Resultat$(0), the array is empty.
The same instructions with the same string work fine in the PureBasic x64 version, without a crash.
freak
PureBasic Team
Posts: 5940
Joined: Fri Apr 25, 2003 5:21 pm
Location: Germany

Post by freak »

Can you post a piece of code that can be executed?
quidquid Latine dictum sit altum videtur
loulou25223
User
Posts: 32
Joined: Thu Jan 16, 2014 7:07 pm

Post by loulou25223 »

We load the file into memory.
The file has to be about 230 MB, because with 220 MB it works fine.
ExtractRegularExpression() returns 1 in Nb, indicating that a match was found.
However, the array Resultat$() is empty and the variable "variable" contains nothing: its length is 0.
With the x64 version of PureBasic there is no problem; the problem appears only with the x86 version.

Code: Select all

If ReadFile(0, sxml)
  FileBuffersSize(0, FileSize(sxml) + 2)
  length = Lof(0)
  *MemoryID = AllocateMemory(length)
  If *MemoryID
    bytes = ReadData(0, *MemoryID, length)
    ; Debug "Info: " + Str(bytes) + " bytes were read"
    toto23.s = PeekS(*MemoryID, length) ; copies the whole file into one string
    FreeMemory(*MemoryID)
  EndIf
  CloseFile(0) ; close the file even if the allocation failed
  If CreateRegularExpression(0, "(<PmtInf>).*(</PmtInf>\s{0,10})</CstmrDrctDbtInitn>", #PB_RegularExpression_DotAll | #PB_RegularExpression_MultiLine)
    Dim Resultat$(0)
    Nb = ExtractRegularExpression(0, toto23, Resultat$())
    variable.s = Resultat$(0) ; empty on x86 although Nb is 1
  Else
    Debug "Error"
  EndIf
EndIf

I hope that is sufficient for you to see the problem.
loulou25223
User
Posts: 32
Joined: Thu Jan 16, 2014 7:07 pm

Post by loulou25223 »

Is no one answering?
Fred
Administrator
Posts: 18162
Joined: Fri May 17, 2002 4:39 pm
Location: France

Post by Fred »

IMHO, you are hitting a PB internal limitation in 32-bit. 230 MB for one string is huge, especially as it needs to be copied into an internal buffer, which will also take the same size. Also, every result will be allocated in an array; how many results do you have for such a regexp? You should try to split your string and reduce the footprint.
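
One way to read the "split your string" advice is to walk through the data one <PmtInf> block at a time with FindString() and Mid(), so that no single regex match has to hold hundreds of megabytes. The sketch below is only an illustration under assumptions the thread does not confirm: that the blocks can be handled individually, that they are not nested, and that plain tag search is good enough. The sample string and all variable names are made up.

```purebasic
; Hypothetical small sample standing in for the 230 MB string from the thread.
xml$ = "<CstmrDrctDbtInitn><PmtInf>A</PmtInf>" + #LF$ + "<PmtInf>B</PmtInf>" + #LF$ + "</CstmrDrctDbtInitn>"

openTag$  = "<PmtInf>"
closeTag$ = "</PmtInf>"

pos = 1
blockCount = 0

Repeat
  ; Find the next opening tag, starting where the previous block ended.
  blockStart = FindString(xml$, openTag$, pos)
  If blockStart = 0 : Break : EndIf

  ; Find the closing tag that follows it.
  blockEnd = FindString(xml$, closeTag$, blockStart + Len(openTag$))
  If blockEnd = 0 : Break : EndIf   ; unterminated block: stop here

  ; Copy only this one block, including both tags.
  block$ = Mid(xml$, blockStart, blockEnd + Len(closeTag$) - blockStart)
  blockCount + 1
  Debug block$                      ; process the small block here

  pos = blockEnd + Len(closeTag$)
ForEver

Debug Str(blockCount) + " block(s) found"
```

Note that this does not do the same thing as the greedy regexp above, which returns everything from the first <PmtInf> to the last </PmtInf> as one result. It only shows how the work could be cut into small pieces; the big source string itself still has to fit in memory.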
Foz
Addict
Addict
Posts: 1359
Joined: Tue Nov 13, 2007 12:42 pm
Location: Manchester, UK

Post by Foz »

Note that there is a theoretical 2 GB limit for any 32-bit application.
In reality, once you hit about 1.5 GB the application will keel over unless you have a monstrous amount of RAM available, so that memory fragmentation is not an issue. (When an application tries to allocate a set size of memory, it is requesting a single block of RAM.
If Windows cannot supply it, the application will keel over.)

Now for this application: running in non-Unicode mode with an ASCII file, I tested with a 240 MB text file. There is the 240 MB read in, plus the memory used by the regex. You can check "Peak Working Set" in Task Manager to verify this; it weighs in at 993,377 KB of memory used at one time.

Now let's set this to Unicode compilation:
same ASCII file, but with the parameter ", #PB_Ascii" added to the PeekS() call.
Result? Peak memory usage: 1,487,392 KB

And now for the final test:
the same file, but encoded in Unicode, running in a Unicode compilation:
Kaboom! It fails to even create a string. This is a 481 MB file, in Unicode.

I think your problem is the memory limitation of 32-bit applications.
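
To make the arithmetic behind these tests easier to follow, here is a rough sketch. It only counts the raw file buffer and the string created by PeekS(); whatever the regex engine allocates internally comes on top and is not estimated here. The 240 MB figure is taken from the test above, and the constant and variable names are made up for illustration.

```purebasic
; Rough estimate only; names are illustrative assumptions.
#BytesPerMB = 1024 * 1024

fileBytes.q = 240 * #BytesPerMB     ; ASCII file size used in the test above

bufferBytes.q   = fileBytes         ; raw copy filled by ReadData()
asciiString.q   = fileBytes         ; PeekS() result at 1 byte per character
unicodeString.q = fileBytes * 2     ; PeekS() result at 2 bytes per character

Debug "ASCII build, buffer + string:   " + Str((bufferBytes + asciiString) / #BytesPerMB) + " MB"
Debug "Unicode build, buffer + string: " + Str((bufferBytes + unicodeString) / #BytesPerMB) + " MB"
Debug "Bytes per character in this build: " + Str(SizeOf(Character))
```

So before the regexp even runs, the same 240 MB file accounts for roughly 480 MB in an ASCII build and 720 MB in a Unicode build while the buffer and the string both exist, which leaves little room under the 32-bit limit.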