Probably bug with REGEXP
Posted: Wed Feb 05, 2014 5:38 pm
by loulou25223
toto23 is a string of about 230 MB.
Code:
If CreateRegularExpression(0, "(<PmtInf>).*(</PmtInf>\s{0,10})</CstmrDrctDbtInitn>", #PB_RegularExpression_DotAll | #PB_RegularExpression_MultiLine)
  Dim resultat$(0)
  Nb = ExtractRegularExpression(0, toto23, resultat$())
EndIf
When I execute the code, Nb returns 1, meaning that the regexp has found what I want to extract. The extracted string is probably about 228 MB long.
But if I look at resultat$(0), the array is empty.
The same code with the same string works fine in the x64 version of PureBasic, without crashing.
Re: Probably bug with REGEXP
Posted: Sun Feb 09, 2014 9:24 pm
by freak
Can you post a piece of code that can be executed?
Re: Probably bug with REGEXP
Posted: Mon Feb 10, 2014 8:38 am
by loulou25223
We load the file into memory. The file has to be about 230 MB long, because with 220 MB it works fine.
The regular expression is created and Nb is set to 1, indicating that the extraction succeeded.
However, the array resultat$() is empty, and the variable "variable" contains nothing: its length is 0.
With the x64 version of PureBasic there is no problem; the problem appears only with the x86 version.
Code:
If ReadFile(0, sxml)
  FileBuffersSize(0, FileSize(sxml) + 2)
  length = Lof(0)
  *MemoryID = AllocateMemory(length)
  If *MemoryID
    ; Read the whole file into the buffer, then copy it into one string
    bytes = ReadData(0, *MemoryID, length)
    ; Debug "Info: " + Str(bytes) + " bytes were read"
    toto23.s = PeekS(*MemoryID, length)
    FreeMemory(*MemoryID)
  EndIf
  CloseFile(0)
EndIf

If CreateRegularExpression(0, "(<PmtInf>).*(</PmtInf>\s{0,10})</CstmrDrctDbtInitn>", #PB_RegularExpression_DotAll | #PB_RegularExpression_MultiLine)
  Dim resultat$(0)
  Nb = ExtractRegularExpression(0, toto23, resultat$())
  variable.s = resultat$(0)
Else
  Debug "Error"
EndIf
I hope that is sufficient for you to see the problem.
Re: Probably bug with REGEXP
Posted: Tue Feb 18, 2014 7:45 am
by loulou25223
Is no one answering?
Re: Probably bug with REGEXP
Posted: Tue Feb 18, 2014 10:27 am
by Fred
IMHO, you are hitting a PB internal limitation in 32-bit. 230 MB for one string is huge, especially as it needs to be copied into an internal buffer, which will also take the same size. Also, every result will be allocated in an array; how many results do you have for such a regexp? You should try to split your string and reduce the footprint.
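A minimal sketch of the splitting idea, assuming (this is not stated in the posts above) that the XML file contains line breaks and that each <PmtInf> and </PmtInf> tag sits on its own line. It reads the file line by line and passes one block at a time to ProcessBlock(), a hypothetical placeholder, so the full 230 MB string is never built. The file name is hypothetical as well.

```purebasic
; Sketch only: handle one <PmtInf> block at a time instead of the whole file.
; Assumption: each <PmtInf> / </PmtInf> tag is on its own line.

Procedure ProcessBlock(Block$)
  ; Hypothetical placeholder: do the real work on a single block here
  Debug "Block length: " + Str(Len(Block$))
EndProcedure

Define sxml.s = "payments.xml" ; hypothetical file name
Define Line$
Define Block$
Define InBlock = #False

If ReadFile(0, sxml)
  While Eof(0) = 0
    Line$ = ReadString(0)

    ; An opening tag starts a new block
    If FindString(Line$, "<PmtInf>")
      InBlock = #True
      Block$ = ""
    EndIf

    ; Collect lines only while inside a block
    If InBlock
      Block$ + Line$ + #LF$
    EndIf

    ; A closing tag ends the block: process it and release the string
    If InBlock And FindString(Line$, "</PmtInf>")
      InBlock = #False
      ProcessBlock(Block$)
      Block$ = ""
    EndIf
  Wend
  CloseFile(0)
Else
  Debug "Could not open " + sxml
EndIf
```

With this approach the peak memory use depends on the largest single block, not on the file size. If the file has no line breaks, the same idea would need a scan of the raw buffer instead of ReadString().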
Re: Probably bug with REGEXP
Posted: Tue Feb 18, 2014 2:34 pm
by Foz
Note that there is a theoretical 2 GB limit for any 32-bit application.
In reality, once you hit about 1.5 GB the application will keel over, unless you have a monstrous amount of RAM available so that memory fragmentation is not an issue. (When an application tries to allocate a set size of memory, it is requesting a single block of RAM.)
If Windows cannot supply it, the application will keel over.
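As a small sketch of that point in PureBasic: AllocateMemory() requests one block and returns 0 when it cannot be provided, so the result can be checked before use. The size here is only an illustrative value, roughly the size of the string discussed in this thread.

```purebasic
; Sketch only: request one large block and check whether it was granted.
; The size is illustrative (about 230 MB).
Size = 230 * 1024 * 1024
*Buffer = AllocateMemory(Size)
If *Buffer
  Debug "Allocated " + Str(MemorySize(*Buffer)) + " bytes"
  FreeMemory(*Buffer)
Else
  Debug "Could not allocate one block of " + Str(Size) + " bytes"
EndIf
```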
Now for this application: running in non-Unicode mode with an ASCII file, I have tested with a 240 MB text file. There is the 240 MB read in, plus the memory used by the regex, and it weighs in at 993,377 KB of memory used at one time. You can check the "Peak Working Set" in Task Manager to verify this.
Now let's set this to Unicode compilation:
the same ASCII file, but with the parameter ", #PB_Ascii" added to the PeekS() call.
Result? Peak memory usage: 1,487,392 KB
And now for the final test:
the same ASCII file, but encoded in Unicode, running in a Unicode compilation:
kaboom! It fails to even create a string. This is a 481 MB file, in Unicode.
I think your problem is the memory limitations of 32-bit applications.