Empty string when handling PeekS with large memory-data

KosterNET
User
Posts: 30
Joined: Tue Mar 22, 2016 10:08 pm


Post by KosterNET »

I want to scan documents for certain words. I do this by building a list of words (NewList), reading the file contents into a string, and using FindString to search for each word.

Thanks to this forum I was able to create the following code:

Code:

File.s="C:\Temp\FileToSearch.txt"
NewList Words.s()
AddElement(Words())
Words()="TextToFind"

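RF=ReadFile(#PB_Any,File,#PB_File_SharedRead)
If RF                                 ; ReadFile returns 0 if the file cannot be opened
  While Not Eof(RF)
    ; #PB_File_IgnoreEOL: ReadString() does not stop at the end of a line
    FileText$=ReadString(RF,#PB_File_IgnoreEOL)
  Wend
  CloseFile(RF)
EndIf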

ForEach Words()
  If FindString(FileText$,Words())>0
    Debug "Word found : " + Words()
  EndIf
Next

However, this turned out to be very slow for large files, so I tried the following:

Code:

File.s="C:\Temp\FileToSearch.txt"
NewList Words.s()
AddElement(Words())
Words()="TextToFind"

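RF=ReadFile(#PB_Any,File,#PB_File_SharedRead)
If RF
  Lengte=Lof(RF)                      ; file size in bytes
  *MemoryID=AllocateMemory(Lengte)    ; 0 if no contiguous block is available
  If *MemoryID
    If ReadData(RF,*MemoryID,Lengte)=Lengte
      FileText$=PeekS(*MemoryID,Lengte,#PB_UTF8)
    EndIf
    FreeMemory(*MemoryID)             ; the string now holds its own copy
  EndIf
  CloseFile(RF)                       ; close the file even if the allocation failed

  ForEach Words()
    If FindString(FileText$,Words())>0
      Debug "Word found : " + Words()
    EndIf
  Next
EndIf
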
I would like to know whether I need to add '|#PB_ByteLength' to the PeekS command or not.
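A small sketch to illustrate why the flag matters (the sample text and variable names are illustrative, not from the thread). According to the PeekS documentation, the Length parameter counts characters unless #PB_UTF8 is combined with #PB_ByteLength, in which case it counts bytes. Lof() returns a byte count, so in the code above the flag appears to be the safer choice whenever the file may contain multi-byte UTF-8 characters:

```purebasic
EnableExplicit

; Illustrative text "Grüße", built with Chr() so the source file encoding does not matter.
; It is 5 characters long but 7 bytes in UTF-8 (ü and ß take 2 bytes each).
Define Text$ = "Gr" + Chr($FC) + Chr($DF) + "e"
Define Bytes = StringByteLength(Text$, #PB_UTF8)   ; UTF-8 byte count, without a terminating null
Define *Buffer = AllocateMemory(Bytes)
Define Result$

If *Buffer
  ; Store the UTF-8 bytes without a terminating null, like file data loaded with ReadData()
  PokeS(*Buffer, Text$, -1, #PB_UTF8 | #PB_String_NoZero)

  ; With #PB_ByteLength the length is a byte count, so PeekS stops at the end of the buffer.
  ; Without it, Bytes would be taken as a number of characters, which may read past the buffer.
  Result$ = PeekS(*Buffer, Bytes, #PB_UTF8 | #PB_ByteLength)
  Debug Result$ + " (" + Str(Len(Result$)) + " characters from " + Str(Bytes) + " bytes)"
  FreeMemory(*Buffer)
EndIf
```

For plain ASCII files the byte count and the character count are the same, so the difference only shows up once the file contains non-ASCII text.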

Also, this seems to work with large files (500 MB+), but on a file larger than 1.3 GB, PeekS returns an empty string. Can anyone tell me what to do? Or is it simply not possible to read the file into one string, despite the statement that a string's length is unlimited?

-Code edited as a result of feedback from mk-soft-

Many thanks in advance,
Geert
Last edited by KosterNET on Tue May 03, 2022 11:51 am, edited 2 times in total.
mk-soft
Always Here
Posts: 5409
Joined: Fri May 12, 2006 6:51 pm
Location: Germany

Re: Empty string when handling PeekS with large memory-data

Post by mk-soft »

The checks are missing:

- test *MemoryID to see whether a contiguous block of memory was available
- test the number of bytes returned by ReadData
- PeekS fails because too much memory is needed.
--> Total: one 1.3 GB block for the file data + one 2 × 1.3 GB block for the string (PB strings are Unicode, 2 bytes per character), so roughly 3.9 GB.
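A minimal sketch of the original code with the checks listed above, plus a check on the result of PeekS. Two choices here are my own assumptions rather than anything stated in the thread: an empty string from PeekS for a non-empty file is treated as a failed allocation, and the raw buffer is released with FreeMemory() right after PeekS so the two large blocks are not held longer than necessary (they still coexist during the PeekS call itself):

```purebasic
EnableExplicit

Define File.s = "C:\Temp\FileToSearch.txt"
Define RF, Length.q, *Buffer, FileText$
NewList Words.s()

AddElement(Words()) : Words() = "TextToFind"

; Check 1: the file could be opened
RF = ReadFile(#PB_Any, File, #PB_File_SharedRead)
If RF = 0
  Debug "Cannot open " + File
  End
EndIf

; Check 2: one contiguous block of Length bytes could be allocated
Length = Lof(RF)
*Buffer = AllocateMemory(Length)
If *Buffer = 0
  Debug "Cannot allocate " + Str(Length) + " bytes"
  CloseFile(RF)
  End
EndIf

; Check 3: the whole file was read
If ReadData(RF, *Buffer, Length) <> Length
  Debug "ReadData did not return the full file"
  FreeMemory(*Buffer)
  CloseFile(RF)
  End
EndIf
CloseFile(RF)

; PeekS needs a second block for the string (2 bytes per character) next to the buffer.
; Length is a byte count, hence #PB_ByteLength.
FileText$ = PeekS(*Buffer, Length, #PB_UTF8 | #PB_ByteLength)
FreeMemory(*Buffer)                 ; release the raw data before searching

; Check 4 (assumption): an empty result for a non-empty file means PeekS failed
If FileText$ = "" And Length > 0
  Debug "PeekS returned an empty string, probably not enough memory"
  End
EndIf

ForEach Words()
  If FindString(FileText$, Words()) > 0
    Debug "Word found : " + Words()
  EndIf
Next
```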
KosterNET
User
Posts: 30
Joined: Tue Mar 22, 2016 10:08 pm

Re: Empty string when handling PeekS with large memory-data

Post by KosterNET »

Hi mk-soft,

First of all, thank you very much for your quick and thorough response. I originally posted a 'simplified' version; the updated version should be more complete (I had also missed a CloseFile ;-)).

You are absolutely right that the code should be complete.

Your suspicion that this is a problem with the amount of available memory may well be correct. I will check whether I can run it on a system with more physical memory. I have virtual memory enabled, but maybe that is not sufficient.
Lord
Addict
Posts: 849
Joined: Tue May 26, 2009 2:11 pm

Re: Empty string when handling PeekS with large memory-data

Post by Lord »

Hi!
KosterNET wrote: Tue May 03, 2022 11:57 am...
Your suspicion that this is a problem with the amount of available memory may well be correct. I will check whether I can run it on a system with more physical memory. I have virtual memory enabled, but maybe that is not sufficient.
I do not know whether this is still an issue, but there used to be a memory limitation with PB on Windows 7:
https://www.purebasic.fr/english/viewto ... 5&start=13
https://www.purebasic.fr/english/viewtopic.php?p=423221
https://www.purebasic.fr/english/viewtopic.php?p=432118
There was a limit of 3.2 GB imposed by Windows and 2.0 GB imposed by PB.
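Given these limits, one way around them (not proposed in this thread, so treat it as a suggestion) is to avoid building a single huge string at all: read the file in fixed-size chunks and run FindString on each chunk. The last few bytes of every chunk are carried over into the next one, so a word that straddles a chunk boundary is still found. The 16 MB chunk size and the Found map are my own choices. The sketch assumes UTF-8 text without null bytes; a multi-byte character split at a chunk boundary may appear as one garbled character at the edge of a chunk:

```purebasic
EnableExplicit

#ChunkSize = 16 * 1024 * 1024        ; 16 MB per read (arbitrary choice)

Define File.s = "C:\Temp\FileToSearch.txt"
Define RF, *Buffer, Carry, BytesRead, Total, MaxWordBytes, Chunk$
NewList Words.s()
NewMap Found.i()                     ; hypothetical: records which words were seen

AddElement(Words()) : Words() = "TextToFind"

; The longest search word (in UTF-8 bytes) decides how many bytes to carry over
ForEach Words()
  If StringByteLength(Words(), #PB_UTF8) > MaxWordBytes
    MaxWordBytes = StringByteLength(Words(), #PB_UTF8)
  EndIf
Next

RF = ReadFile(#PB_Any, File, #PB_File_SharedRead)
If RF
  *Buffer = AllocateMemory(#ChunkSize + MaxWordBytes)
  If *Buffer
    Carry = 0
    While Not Eof(RF)
      ; Append the next chunk behind the bytes carried over from the previous one
      BytesRead = ReadData(RF, *Buffer + Carry, #ChunkSize)
      If BytesRead <= 0 : Break : EndIf
      Total = Carry + BytesRead
      Chunk$ = PeekS(*Buffer, Total, #PB_UTF8 | #PB_ByteLength)

      ForEach Words()
        If FindString(Chunk$, Words()) > 0
          Found(Words()) = #True
        EndIf
      Next

      ; Keep the last MaxWordBytes - 1 bytes: a word crossing the boundary starts there
      Carry = MaxWordBytes - 1
      If Carry < 0 : Carry = 0 : EndIf
      If Carry > Total : Carry = Total : EndIf
      If Carry > 0
        MoveMemory(*Buffer + Total - Carry, *Buffer, Carry)
      EndIf
    Wend
    FreeMemory(*Buffer)
  EndIf
  CloseFile(RF)
EndIf

ForEach Found()
  Debug "Word found : " + MapKey(Found())
Next
```

Peak memory use is then roughly one chunk plus its string copy, independent of the file size, at the cost of running FindString once per chunk.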
Tenaja
Addict
Posts: 1949
Joined: Tue Nov 09, 2010 10:15 pm

Re: Empty string when handling PeekS with large memory-data

Post by Tenaja »

String comparisons are relatively slow, especially with large numbers of words, and iterating through a list looking for a matching string is slow as well.

Have you considered using a map? Let the map store a structure with two fields, e.g. IsSearched (flagging the map key as one of the "certain words") and QtyFound. While going through the text, increment QtyFound for every word you encounter. At the end, go through the map (ForEach): every entry that has both IsSearched and QtyFound set is a word (map key) to act upon. Any entry lacking either is a word you don't care about.
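A rough sketch of the map idea described above, for illustration only. The sample text, the lower-casing of keys, and the definition of a word as a run of ASCII letters and digits are my own assumptions; with a real file, Text$ would be the string read from disk. Note that every word of the text becomes a map key, which costs memory on large files:

```purebasic
EnableExplicit

Structure WordInfo
  IsSearched.i   ; #True if this key is one of the words we are looking for
  QtyFound.i     ; number of times the word occurs in the text
EndStructure

NewMap Words.WordInfo()
Define Text$, *c.Character, *Start, WordLen

; Hypothetical search words (stored lower-case) and sample text
Words("texttofind")\IsSearched = #True
Words("anotherword")\IsSearched = #True
Text$ = "Some TextToFind here, and textToFind again."

; Walk the string once. A "word" is a run of ASCII letters/digits (assumption).
*c = @Text$
Repeat
  Select *c\c
    Case 'a' To 'z', 'A' To 'Z', '0' To '9'
      If *Start = 0 : *Start = *c : EndIf          ; a new word begins
    Default
      If *Start                                     ; a word just ended: count it
        WordLen = (*c - *Start) / SizeOf(Character)
        Words(LCase(PeekS(*Start, WordLen)))\QtyFound + 1
        *Start = 0
      EndIf
  EndSelect
  If *c\c = 0 : Break : EndIf                       ; end of string reached
  *c + SizeOf(Character)
ForEver

; Report the searched words that actually occur in the text
ForEach Words()
  If Words()\IsSearched And Words()\QtyFound
    Debug "Word found : " + MapKey(Words()) + " (" + Str(Words()\QtyFound) + "x)"
  EndIf
Next
```

If only the searched words need counting, FindMapElement() could be used to skip words that are not already in the map, which keeps the map small.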
KosterNET
User
Posts: 30
Joined: Tue Mar 22, 2016 10:08 pm

Re: Empty string when handling PeekS with large memory-data

Post by KosterNET »

Hi Tenaja,

I must say I am quite impressed with how well FindString handles a large string: searching a 500 MB+ string for 50 words takes only a fraction of a second, which is good enough for my purpose. I do not need any counts; a simple yes or no is fine for me.

Thank you for your tips, however. They might be useful if I need more detail later.

Regards,
Geert