All times are UTC + 1 hour




 Post subject: Multithreaded read/write/parsing of files
Posted: Sun Nov 11, 2018 8:49 pm
Enthusiast

Joined: Tue May 28, 2013 10:51 pm
Posts: 531
Location: Europe
Greetings to all,

I have a large file, 150GB+, that needs to be parsed. That probably rules out reading the whole file into memory, since it would require quite a machine.

Would it be better to read chunks of the file into memory and then let a thread do the work while the rest of the file is being read, repeating the procedure until the end of the file? Or to read a part of the file and then split it among worker threads that work on that string list?

Could someone share a bit of code showing how to protect (and keep continuous) the reading of the input file and the writing to the output file in such cases?

Thanks in advance,

Bruno

_________________
"If you lie to the compiler, it will get its revenge."
Henry Spencer
https://www.pci-z.com/


 Post subject: Re: Multithreaded read/write/parsing of files
Posted: Mon Nov 12, 2018 3:17 am
Addict

Joined: Wed Dec 23, 2009 10:14 pm
Posts: 2757
Location: Boston, MA
Read it into an SQLite/MySQL database, then you can query it. Parsing it entirely into memory only gives you a temporary result, and it gets slower as the data grows.
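
A minimal sketch of such an import, assuming a plain text file with one record per line and an on-disk SQLite database. The file names, the table name `records` and its single `line` column are hypothetical and only serve the illustration. For a file of this size it would probably make sense to commit in batches rather than in one transaction.

```purebasic
EnableExplicit
UseSQLiteDatabase()

; Hypothetical names, used only for illustration
#InputFile$ = "big_input.txt"
#DbFile$    = "big_input.sqlite"

Define line$

; Usual PureBasic idiom: create an empty file first, then open it as a database
; (note: CreateFile() truncates a file that already exists)
If CreateFile(0, #DbFile$)
  CloseFile(0)
EndIf

If OpenDatabase(0, #DbFile$, "", "", #PB_Database_SQLite)
  DatabaseUpdate(0, "CREATE TABLE records (line TEXT)")
  If ReadFile(1, #InputFile$)
    DatabaseUpdate(0, "BEGIN")   ; group the inserts into one transaction
    While Eof(1) = 0
      ; Escape single quotes so the line can be embedded in the SQL text
      line$ = ReplaceString(ReadString(1), "'", "''")
      DatabaseUpdate(0, "INSERT INTO records (line) VALUES ('" + line$ + "')")
    Wend
    DatabaseUpdate(0, "COMMIT")
    CloseFile(1)
  Else
    Debug "cannot open " + #InputFile$
  EndIf
  CloseDatabase(0)
Else
  Debug "cannot open database: " + DatabaseError()
EndIf
```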

_________________
The nice thing about standards is there are so many to choose from. ~ Andrew Tanenbaum


 Post subject: Re: Multithreaded read/write/parsing of files
Posted: Mon Nov 12, 2018 10:21 am
Enthusiast

Joined: Sun Jun 22, 2003 7:43 pm
Posts: 292
Location: Germany, Homburg (Saar)
As far as I know, the file functions are already cached, so there should not be much difference between reading directly from the file and reading the whole file into memory and parsing it there. Especially with sequential reading, you should not need to cache big chunks of the file.
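
A minimal sketch of sequential, buffered reading, assuming a text file with one record per line. The file name is hypothetical, and the 1 MB buffer size is an arbitrary value chosen for illustration, not a measured optimum.

```purebasic
EnableExplicit

; Hypothetical input path, used only for illustration
#InputFile$ = "big_input.txt"

Define lines.q, line$

If ReadFile(0, #InputFile$)
  ; Optional: enlarge the internal buffer of this file (size in bytes)
  FileBuffersSize(0, 1024 * 1024)

  While Eof(0) = 0
    line$ = ReadString(0)   ; the real parsing of line$ would go here
    lines + 1               ; a quad (.q) counter, since the file is very large
  Wend

  CloseFile(0)
  Debug "lines read: " + Str(lines)
Else
  Debug "cannot open " + #InputFile$
EndIf
```

Only one line is held in memory at a time, so memory use stays flat regardless of the file size.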

_________________
Electronics, Crazy & Interesting Stuff, all that with text, image and sound? Click here!

The english grammar is freeware, you can use it freely - But it's not Open Source, i.e. you can not change it or publish it in altered way.


 Post subject: Re: Multithreaded read/write/parsing of files
Posted: Mon Nov 12, 2018 1:15 pm
User

Joined: Sun Nov 23, 2014 1:18 pm
Posts: 28
Using an SQLite database, as skywalk mentioned, should be especially useful when parsing through the file more than once.

Maybe it even works if you create an in-memory db (I have never had such large amounts of data, so I can't say whether the SQLite engine swaps data from RAM to disk in that case).

With a virtual table you could also use SQLite full-text search.

Code:
EnableExplicit

UseSQLiteDatabase()

Enumeration
  #db_handle
EndEnumeration

; ":memory:" opens a temporary SQLite database that lives only in RAM
If OpenDatabase(#db_handle, ":memory:", "", "", #PB_Database_SQLite)
  ; An FTS4 virtual table makes the MATCH full-text operator available
  If DatabaseUpdate(#db_handle, "create virtual table test using fts4(column1 varchar(100), column2 varchar(100), column3 varchar(100));")
    ; ...
    If DatabaseUpdate(#db_handle, "insert into test values('Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam', " +
                                  "'nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam', " +
                                  "'erat, sed diam voluptua. At vero eos et accusam et justo duo');")

      ; Full-text search over all columns of the virtual table
      If DatabaseQuery(#db_handle, "select * from test where test match 'eos';")
        ; ...
        FinishDatabaseQuery(#db_handle) ; release the query when done with it
      Else
        Debug "error searching: " + DatabaseError()
      EndIf
    Else
      Debug "error inserting data: " + DatabaseError()
    EndIf
  Else
    Debug "error creating virtual table: " + DatabaseError()
  EndIf
  CloseDatabase(#db_handle)
Else
  Debug "error opening database in memory"
EndIf

End

...just my 2 cents.


 Post subject: Re: Multithreaded read/write/parsing of files
Posted: Mon Nov 12, 2018 1:29 pm
Enthusiast

Joined: Sun Jun 22, 2003 7:43 pm
Posts: 292
Location: Germany, Homburg (Saar)
Depending on how complex your parsing process is, there should be no benefit in splitting the file into parts and using more than one worker. Reading from and writing to the file will be the slowest part of your algorithm. So in my opinion it is perfectly sufficient to use one parser thread that reads the file line by line.

Independently of this, can you please explain your problem in more detail?
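
A minimal sketch of the single parser thread described above, assuming a text file with one record per line. The file names and the `ParseLine()` procedure are hypothetical placeholders. Because only this one thread touches the two files, no mutex is needed around the reads and writes. Since strings are used inside the thread, the program should be compiled with the "Create threadsafe executable" option.

```purebasic
EnableExplicit

; Hypothetical paths, used only for illustration
#InputFile$  = "big_input.txt"
#OutputFile$ = "parsed_output.txt"

; Placeholder for the real parsing step (here it only trims spaces)
Procedure.s ParseLine(line$)
  ProcedureReturn Trim(line$)
EndProcedure

; Runs in its own thread: read, parse and write one line at a time
Procedure ParserThread(*unused)
  Protected inFile, outFile

  inFile = ReadFile(#PB_Any, #InputFile$)
  If inFile
    outFile = CreateFile(#PB_Any, #OutputFile$)
    If outFile
      While Eof(inFile) = 0
        WriteStringN(outFile, ParseLine(ReadString(inFile)))
      Wend
      CloseFile(outFile)
    EndIf
    CloseFile(inFile)
  EndIf
EndProcedure

Define thread
thread = CreateThread(@ParserThread(), 0)
If thread
  ; A GUI program would keep handling its events here instead of blocking
  WaitThread(thread)
EndIf
```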



 Post subject: Re: Multithreaded read/write/parsing of files
Posted: Tue Nov 13, 2018 9:51 am
Enthusiast

Joined: Sat Feb 08, 2014 3:26 pm
Posts: 528
Need more details:

  • Is it a text file (with EOL, EOF) or a binary file?
  • Is the file structured and, if so, in what way (separators, tags or fixed-length fields)?
  • Must the output be written to the same file (edit) or to another file (create)?

For a single linear pass (parsing), loading the entire file into memory is not faster, and is even pointless, since the file has to be read from disk at least once anyway.
That is why it has been done this way for years, even on machines with little RAM, using specialized parsers such as the very small AWK.

8)
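
To illustrate the single linear pass, here is a minimal AWK-style sketch. It assumes a text file with semicolon-separated fields; the file names, the separator and the "ERROR" value in the third field are hypothetical, since the actual file structure is not known.

```purebasic
EnableExplicit

; Hypothetical paths and field layout, used only for illustration
#InputFile$  = "big_input.txt"
#OutputFile$ = "errors.txt"
#Sep$        = ";"

Define line$

If ReadFile(0, #InputFile$)
  If CreateFile(1, #OutputFile$)
    While Eof(0) = 0
      line$ = ReadString(0)
      ; Rough equivalent of: awk -F';' '$3 == "ERROR" { print $1 }'
      If StringField(line$, 3, #Sep$) = "ERROR"
        WriteStringN(1, StringField(line$, 1, #Sep$))
      EndIf
    Wend
    CloseFile(1)
  Else
    Debug "cannot create " + #OutputFile$
  EndIf
  CloseFile(0)
Else
  Debug "cannot open " + #InputFile$
EndIf
```

Each line is read, tested and written (or skipped) before the next one is read, so the output is produced in input order without any locking.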

