Greetings to all,
I have a large file, 150 GB+, that needs to be parsed. That probably rules out reading the whole file into memory, since it would require quite a machine.
Would it be better to read chunks of the file into memory and then let a thread do the work while the rest of the file is being read, repeating the procedure until the end of the file? Or to read a part of the file and then split the work among worker threads on that string list?
Could someone share a bit of code showing how to protect (and keep continuous) the reading of the file and the writing to the output file in such a case?
Thanks in advance,
Bruno
Multithreaded read/write/parsing of files
Re: Multithreaded read/write/parsing of files
Read it into an SQLite/MySQL database, then you can query it. Parsing entirely in memory will be temporary and will get slow as the data grows.
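A minimal sketch of that import step in PureBasic (the file name "bigfile.txt", the database name and the one-column table layout are assumptions for illustration, not from the post): read the file line by line and insert each line inside a single transaction, then run queries against the table afterwards instead of re-reading the 150 GB file.

Code:

EnableExplicit
UseSQLiteDatabase()

Define line$

If OpenDatabase(0, "parsed.db", "", "", #PB_Database_SQLite)
  DatabaseUpdate(0, "CREATE TABLE IF NOT EXISTS lines (content TEXT);")
  If ReadFile(1, "bigfile.txt")                  ; assumed input file
    DatabaseUpdate(0, "BEGIN;")                  ; one transaction keeps bulk inserts fast
    While Not Eof(1)
      line$ = ReadString(1)
      SetDatabaseString(0, 0, line$)             ; bind to the '?' placeholder
      DatabaseUpdate(0, "INSERT INTO lines (content) VALUES (?);")
    Wend
    DatabaseUpdate(0, "COMMIT;")
    CloseFile(1)
  EndIf
  CloseDatabase(0)
EndIf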
The nice thing about standards is there are so many to choose from. ~ Andrew Tanenbaum
- NicTheQuick
- Addict
- Posts: 1227
- Joined: Sun Jun 22, 2003 7:43 pm
- Location: Germany, Saarbrücken
Re: Multithreaded read/write/parsing of files
As far as I know, the file functions are already cached, so it should not make much difference whether you read directly from the file or read the whole file into memory and parse it there. Especially with sequential reading, you should not need to cache big chunks of the file.
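A sketch of that plain sequential approach (the file name is an assumption): read line by line and optionally enlarge PureBasic's internal file buffer with FileBuffersSize() so fewer OS calls are made on a 150 GB file.

Code:

EnableExplicit

Define line$

FileBuffersSize(#PB_Default, 1024 * 1024)  ; 1 MB buffer for files opened from now on

If ReadFile(0, "bigfile.txt")              ; assumed input file
  While Not Eof(0)
    line$ = ReadString(0)
    ; ... parse line$ here ...
  Wend
  CloseFile(0)
EndIf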
The english grammar is freeware, you can use it freely - But it's not Open Source, i.e. you can not change it or publish it in altered way.
- User
- Posts: 34
- Joined: Sun Nov 23, 2014 1:18 pm
Re: Multithreaded read/write/parsing of files
Using an SQLite database, as skywalk mentioned, should be especially useful when parsing through the file more than once.
Maybe it even works if you create an in-memory database (I never had such big amounts of data, so I can't say whether the SQLite engine swaps data from RAM to disk in that case).
When using a virtual table you could also use SQLite full-text search.
...just my 2 cents.
Code:
EnableExplicit
UseSQLiteDatabase()

Enumeration
  #db_handle
EndEnumeration

If OpenDatabase(#db_handle, ":memory:", "", "", #PB_Database_SQLite)
  If DatabaseUpdate(#db_handle, "create virtual table test using fts4(column1 varchar(100), column2 varchar(100), column3 varchar(100));")
    ; ...
    If DatabaseUpdate(#db_handle, "insert into test values('Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam', " +
                                  "'nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam', " +
                                  "'erat, sed diam voluptua. At vero eos et accusam et justo duo');")
      If DatabaseQuery(#db_handle, "select * from test where test match 'eos';")
        While NextDatabaseRow(#db_handle)        ; step through the matching rows
          Debug GetDatabaseString(#db_handle, 0)
        Wend
        FinishDatabaseQuery(#db_handle)          ; release the query when done
      Else
        Debug "error searching"
      EndIf
    Else
      Debug "error inserting data"
    EndIf
  Else
    Debug "error creating virtual table"
  EndIf
Else
  Debug "error opening database in memory"
EndIf

End
- NicTheQuick
- Addict
- Posts: 1227
- Joined: Sun Jun 22, 2003 7:43 pm
- Location: Germany, Saarbrücken
Re: Multithreaded read/write/parsing of files
Depending on how complex your parsing process is, there should be no benefit in splitting the file into parts and using more than one worker. Reading from and writing to the file will be the slowest part of your algorithm, so in my opinion it is perfectly sufficient to use one parser thread which reads the file line by line.
Independently of this, can you please explain your problem in more detail?
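The single-parser-thread idea could be sketched like this (the input and output file names are assumptions): one thread does the whole read-parse-write loop, started with CreateThread() so the main program stays free for other work. Note that the compiler's thread-safe option should be enabled when using threads.

Code:

EnableExplicit

Procedure ParseFile(*unused)
  Define line$
  If ReadFile(0, "input.txt") And CreateFile(1, "output.txt")  ; assumed file names
    While Not Eof(0)
      line$ = ReadString(0)
      ; ... parsing work on line$ ...
      WriteStringN(1, line$)
    Wend
    CloseFile(0)
    CloseFile(1)
  EndIf
EndProcedure

Define thread.i = CreateThread(@ParseFile(), 0)
WaitThread(thread)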