This sounds interesting: a way of caching the transactions for speed. It sounds like a good fit when an SQLite database is attached to a machine that acts as the database and connection server, with a lot of people hitting that machine, given that SQLite is not client/server.
The way I read it, "END" tells SQLite that you have finished sending queries (which, as I understand it, are cached to disk), and then a "COMMIT" is performed to add the cached queries to the database permanently.
I think "COMMIT" normally applies the changes immediately, without caching the statements enclosed by the "BEGIN" and "END" markers, but I could be wrong.
P.S. I've never understood what "atomic" means in relation to a database. Only in physics :) :)
Atomic means that either all of the changes occur or none at all. For example, if you're halfway through updating a database and the power goes out, you never send a COMMIT, so the changes are automatically rolled back to the previous state.
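Roughly, that all-or-nothing behaviour looks like this in Python's sqlite3 module (a minimal sketch; the file name and account rows are invented for illustration):
Code:
import sqlite3

# Minimal sketch of atomicity; the database and rows are made up for illustration.
conn = sqlite3.connect("accounts.db")
conn.isolation_level = None  # autocommit mode: we issue BEGIN/ROLLBACK ourselves
conn.execute("CREATE TABLE IF NOT EXISTS accounts (name TEXT, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 0)")

conn.execute("BEGIN")
conn.execute("UPDATE accounts SET balance = balance - 50 WHERE name = 'alice'")
# Imagine the power failing here: no COMMIT is ever sent, so on the next open
# SQLite replays its journal and undoes the debit. ROLLBACK mimics that:
conn.execute("ROLLBACK")

print(conn.execute("SELECT * FROM accounts ORDER BY name").fetchall())
# [('alice', 100), ('bob', 0)] -- the half-finished transfer left no trace
conn.close()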
I believe that's how Windows Vista (TM) handles files.
Wikipedia wrote: Transactional NTFS
Transactional NTFS (abbreviated TxF) brings the concept of atomic transactions to the NTFS file system, allowing Windows application developers to write file output routines that are guaranteed to either completely succeed or completely fail. Transactional NTFS allows for files and directories to be created, renamed, and deleted atomically. Using transactions ensures correctness of operation; in a series of file operations (done as a transaction), the operation will be committed if all the operations succeed. In case of any failure, the entire operation will roll back and fail.
Transactional NTFS is implemented on top of the Kernel Transaction Manager (KTM), which is a Windows kernel component first introduced in Windows Vista that provides transactioning of objects in the kernel. The NTFS file system already supports journaling of low-level operations, such as writing a block of data. Transactional NTFS expands on this capability to include:
* Atomic operations on a single file:
A common example of this is saving a file from an application; if the application or machine were to crash while writing the file, then only part of the file could be written, possibly resulting in a corrupted file. This would be a very significant problem if a previous version of the file was being overwritten, as data would likely be lost.
* Atomic operations spanning multiple files:
If an application needs to update several files at once with a set of changes, all the necessary file operations can be performed as a single transaction, preventing inconsistent updates in the event of a failure.
* Atomic operations spanning multiple computers:
Performing the same operation on multiple computers is a fairly common administrative task in a corporate network; Transactional NTFS integrates with the Distributed Transaction Coordinator to ensure that the change is successfully applied to all machines.
When all the INSERTs are put in a transaction, SQLite no longer has to close and reopen the database or invalidate its cache between each statement... When unshackled in this way, SQLite is much faster than either PostgreSQL or MySQL.
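You can see the effect with a quick (and unscientific) sketch in Python's sqlite3 module; the scratch file name is made up and the timings will vary with your disk, but the gap is usually dramatic:
Code:
import sqlite3
import time

conn = sqlite3.connect("demo.db")  # hypothetical fresh scratch file on disk
conn.isolation_level = None        # autocommit: every INSERT is its own transaction
conn.execute("CREATE TABLE t (n INTEGER)")

start = time.perf_counter()
for i in range(1000):              # 1000 transactions, 1000 synchronous commits
    conn.execute("INSERT INTO t VALUES (?)", (i,))
print("autocommit:      %.2fs" % (time.perf_counter() - start))

start = time.perf_counter()
conn.execute("BEGIN")              # one transaction around all 1000 INSERTs
for i in range(1000):
    conn.execute("INSERT INTO t VALUES (?)", (i,))
conn.execute("COMMIT")             # a single commit at the end
print("one transaction: %.2fs" % (time.perf_counter() - start))
conn.close()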
If you want more speed and only need the data temporarily (you want to perform complex queries on the data and use SQL), you can use "in-memory" DBs too, which don't write to disk at all. When you close the DB, the memory is freed.
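In Python's sqlite3 module, for instance, an in-memory DB is just a special file name (a tiny sketch; the table is invented):
Code:
import sqlite3

db = sqlite3.connect(":memory:")  # lives entirely in RAM; nothing touches disk
db.execute("CREATE TABLE scratch (k TEXT, v INTEGER)")
db.execute("INSERT INTO scratch VALUES ('a', 1), ('b', 2)")
print(db.execute("SELECT sum(v) FROM scratch").fetchone())  # (3,)
db.close()  # the whole database vanishes and the memory is freed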
“In nature, it’s not the strongest nor the most intelligent who survives. It’s the most adaptable to change” - Charles Darwin
“If you can't explain it to a six-year old you really don't understand it yourself.” - Albert Einstein
Yes, I agree that using 'begin transaction', a bunch of inserts, and 'commit transaction' makes a huge difference. To speed up DB operations, lately I've been playing with ramdrives, and this is better than using disk-based DBs, but I've never tried 'in-memory'. Is it easy enough to use SQL statements to copy an existing disk-based database into memory for use, then write the updates back to the disk-based one to preserve any changes?
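One way that round trip can be done, sketched with Python's sqlite3 module (the "app.db" file and items table are invented; the backup API copies everything, while the pure-SQL ATTACH route copies rows but not indexes or constraints):
Code:
import sqlite3

# 1) The backup API (Python 3.7+): copies tables, indexes, the lot.
disk = sqlite3.connect("app.db")   # hypothetical existing disk database
mem = sqlite3.connect(":memory:")
disk.backup(mem)                   # disk -> memory
# ... run the heavy updates against `mem` here ...
mem.backup(disk)                   # memory -> disk, preserving the changes
mem.close()
disk.close()

# 2) Plain SQL via ATTACH, one table at a time.
conn = sqlite3.connect("app.db")
conn.executescript("""
    ATTACH DATABASE ':memory:' AS mem;
    CREATE TABLE mem.items AS SELECT * FROM main.items;
""")
# ... work on mem.items here ...
conn.executescript("""
    DELETE FROM main.items;
    INSERT INTO main.items SELECT * FROM mem.items;
""")
conn.close()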
I think that in-memory DBs lend themselves to a different kind of application. Rather than being a database in the traditional sense, they give you complex memory structures and querying features beyond what a programming language alone will do for you.
You might use a linked list rather than an array for its features, even though it's not as fast for raw access. This is (IMO) just another step in that direction: rather than load a list, load an "in-memory" DB; then you can execute complex joins, grouping, sorting and filtering with just a couple of lines of code and select out the result. None of that needs to go to disk at all.
One thing I used this for was loading two tab-delimited text files into two tables and performing joins on them for comparisons, then outputting the result in the same tabbed-text format (sketched below). It would have taken quite a bit of code to do that with arrays, and using a "real" (disk) database would have been far slower because of the loading of very large text (log) files.
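Something along these lines, in Python's sqlite3 module (the file names and columns are invented, and it assumes each line holds exactly two tab-separated fields):
Code:
import csv
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE old_log (ts TEXT, msg TEXT)")
db.execute("CREATE TABLE new_log (ts TEXT, msg TEXT)")

# Load each tab-delimited file straight into its table.
for table, path in (("old_log", "old.tsv"), ("new_log", "new.tsv")):
    with open(path, newline="") as f:
        db.executemany("INSERT INTO %s VALUES (?, ?)" % table,
                       csv.reader(f, delimiter="\t"))

# One join replaces a pile of array-walking code: lines whose message changed.
diff = db.execute("""
    SELECT o.ts, o.msg, n.msg
    FROM old_log AS o JOIN new_log AS n ON o.ts = n.ts
    WHERE o.msg <> n.msg
""")

with open("diff.tsv", "w", newline="") as out:
    csv.writer(out, delimiter="\t").writerows(diff)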
Just some thoughts...
Paul Dwyer
“In nature, it’s not the strongest nor the most intelligent who survives. It’s the most adaptable to change” - Charles Darwin
“If you can't explain it to a six-year old you really don't understand it yourself.” - Albert Einstein