
Millisecond time resolution with databases?

Posted: Mon Jan 13, 2025 1:02 pm
by Skipper
Hi Group,

After many years, I decided to check on PureBasic once more. I liked what I saw and decided to return to using this compiler.

For now, I have a question about the databases to which PureBasic provides access:

I need to work with rather big time series. These series are sorted by date/time, and since they potentially contain multiple events within any single second, they use date/time fields with resolution down to the millisecond. The current logs are 50+ GB, and growing.

Is it feasible to store and query at this time resolution with, say, SQLite? I know that I will have to write my own PB code to handle milliseconds, but what about the database side of things?

Cheers
Skipper

Re: Millisecond time resolution with databases?

Posted: Mon Jan 13, 2025 1:36 pm
by NicTheQuick
For time-series data, SQLite or any other relational database is never a good idea.
I'm afraid you need a proper time-series database like InfluxDB, TimescaleDB, OpenTSDB, Graphite or even Prometheus. However, there is no ready-to-use connection in PureBasic yet; you will have to write the communication yourself. But this should be quite easy with the HTTP library in most cases.
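To give an idea of what that communication looks like: InfluxDB ingests plain-text "line protocol" records over HTTP, and with millisecond precision you just pass the timestamp in ms. A minimal sketch in Python (the PB HTTP side is up to you; measurement, tag and field names here are made up):

```python
def to_line_protocol(measurement, tags, fields, ts_ms):
    """Build one InfluxDB line-protocol record with a millisecond timestamp."""
    tag_str = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    field_str = ",".join(f"{k}={v}" for k, v in sorted(fields.items()))
    # Line protocol shape: measurement,tag=v field=v timestamp
    return f"{measurement},{tag_str} {field_str} {ts_ms}"

line = to_line_protocol("ticks", {"pair": "EURUSD"},
                        {"bid": 1.0912, "ask": 1.0914}, 1736773321123)
# POST bodies like this to /api/v2/write?precision=ms on the InfluxDB server
```

The HTTP request itself is then just a POST with these lines as the body, which PureBasic's HTTP library can do.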

Re: Millisecond time resolution with databases?

Posted: Mon Jan 13, 2025 1:48 pm
by Skipper
Hi Nic,

thank you for pointing me to databases that are specifically meant for time-series data.

Cheers
Skipper

EDIT: I should have added that my app will not require real-time analysis, so I wonder if in that scenario it would still be best to not use an SQL database?

Re: Millisecond time resolution with databases?

Posted: Mon Jan 13, 2025 4:32 pm
by mk-soft
SQLite supports timestamps with millisecond precision, either as TEXT in ISO 8601 format ("YYYY-MM-DD HH:MM:SS.SSS") or as REAL (8 bytes) holding Julian day numbers.

Links:
https://www.sqlite.org/datatype3.html
https://www.sqlite.org/lang_datefunc.html
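A quick sketch of the TEXT variant (shown here via Python's sqlite3 for brevity; table and column names are made up). The nice property is that lexicographic order on the ISO 8601 strings matches time order, so ORDER BY and range queries just work:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE events (ts TEXT, price REAL)")
# ISO 8601 with a fractional-seconds part gives millisecond resolution as TEXT
con.execute("INSERT INTO events VALUES ('2025-01-13 14:30:07.250', 1.0912)")
con.execute("INSERT INTO events VALUES ('2025-01-13 14:30:07.750', 1.0913)")

# strftime('%f') exposes the seconds including the millisecond fraction;
# sorting the TEXT column directly sorts by time
row = con.execute(
    "SELECT ts, strftime('%f', ts) FROM events ORDER BY ts LIMIT 1"
).fetchone()
```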

Re: Millisecond time resolution with databases?

Posted: Mon Jan 13, 2025 5:30 pm
by NicTheQuick
Skipper wrote: Mon Jan 13, 2025 1:48 pm EDIT: I should have added that my app will not require real-time analysis, so I wonder if in that scenario it would still be best to not use an SQL database?
I don't know if SQLite is the best idea with 50+ GB of data. In our company we use InfluxDB to store and read real-time metrics of 40,000+ services, and although we have been acquiring these data for several years, the size of the database is still not in the 50 GB range. And we need a lot of memory (48+ GB RAM) to handle that data properly.
So... of course SQLite can store that amount of data, but I don't think you will be happy with it.

Re: Millisecond time resolution with databases?

Posted: Mon Jan 13, 2025 10:20 pm
by idle
Out of the 50 GB, how many entries does the DB store?
I only ask because I'm writing a DB engine for PB. It currently stores the primary index in memory, which limits it somewhat to the available RAM, but it can easily deal with 200M entries on a desktop, though that mightn't be enough.
However, in the case of time series it would be easy enough to range-index and then create dynamic multiple views by scanning through the ranges and indexing them: an index of ranges which then indexes the views.
The core does around 1M reads per thread, 1M enums per thread and 200K writes per second concurrently, though that would probably translate to 20K writes to disk per second with 16-byte keys.
In terms of performance it's faster than Facebook's RocksDB, which is an LSM tree; its memtable does
4.5M - 7M read QPS for point lookups with sustained writes, and
4M - 6M QPS for prefix range scans with sustained writes.

Those RocksDB writes are limited to 20K p/s, and that's running on 32 threads with 131M entries and 144 GB RAM.
My core, with 8 readers, 2 enums and 2 writers, is doing around 8M reads, 2M prefix scans, and 200K writes on 12 threads concurrently with 16 GB RAM, though that's with 16-byte keys vs 19-byte keys. It's O(k) vs O(log n), so it scales well, but there's still a long way to go before it's production ready.

Re: Millisecond time resolution with databases?

Posted: Tue Jan 14, 2025 11:54 am
by Skipper
@mk-soft, @nic:
Thanks a lot for your input. Unfortunately, the date/time stamp does not quite conform to ISO 8601, so some conversion needs to be done on import. That's OK though; see below:

@idle:
Thanks for the insights you provide about your project. It might become interesting down the line, I'll follow your progress.
The application I intend to create is a 'back-testing engine' for currency and futures trading. The import files have very tiny records, four fields only, with some 2-6 million of them per month per underlying. Follow 16 currency pairs and a handful of index futures, and you can imagine where that leads.
For this reason, I thought of range indexing as well, so on import I intend to store derived information too (like week number, etc.). Also, I may well split the date/time field into its constituents, which would offer another way to ease range indexing. Besides, I'm likely to create a single table per underlying, which also aids in segmentation.
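That derived-column idea might look like this at import time (illustrative sketch via Python's sqlite3; the table name, columns and tick values are made up). Computing the ISO year/week once on import and indexing it makes week-range queries cheap without parsing timestamps at query time:

```python
import sqlite3
from datetime import datetime

con = sqlite3.connect(":memory:")
# One table per underlying, with derived columns filled at import time
con.execute("""CREATE TABLE eurusd (
    ts TEXT, bid REAL, ask REAL,
    year INTEGER, week INTEGER)""")
con.execute("CREATE INDEX idx_eurusd_week ON eurusd(year, week)")

def import_tick(ts_text, bid, ask):
    dt = datetime.strptime(ts_text, "%Y-%m-%d %H:%M:%S.%f")
    iso = dt.isocalendar()  # derive ISO year and week number on import
    con.execute("INSERT INTO eurusd VALUES (?,?,?,?,?)",
                (ts_text, bid, ask, iso[0], iso[1]))

import_tick("2025-01-14 09:30:00.123", 1.0290, 1.0292)
# Range queries then hit the (year, week) index instead of parsing ts
n = con.execute(
    "SELECT COUNT(*) FROM eurusd WHERE year=2025 AND week=3"
).fetchone()[0]
```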

Cheers
Skipper

Re: Millisecond time resolution with databases?

Posted: Tue Jan 14, 2025 9:15 pm
by idle
Skipper wrote: Tue Jan 14, 2025 11:54 am The application I intend to create is a 'back-testing engine' for currency and futures trading. The import files have very tiny records, four fields only, with some 2-6 million of them per month per underlying.
You probably won't need a DB for that. If it's all timestamped in increasing order, just use a sliding window and seek through the file, loading into an array at whatever sample rate you want. That way you can do FFTs to see any periodic cycles and correlate the lag, or feed it into a neural network, or do PCA (principal component analysis).
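The sliding-window part is simple if the file is sorted by ascending timestamp. A sketch in Python (the line format with ms-timestamp and price is an assumption; a deque stands in for the array):

```python
from collections import deque

def sliding_windows(lines, window_ms, parse):
    """Yield the current window after each record; the window holds every
    record within window_ms of the newest one. Assumes lines arrive in
    ascending timestamp order."""
    window = deque()
    for line in lines:
        ts, value = parse(line)
        window.append((ts, value))
        while window[0][0] < ts - window_ms:
            window.popleft()  # drop records that fell out of the window
        yield list(window)

# Hypothetical records: "timestamp_ms,price"
parse = lambda s: (int(s.split(",")[0]), float(s.split(",")[1]))
data = ["1000,1.0", "1400,1.1", "2600,1.2"]
last = None
for w in sliding_windows(data, 1000, parse):
    last = w  # feed w into FFT / NN / PCA here
```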

Re: Millisecond time resolution with databases?

Posted: Wed Jan 15, 2025 9:46 am
by Skipper
idle wrote: Tue Jan 14, 2025 9:15 pm You probably won't need a DB for that. If it's all timestamped in increasing order, just use a sliding window and seek through the file, loading into an array at whatever sample rate you want. That way you can do FFTs to see any periodic cycles and correlate the lag, or feed it into a neural network, or do PCA (principal component analysis).
Idle,

You're not the first to suggest that, for this application, a full-grown DBMS might be overkill :wink: I have to test my individual alternatives before deciding on a strategy. A sliding window over the ASCII files would be possible, but I'd like to reduce disk I/O as much as possible. Time for some experiments...

Btw: those 50 GB I mentioned before are zipped ASCII files, so the unpacked size is probably way over sevenfold that...
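One option that avoids unpacking them at all: stream-decompress line by line, so only a small decompression buffer is ever in memory. A sketch in Python (an in-memory buffer stands in for one of the zipped files; record contents are made up):

```python
import gzip
import io

# Write a tiny gzipped "log" in memory to stand in for one zipped file
buf = io.BytesIO()
with gzip.open(buf, "wt") as f:
    f.write("1736773321123,1.0912\n1736773321250,1.0913\n")

# Stream-decompress line by line: the full unpacked file never exists
# on disk or in memory, only the current decompression window
buf.seek(0)
count = 0
with gzip.open(buf, "rt") as f:
    for line in f:
        count += 1  # parse/import each record here
```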

Cheers
Skipper

Re: Millisecond time resolution with databases?

Posted: Thu Jan 16, 2025 6:29 am
by idle
I don't think that would be a problem; file reads are buffered anyhow, but you could page into a ring buffer.
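The ring-buffer idea in miniature (illustrative Python; "pages" here are just placeholder strings, and capacity would be tuned to RAM): new pages overwrite the oldest ones, so memory stays bounded no matter how far you seek through the file.

```python
class RingBuffer:
    """Fixed-capacity ring buffer: pushing past capacity overwrites the
    oldest entry, keeping memory use constant while paging a large file."""
    def __init__(self, capacity):
        self.buf = [None] * capacity
        self.head = 0   # index of the next write
        self.count = 0  # number of valid entries, capped at capacity

    def push(self, page):
        self.buf[self.head] = page
        self.head = (self.head + 1) % len(self.buf)
        self.count = min(self.count + 1, len(self.buf))

    def pages(self):
        """Return the held pages from oldest to newest."""
        start = (self.head - self.count) % len(self.buf)
        return [self.buf[(start + i) % len(self.buf)]
                for i in range(self.count)]

rb = RingBuffer(3)
for p in ["page0", "page1", "page2", "page3"]:
    rb.push(p)  # "page0" gets overwritten by the fourth push
```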