A new look at SQLite
Probably like many, i used to be looking at SQLite as a nice tool for working on dev or staging apps, and Postgres or MySQL as the solution for production environment. Until I read the excellent article from Avery Pennarun, where SQLite is described as an open source tool with "... quality levels too high to be rational for the market to provide".
Actually, why couldn't SQLite be used for production environment? What if my preconceived idea that SQLite is "too light to be performant" wasn't true? What if SQLite could outperform PGX or MySQL? By performance, I mean both read and write operations per second, but also the disk space needed to store the actual data.
Context
This study is done in the context of the Time-Series database (TSDB) used to store data from waqi.info and aqicn.org - There is a lot of data, and 99% of it is time-series data, which is currently carefully stored on a dual PGX MySQL cluster managed by a custom framework called ATSDB.
This ATSDB framework makes time-series storage automated, handling not only transparent table time slicing, but also string-to-index mapping, data streaming, data re-compression, L1 and L2 caching, as well as backup, plus a few more extras, all, of course, done in a completely automated way.
As we are now in the process of adding full support for PGX to ATSDB, the question that popped-up is, why couldn't we also add support SQLite? To get a rational answer, it was decided to first do a few benchmarks.
Benchmarking
In order to perform the benchmark, we used 2 tables type. The first one contains 2 columns, of type string, indexed to respectively 4 and 1 byte. The second table contains an additional 6 columns of type int encoded in 4 bytes. Note that both table contain an implicit 4 bytes timestamp, automatically added by ATSDB for time-sliced tables.
type Table2Columns struct {
Sensor string `atsdb:"map:uint32"`
Status string `atsdb:"map:uint8"`
}
type Table8Columns struct {
Sensor string `atsdb:"map:uint32"`
Status string `atsdb:"map:uint8"`
Count1 int
Count2 int
Count3 int
Count4 int
Count5 int
Count6 int
}
Write Speed
The test consist in inserting as fast as possible data during a period of 1 minute, using either a MySQL backend, or a SQLite backend. The test platform is a 4 cores Intel i5-7200U CPU running at 2.50GHz.
Here are the results for the insertion speed:
Backend | 2 columns | 8 columns |
---|---|---|
PGX | 19,459 inserts/sec | 17,433 inserts/sec |
MySQL | 17,189 inserts/sec | 15,511 inserts/sec |
SQLite | 187,539 inserts/sec | 111,118 inserts/sec |
Wow, I seriously did not expect this kind of result. SQLite is up to 10 times faster than MySQL and PGX, and this even without any specific optimisation.
Read Speed
Next is the read-back speed. This can be important when having to preload caches that require to get, for instance, the past 8 hours of stored data.
Backend | 2 columns | 8 columns |
---|---|---|
PGX | 790,506 read/sec | 402,582 read/sec |
MySQL | 782,317 read/sec | 326,740 read/sec |
SQLite | 478,080 read/sec | 207,321 read/sec |
This time PGX wins, but only by a factor of 2. And SQLite is already able to achieve almost half million reads/second, which is definitely enough for our use case. Further more, the ATSDB support streaming, so when having to read really huge amount of data way back in time, data is always progressively extracted. So, this SQLite read-back performance is definitely enough.
Storage Efficiency
Last, but not the least, we need to look at the disk usage, since when accumulating data for more than 15 years, no one wants to end-up having to pay an exponentially increasing bill to cloud providers for data-store. In other, the smaller the data store on disk, i.e. the lower the entropy, the better.
The result below are obtained by taking the table size divided by the number of entries stored in the table. In the case of SQLite, it is the file size, while for MySQL, it is both the data and index size obtained from the information schema
metadata. As for PGX, it is the pg_total_relation_size (oid)
from the pg_class c
Backend | 2 columns | 8 columns |
---|---|---|
PGX | 44.5 bytes per entry | 68.5 bytes per entry |
MySQL | 39.8 bytes per entry | 71.2 bytes per entry |
SQLite | 14.1 bytes per entry | 44.6 bytes per entry |
Remember thats ATSDB adds an implicit 4 bytes for the timestamp. So, for the 2 column version, the minimum entropy is 4 (timestamp)+4 (sensor value)+1(status value), i..e 9 bytes per entry, and for the 8 column version, it requires an additional 6*4 bytes, eg a total of 35 bytes. Based, on those number, we can get the DB engine storage overhead, and the lower the better:
Backend | 2 columns | 8 columns |
---|---|---|
PGX | 394% overhead | 96% overhead |
MySQL | 342% overhead | 103% overhead |
SQLite | 57% overhead | 27% overhead |
Yet, to be precise, MySQL has a wonderful feature that allows to change the storage engine. Most commonly, InnoDB is used, but actually, MyISAM is so much more performant in terms of storage efficiency, at the cost of slower inserts speed. So, to be exact, we need to evaluate the storage efficiency comparing both MyISAM and InnoDB against SQLite:
Backend | 2 columns (bytes/entry) | 2 columns (overhead) | 8 columns (bytes/ entry) | 8 columns (overhead) |
---|---|---|---|---|
SQLite | 14.1 | 57% | 44.6 | 27% |
MySQL (MyISAM) | 10.1 | 12% | 35.1 | 0% |
Using MyISAM, MySQL can over perform SQLite, and that makes it a better choice for archiving warm data, eg tables that at mostly read and seldom updated. But for live data, eg tables where most operations are about inserting or updating, and where reads are handling in memory using L1 cache, then SQLite is definitely a very good choice.
As for PGX, I do not have enough knowledge about its internal storage structure to say wether it can be optimised, but this post should be a good starting link. I'll blog about that in a later post.
Conclusion
The results speaks for it, and I must admit I was completely biased and wrong: SQLite is not a "light DB", it is an over performing database!
And when taking into account how easy it is to deploy an SQLite database compared to deploying MySQL or PGX, this makes SQLite the preferred data-store choice for many applications.