A new look at SQLite

Probably like many, I used to see SQLite as a nice tool for working on dev or staging apps, and Postgres or MySQL as the solution for production environments. That changed when I read the excellent article from Avery Pennarun, where SQLite is described as an open source tool with "... quality levels too high to be rational for the market to provide".

Actually, why couldn't SQLite be used in a production environment? What if my preconceived idea that SQLite is "too light to be performant" wasn't true? What if SQLite could outperform PGX or MySQL? By performance, I mean not only read and write operations per second, but also the disk space needed to store the actual data.

Context

This study is done in the context of the Time-Series Database (TSDB) used to store data from waqi.info and aqicn.org. There is a lot of data, and 99% of it is time-series data, currently carefully stored on a dual PGX/MySQL cluster managed by a custom framework called ATSDB.

The ATSDB framework fully automates time-series storage: it handles transparent table time slicing, string-to-index mapping, data streaming, data re-compression, L1 and L2 caching, backups, and a few more extras.

As we are now in the process of adding full support for PGX to ATSDB, the question that popped up is: why couldn't we also add support for SQLite? To get a rational answer, it was decided to first run a few benchmarks.

Benchmarking

To perform the benchmark, we used two table types. The first contains 2 columns of type string, indexed to 4 and 1 bytes respectively. The second contains an additional 6 columns of type int, encoded in 4 bytes each. Note that both tables also contain an implicit 4-byte timestamp, automatically added by ATSDB for time-sliced tables.

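// Table2Columns is the 2-column test table: two string columns that ATSDB
// maps to a 4-byte and a 1-byte index respectively.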
type Table2Columns struct {
	Sensor string `atsdb:"map:uint32"`
	Status string `atsdb:"map:uint8"`
}

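// Table8Columns adds six plain int counters (4 bytes each) to the two mapped string columns.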
type Table8Columns struct {
	Sensor   string `atsdb:"map:uint32"`
	Status   string `atsdb:"map:uint8"`
	Count1   int
	Count2   int
	Count3   int
	Count4   int
	Count5   int
	Count6   int
}

Write Speed

The test consists of inserting data as fast as possible for a period of 1 minute, using a PGX, MySQL, or SQLite backend. The test platform is a 4-core Intel i5-7200U CPU running at 2.50GHz.
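The actual benchmark is driven by ATSDB, so the sketch below will not reproduce the exact numbers, but it shows the shape of the test for the SQLite case, using the standard database/sql package with the mattn/go-sqlite3 driver (the table and column names are made up for illustration):

package main

import (
	"database/sql"
	"fmt"
	"log"
	"time"

	_ "github.com/mattn/go-sqlite3" // SQLite driver for database/sql
)

func main() {
	db, err := sql.Open("sqlite3", "bench.db")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Hypothetical 2-column layout: implicit timestamp + mapped sensor and status indexes.
	if _, err := db.Exec(`CREATE TABLE IF NOT EXISTS bench (ts INTEGER, sensor INTEGER, status INTEGER)`); err != nil {
		log.Fatal(err)
	}

	stmt, err := db.Prepare(`INSERT INTO bench (ts, sensor, status) VALUES (?, ?, ?)`)
	if err != nil {
		log.Fatal(err)
	}
	defer stmt.Close()

	// Insert as fast as possible for one minute, then report the average rate.
	deadline := time.Now().Add(time.Minute)
	count := 0
	for time.Now().Before(deadline) {
		if _, err := stmt.Exec(time.Now().Unix(), count%1000, count%3); err != nil {
			log.Fatal(err)
		}
		count++
	}
	fmt.Printf("%d inserts/sec\n", count/60)
}

The MySQL and PGX variants differ essentially only in the driver import and the connection string.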

Here are the results for the insertion speed:

Backend   2 columns              8 columns
PGX       19,459 inserts/sec     17,433 inserts/sec
MySQL     17,189 inserts/sec     15,511 inserts/sec
SQLite    187,539 inserts/sec    111,118 inserts/sec

Wow, I seriously did not expect this kind of result: SQLite is up to 10 times faster than MySQL and PGX, even without any specific optimisation.
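For the record, "no specific optimisation" means the SQLite defaults were kept. A typically tuned setup, not used in this benchmark, would at least switch to WAL journaling and relaxed syncing; continuing the sketch above:

// Typical SQLite write tuning (not applied in the benchmark above):
// WAL journaling and relaxed syncing usually speed up inserts even further.
if _, err := db.Exec(`PRAGMA journal_mode=WAL`); err != nil {
	log.Fatal(err)
}
if _, err := db.Exec(`PRAGMA synchronous=NORMAL`); err != nil {
	log.Fatal(err)
}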

Read Speed

Next is the read-back speed. This matters when preloading caches that need, for instance, the past 8 hours of stored data.

Backend   2 columns            8 columns
PGX       790,506 reads/sec    402,582 reads/sec
MySQL     782,317 reads/sec    326,740 reads/sec
SQLite    478,080 reads/sec    207,321 reads/sec

This time PGX wins, but only by a factor of 2, and SQLite is already able to achieve almost half a million reads per second, which is definitely enough for our use case. Furthermore, ATSDB supports streaming, so when a really huge amount of historical data has to be read, it is always extracted progressively. SQLite's read-back performance is therefore not a concern.
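As an illustration of this preload pattern, and continuing the earlier sketch (same db handle and placeholder table), reading back the past 8 hours could look like this:

// Sketch of a cache preload: fetch the past 8 hours of entries.
// Table and column names are illustrative, not the actual ATSDB schema.
since := time.Now().Add(-8 * time.Hour).Unix()
rows, err := db.Query(`SELECT ts, sensor, status FROM bench WHERE ts >= ? ORDER BY ts`, since)
if err != nil {
	log.Fatal(err)
}
defer rows.Close()

for rows.Next() {
	var ts int64
	var sensor, status int
	if err := rows.Scan(&ts, &sensor, &status); err != nil {
		log.Fatal(err)
	}
	// ... feed the L1 cache here ...
}
if err := rows.Err(); err != nil {
	log.Fatal(err)
}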

Storage Efficiency

Last, but not least, we need to look at disk usage: when accumulating data for more than 15 years, no one wants to end up paying an exponentially increasing cloud-provider bill for the data store. In other words, the smaller the data store on disk, i.e. the lower the entropy, the better.

The results below are obtained by dividing the table size by the number of entries stored in the table. For SQLite, the size is the database file size; for MySQL, it is the sum of the data and index sizes obtained from the information_schema metadata; and for PGX, it is pg_total_relation_size(oid) queried from pg_class.
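In code, the measurement was roughly along these lines; this is a sketch where mysqlDB and pgDB are assumed to be open *sql.DB handles, and the schema and table names are placeholders:

// tableSizes returns the on-disk size of the benchmark table for each backend,
// to be divided by the row count to obtain bytes per entry.
func tableSizes(sqlitePath string, mysqlDB, pgDB *sql.DB, schema, table string) (sqliteBytes, mysqlBytes, pgBytes int64, err error) {
	// SQLite: the whole database is a single file, so its size is enough.
	fi, err := os.Stat(sqlitePath)
	if err != nil {
		return 0, 0, 0, err
	}
	sqliteBytes = fi.Size()

	// MySQL: data + index size from the information_schema metadata.
	err = mysqlDB.QueryRow(
		`SELECT data_length + index_length
		   FROM information_schema.tables
		  WHERE table_schema = ? AND table_name = ?`, schema, table).Scan(&mysqlBytes)
	if err != nil {
		return 0, 0, 0, err
	}

	// PGX/PostgreSQL: total relation size (table + indexes + TOAST) from pg_class.
	err = pgDB.QueryRow(
		`SELECT pg_total_relation_size(oid) FROM pg_class WHERE relname = $1`, table).Scan(&pgBytes)
	return sqliteBytes, mysqlBytes, pgBytes, err
}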

Backend   2 columns               8 columns
PGX       44.5 bytes per entry    68.5 bytes per entry
MySQL     39.8 bytes per entry    71.2 bytes per entry
SQLite    14.1 bytes per entry    44.6 bytes per entry

Remember that ATSDB adds an implicit 4 bytes for the timestamp. So, for the 2-column version, the minimum entropy is 4 (timestamp) + 4 (sensor value) + 1 (status value), i.e. 9 bytes per entry, and the 8-column version requires an additional 6*4 bytes, i.e. a total of 35 bytes. Based on those numbers, we can compute the DB engine storage overhead, and the lower the better:

Backend   2 columns        8 columns
PGX       394% overhead    96% overhead
MySQL     342% overhead    103% overhead
SQLite    57% overhead     27% overhead
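These percentages are simply the measured bytes per entry compared to the theoretical minimum of 9 and 35 bytes; as a quick sanity check:

// overhead returns, in percent, how much larger the measured bytes/entry is
// than the theoretical minimum entropy computed above.
func overhead(measured, minimum float64) float64 {
	return (measured/minimum - 1) * 100
}

// For example: overhead(14.1, 9) ≈ 57% and overhead(44.6, 35) ≈ 27% for SQLite,
// while overhead(44.5, 9) ≈ 394% and overhead(68.5, 35) ≈ 96% for PGX.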

Yet, to be precise, MySQL has a wonderful feature that allows changing the storage engine. Most commonly, InnoDB is used, but MyISAM is much more efficient in terms of storage, at the cost of slower insert speed. So, to be exact, we need to evaluate the storage efficiency of both MyISAM and InnoDB against SQLite:

Backend          2 columns (bytes/entry)   2 columns (overhead)   8 columns (bytes/entry)   8 columns (overhead)
SQLite           14.1                      57%                    44.6                      27%
MySQL (MyISAM)   10.1                      12%                    35.1                      0%

Using MyISAM, MySQL can outperform SQLite, which makes it a better choice for archiving warm data, i.e. tables that are mostly read and seldom updated. But for live data, i.e. tables where most operations are inserts or updates, and where reads are handled in memory by the L1 cache, SQLite is definitely a very good choice.
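For reference, switching a warm, rarely-updated table to MyISAM is a single statement; a sketch, with a hypothetical time-sliced table name and an open MySQL handle:

// Convert an archived, seldom-updated table to the MyISAM engine to save disk space.
// "measurements_2021_w42" is a hypothetical time-sliced table name.
if _, err := mysqlDB.Exec(`ALTER TABLE measurements_2021_w42 ENGINE=MyISAM`); err != nil {
	log.Fatal(err)
}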

As for PGX, I do not have enough knowledge about its internal storage structure to say whether it can be optimised, but this post should be a good starting point. I'll blog about that in a later post.

Conclusion

The results speak for themselves, and I must admit I was completely biased and wrong: SQLite is not a "light DB", it is an outstandingly performant database!

And when taking into account how easy it is to deploy an SQLite database compared to deploying MySQL or PGX, this makes SQLite the preferred data-store choice for many applications.