I was going to say I expect much of the gain to come from "pragma synchronous=normal", which tells SQLite on your MBP M1 not to bother committing the data to SSD durably before telling the application that the insert is committed. From the SQLite docs:
> WAL mode is always consistent with synchronous=NORMAL, but WAL mode does lose durability. A transaction committed in WAL mode with synchronous=NORMAL might roll back following a power loss or system crash.
But then I realised that "pragma fullfsync=off" is the default; fullfsync is a macOS-specific pragma.
So, on macOS only, WAL with synchronous=FULL (the default for that pragma) has the same durability issue as synchronous=NORMAL: committed transactions might be rolled back following a power loss or system crash, albeit with different probabilities.
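For the Python crowd, here's a minimal sketch of the pragmas being discussed, using the stdlib sqlite3 module (the file name and schema are just for illustration):

```python
import sqlite3

# Illustrative file name and schema; adjust to taste.
conn = sqlite3.connect("bench.db")

# WAL + synchronous=NORMAL is the common fast-but-less-durable combo.
conn.execute("PRAGMA journal_mode=WAL")
conn.execute("PRAGMA synchronous=NORMAL")

# macOS only: fullfsync defaults to OFF, so even synchronous=FULL
# doesn't force a true flush to stable storage there. Opt in with:
# conn.execute("PRAGMA fullfsync=ON")

conn.execute("CREATE TABLE IF NOT EXISTS t (id INTEGER PRIMARY KEY, v TEXT)")
conn.commit()
```

Note WAL needs an on-disk database; journal_mode stays "memory" for ":memory:" connections.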
If you really want to maximize SQLite insert throughput, prepare a statement with sqlite3_prepare_v3(), then in your inner loop call sqlite3_bind_* + sqlite3_step(). 100k inserts/sec is easily achievable. Unfortunately this interface is not exposed in most language bindings.
A fair number of ORMs will do this for you fwiw, but then you'll spend more CPU in the ORM stack so it is a bit of a tradeoff for micro-benchmarks like this.
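The raw C API isn't reachable from Python's stdlib sqlite3 module, but executemany() gets you most of the way: it prepares the statement once, then binds and steps it per row (the module also keeps an internal statement cache). A rough sketch with a throwaway in-memory table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, v TEXT)")

# Prepared once, then bind + step per row, all inside one transaction.
rows = ((i, f"value-{i}") for i in range(100_000))
with conn:  # commits on success, rolls back on exception
    conn.executemany("INSERT INTO t (id, v) VALUES (?, ?)", rows)

count = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
```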
If you think about this in latency terms, you are able to insert a row and be done with the entire ceremony in about 12 microseconds. This is serialized throughput too.
I think it is unlikely you would get this kind of performance with a hosted solution across the network. You could cheat with batching & caching, but for a cold, one-record non-query, nothing comes close to the latency of SQLite on a local NVMe device.
I tried the same test on a Hetzner VPS (4 vCPU AMD EPYC, 8 GB RAM, 160 GB NVMe SSD), and the result is about 20K inserts per second. It's not 80K, but still exciting; I would even be happy with 10K. I'm sure there's plenty of room for optimization, but I'm planning to use SQLite3 with Python in a very simple way.
If you (more commonly) run on a different host, you’re looking at more like 20x-50x the latency, and this is assuming you’re in the same region.
To even get close to the SQLite level of throughput you’d have to stagger/pipeline your requests over a number of concurrent connections (proportional to the earlier x).
You’ll eventually succeed at doing just that, with multiple machines. Congratulations, but now you have to also consider the N+1 problem, even for small Ns. In SQLite, you can afford much more back-and-forth with simple queries.
* He’s comparing an in-memory SQLite DB against a non-temp table in Postgres. Of course the RTT is lower when you don’t have to interact with the filesystem or flush to disk in the critical path.
* He’s connecting to Postgres over a loopback interface, instead of using a UNIX socket. Of course the RTT is higher when requests and responses needlessly traverse the TCP/IP stack.
> He’s comparing an in-memory SQLite DB against a non-temp table in Postgres.
That’d surprise me; where does he say that? (Even if he doesn’t, it’s important to iron out, so thanks for being critical!) The ~18us sounds reasonable from my own experiments, and that’s with flushing to file (using WAL mode).
EDIT: I think I ran my benchmarks with synchronous=normal. Not sure what he did.
> He’s connecting to Postgres over a loopback interface, instead of using a UNIX socket.
I agree unix sockets would be better for same host setup, but still, the more representative number for “typical use” would be networked, no?
Also, for fairness: SQLite setups come with downtime, which is hard to avoid. Worth mentioning.
> That’d surprise me; where does he say that? (Even if he doesn’t, it’s important to iron out, so thanks for being critical!) The ~18us sounds reasonable from my own experiments, and that’s with flushing to file (using WAL mode).
I was looking at the benchmark results at 13:27, but now I see there's another set of benchmark results shown at 29:13. He doesn't link to a repo for that second one, but in-memory vs persistent doesn't matter as much when measuring reads if the results are cached in memory.
> I agree unix sockets would be better for same host setup, but still, the more representative number for "typical use" would be networked, no?
If we're going by what's "representative", then a benchmark isn't useful, because Postgres and SQLite have dissimilar use cases. If you want to compare their performance, it only makes sense to eliminate confounding variables, of which the TCP/IP stack is significant.
> He doesn't link to a repo for that second one, but in-memory vs persistent doesn't matter as much when measuring reads if the results are cached in memory.
No doubt that we deserve reproducible benchmarks and this could use more details. As for caching that’d be the same for Postgres in theory.
> If we're going by what's "representative", then a benchmark isn't useful, because Postgres and SQLite have dissimilar use cases.
I agree and disagree! We already know Postgres is performant enough for a majority of server-side use cases. Ben is trying to show what you can expect from SQLite if you use it for those cases, which is novel to most people. SQLite doesn’t have the same amount of features, and partly because of that drawback, the extremely low latency (in a typical setup) is a redeeming factor. I think it’s not misleading to say SQLite has significantly lower app->db _latency_ than networked databases. A Unix socket is a fairer microbenchmark, yeah, but at least to me that’s not as useful because the Postgres setup wouldn’t be same-host.
The relative performance difference of no-op queries (RTT) is unlikely to have a high impact on application performance since they're so quick in absolute terms.
> Many programs use fopen(), fread(), and fwrite() to create and manage files of data in home-grown formats. SQLite works particularly well as a replacement for these ad hoc data files. Contrary to intuition, SQLite can be faster than the filesystem for reading and writing content to disk.
For a project a couple of years ago, I used Go, and what I did was very simple: I took advantage of its channel mechanism and created a slice to buffer (memoize, more or less) 100K generated rows, which I then inserted inside a BEGIN TRANSACTION / END; in other words, every 100K rows I would insert them in one transaction and repeat.
If my memory serves me well, I could insert 10M rows in approximately 6.8 seconds (give or take), and bear in mind I didn't use WAL at all.
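In Python terms, the same pattern looks roughly like this (a loose translation: the Go original buffered rows through a channel; here a generator plus itertools.islice stands in, and the table name and row shape are made up):

```python
import itertools
import sqlite3

BATCH = 100_000
TOTAL = 300_000  # scaled down from 10M for illustration

def gen_rows(n):
    for i in range(n):
        yield (i, f"row-{i}")

conn = sqlite3.connect(":memory:")  # use a file path for a real benchmark
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, v TEXT)")

it = gen_rows(TOTAL)
while True:
    chunk = list(itertools.islice(it, BATCH))
    if not chunk:
        break
    with conn:  # one BEGIN ... COMMIT per 100K-row batch
        conn.executemany("INSERT INTO t VALUES (?, ?)", chunk)
```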
Were those individual inserts, or batched in a transaction? The latter should significantly bump those numbers up. Also consider using a prepared statement while you're at it.