Should someone literally pull the plug on your Postgres database server, your data is 100% safe - except for unlogged tables, where your data is 100% gone, by design!
This omits a crucial fact: your data is not gone until PostgreSQL has gone through its startup recovery phase. If you really need to recover unlogged table data (to which every proper database administrator will rightly say "then WhyTH did you make it unlogged"), you should capture the table files before starting the database server again. And then pay through the nose for a data recovery specialist.
> However, a crash will truncate it.
So this isn't exactly true. A crash recovery will truncate it.
Unlogged table data commonly won't even have been written to disk. And what is on disk is completely inconsistent after a crash. So really, the data is gone once Postgres has crashed (on its own, due to hardware issues, kill -9, ...).
They do have backing files - after all, unlogged tables can end up many times the size of memory, so we need to reserve space for them, and there's a very high likelihood we'll need to evict their buffers or shut down at some point. But they're only written out due to buffer pressure or shutdown checkpoints (plus things like moving the tablespace).
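For the curious, you can see where those backing files live from SQL (table name made up; pg_relation_filepath is a stock function):

    CREATE UNLOGGED TABLE scratch (id int, payload text);
    INSERT INTO scratch SELECT g, md5(g::text) FROM generate_series(1, 1000) g;

    -- path of the main fork, relative to the data directory; the empty
    -- "init" fork used to reset the table during crash recovery sits
    -- next to it with an _init suffix (e.g. base/16384/24576_init)
    SELECT pg_relation_filepath('scratch');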
A quick scan of the PostgreSQL manual turns up nothing. I would have expected the steps performed during startup recovery to be documented there. Nor is the manual explicit about data file usage for an unlogged table; there are just these two snippets:
> The property of certain relations that the changes to them are not reflected in the WAL. This disables replication and crash recovery for these relations.
Both say nothing about the normal data files underlying unlogged tables, so none of what I wrote can be found in the official docs (or maybe it can, just not by me ;)
> it would be nice to have in-memory tables if they would perform faster. Somebody could object that PostgreSQL is on-disk database which shouldn't utilize in-memory storage. But users would be interested in such storage engine if it would give serious performance advantages.
At a high level, there are a few components that coordinate what happens during a transaction in terms of WAL logging:
1. The Buffer Pool (memory for pages in the database)
2. The Log Manager
3. The Transaction Manager
4. The Storage Manager (read/write to disk)
5. Accessor Methods (interpret page bytes as e.g. Heap or BTree)
This is an abstract picture, not particular to Postgres, which doesn't use exactly these names for all of the above.
Normally, when the Transaction Manager creates transactions that modify records/tuples (using the Accessor Methods), these actions need to be persisted via the Log Manager to the WAL.

The pages of memory backing these records come from the Buffer Pool, and the Buffer Pool must also log certain actions of its own.

Before the Buffer Pool can flush any modified page to disk, the WAL records covering that page's changes must already have been persisted via the Storage Manager (that's the "write-ahead" part of write-ahead logging).

When you create unlogged tables, none of this happens: when you modify records, there's no trail.
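To make that concrete, a rough psql sketch (Postgres 10+; table names made up) comparing how much WAL the same insert generates against a logged vs. an unlogged table:

    CREATE TABLE logged_t (x int);
    CREATE UNLOGGED TABLE unlogged_t (x int);

    -- capture the current WAL position, insert, then measure the delta
    SELECT pg_current_wal_lsn() AS before \gset
    INSERT INTO logged_t SELECT generate_series(1, 100000);
    SELECT pg_current_wal_lsn() - :'before'::pg_lsn AS wal_bytes_logged;

    SELECT pg_current_wal_lsn() AS before \gset
    INSERT INTO unlogged_t SELECT generate_series(1, 100000);
    SELECT pg_current_wal_lsn() - :'before'::pg_lsn AS wal_bytes_unlogged;

The second delta should be close to zero: the unlogged table's data never transits the log.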
> Pointing out that it is merely queued for deletion...
ok, so before deletion occurs, 1) how are you going to get it back and 2) how do you know how much you have/haven't lost (that transactional integrity I mentioned)?
I understand what you're saying and I think we're both right, except that we're looking from different angles: you're thinking of it as bits on a disc and I'm looking at it as if I were running my business on that database.
Edit: put another way, you: "Yep, I think we can get something back". Me: "aaaaaaaaaaaargh! mah payroll!"
Very well written and detailed article, with the caveat that they never mention a use case they consider legitimate. Does anyone here have any uses? I could imagine some sort of ETL type tasks which are transient could make sense. Thoughts?
I use them for importing from files to tables. Run some cleanup and validation before moving the data to its final destination (roughly the flow sketched below). If pg crashes halfway, I still have the file and can restart the import from zero.
A second use is importing httpd combined logs and doing some analysis to find out which chains of http calls caused certain kinds of behaviour. When done, the tables get deleted. It allows easy ad hoc queries, correlating with monitoring tables, indexing... I used to write some python scripts, but pg did better than expected, and this way of working stuck somehow.
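A minimal sketch of that first workflow, with made-up table names and paths:

    CREATE UNLOGGED TABLE staging_orders (LIKE orders INCLUDING DEFAULTS);
    COPY staging_orders FROM '/data/orders.csv' WITH (FORMAT csv, HEADER);

    -- cleanup/validation happens here, then the data moves to the
    -- final (logged) destination; the staging table is disposable
    INSERT INTO orders SELECT * FROM staging_orders WHERE amount IS NOT NULL;
    DROP TABLE staging_orders;

If Postgres crashes mid-import, the staging table is truncated on recovery, but the source file is still there, so you just rerun.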
I've been using these for some larger analytical queries that run in batch fashion (typically something that's done once or a couple of times per day).
I find it hard to get good performance from large queries with CTEs. Often it's much easier (and uglier) to just split them into multiple steps, create intermediate tables for each step, and add the necessary indexes (sketched below).
Temporary tables are then of course even better, since usually you don't want to have these left around. But temporary tables are also unlogged.
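Something like this rough shape (all names made up):

    -- step 1: materialize an expensive aggregation, with real stats
    CREATE UNLOGGED TABLE step1 AS
      SELECT customer_id, sum(amount) AS total
      FROM sales
      GROUP BY customer_id;
    CREATE INDEX ON step1 (customer_id);
    ANALYZE step1;

    -- step 2: join against it; the planner now knows exact row counts
    CREATE UNLOGGED TABLE step2 AS
      SELECT s.customer_id, s.total, c.region
      FROM step1 s JOIN customers c USING (customer_id);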
Which versions of Postgres have you been running your large CTE based queries with?
Up until Postgres 12, all CTEs were materialised before their results were used, effectively making every CTE a temporary table. In Postgres 12 onwards, for side-effect-free CTEs that are only referenced once, Postgres will take constraints from the parent query and push them into the CTE, reducing the amount of data they need to process and allowing better use of indexes.
OOI have you tried converting your CTEs into sub-queries as an experiment to see if they’re faster? Even in Postgres 12 onwards, sub-queries and CTEs aren’t always treated the same by the query planner, and you can get some surprising differences in performance.
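For reference, Postgres 12+ also lets you control this per CTE (tables and columns made up):

    -- force the pre-12 behaviour: the CTE is an optimization fence
    WITH recent AS MATERIALIZED (
      SELECT * FROM events WHERE created_at > now() - interval '1 day'
    )
    SELECT * FROM recent WHERE user_id = 42;

    -- let the planner push user_id = 42 down into the CTE
    WITH recent AS NOT MATERIALIZED (
      SELECT * FROM events WHERE created_at > now() - interval '1 day'
    )
    SELECT * FROM recent WHERE user_id = 42;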
I can't speak to Parent's situation, but temp tables are common practice in OLAP workloads across many database engines (personal experience with Postgres, Presto, Spark, Netezza).
At a certain level of query complexity, no query planner is able to accurately predict the characteristics of sub-queries or CTEs, resulting in query plans that are ill-suited to the problem. More often than not, materializing the CTEs inside a large query into 2-3 temporary tables results in order-of-magnitude performance gains, as the database now knows exactly how many rows it's dealing with, has stats on # of nulls, etc.
Certainly don’t disagree, I’m well acquainted with the limits of the query planner. Especially when dealing with high complexity queries.
My broader point is that CTEs in Postgres are more nuanced than they appear at first blush. They’re frequently presented as simply a method of writing cleaner, more readable SQL. The fact that CTEs receive special treatment by the planner is often missed, along with the fact that two functionally identical queries, one using sub-queries, the other CTEs, can have wildly different performance characteristics.
Does COPY FROM sidestep the WAL? My (perhaps incorrect) understanding is that pg writes such data to a table file then uses a small transaction to make it live.
It does not, except if you're COPYing into an unlogged table - for regular tables, data that gets COPYed in is WAL-logged and crash safe.
In my understanding, as compared to a multi-row insert, a COPY statement is (1) a lot more efficient to parse/process, and (2) uses a special bulk write strategy that avoids churn and large amounts of dirty pages in shared buffers.
As compared to multiple single-row inserts, another benefit is that you're writing a single transaction ID instead of a lot of individual transaction IDs (meaning less freezing work for autovacuum, etc.).
We also batch the insertion into the table and the WAL logging, if possible. There are cases where that optimization has to be disabled, though (presence of some types of triggers, for example).
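For reference, the two shapes being compared (table and file path are made up):

    -- many single-row inserts: parsed one by one, one transaction ID each
    INSERT INTO measurements (ts, sensor_id, value) VALUES ('2024-01-01', 1, 0.5);
    INSERT INTO measurements (ts, sensor_id, value) VALUES ('2024-01-01', 2, 0.7);
    -- ...

    -- one COPY: a single parse, a single transaction ID, bulk writes
    COPY measurements (ts, sensor_id, value)
      FROM '/data/measurements.csv' WITH (FORMAT csv);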
I’ve used it for bulk loading on time series data partitioned by date.
It was a few years ago so the details are blurry. We got daily snapshots, and on ingest we created an unlogged temporary table, then when complete used an INSERT ... SELECT into the destination table, if I remember right.
As long as the application handles failure on bulk ingest, and you copy the data to a logged table or enable logging on the table once ingested, it worked really well.
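Something along these lines, assuming a destination table range-partitioned by day (all names made up):

    CREATE UNLOGGED TABLE readings_2024_01_15 (LIKE readings INCLUDING ALL);
    COPY readings_2024_01_15 FROM '/snapshots/2024-01-15.csv' WITH (FORMAT csv);

    -- either move the rows into the logged destination ...
    INSERT INTO readings SELECT * FROM readings_2024_01_15;

    -- ... or make the staging table crash-safe and attach it as a partition
    ALTER TABLE readings_2024_01_15 SET LOGGED;
    ALTER TABLE readings ATTACH PARTITION readings_2024_01_15
      FOR VALUES FROM ('2024-01-15') TO ('2024-01-16');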
Probably not, unless you don’t want to keep the data long term.
The initial load might be faster, but you’ll end up repaying that time saving when you turn logging back on, as Postgres is then forced to read the contents of the table and push it into the WAL anyway, throwing away any saving you originally made by avoiding the WAL with an unlogged table.
It can be useful to bulk load data into a temporary table, do validations there (which may involve joins to your permanent tables, so it's convenient to be able to do them in the database vs. the application), and then once everything is validated, copy into your real tables from the temporary table. If you have a process like that, it could make sense to have the temporary table be unlogged (assuming the copy to the real tables is still safe if you do that).
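The validation step might look something like this (made-up names):

    -- flag rows that don't match permanent reference data
    SELECT s.*
    FROM staging_payments s
    LEFT JOIN accounts a ON a.id = s.account_id
    WHERE a.id IS NULL;  -- unknown accounts: fix or reject before loading

    -- once clean, copy into the real (logged) table
    INSERT INTO payments SELECT * FROM staging_payments;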
I am not saying it does actually work this way, but there is no reason a bulk import needs to touch the WAL at all during the import. As long as a lock was held on the table the entire time during the import, the import procedure can simply write to the WAL that this table is consistent as it is on disk after flushing, flush the WAL, then release the lock on the table.
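(For what it's worth, Postgres does have something close to this: under wal_level = minimal, a COPY into a table created in the same transaction can skip WAL-logging the rows and just fsync the files at commit. A sketch:)

    -- requires wal_level = minimal in postgresql.conf, which also
    -- rules out WAL archiving and streaming replication
    BEGIN;
    CREATE TABLE bulk_import (id int, payload text);  -- new in this transaction
    COPY bulk_import FROM '/data/dump.csv' WITH (FORMAT csv);
    COMMIT;  -- the row data may never be written to the WAL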
No it can’t, because doing that would break live backups and replication, which both depend on capturing the WAL and assume it’s a total representation of all data managed by the DB. Any data that isn’t written to the WAL in its entirety can’t be replicated, and can’t be guaranteed to be backed up correctly.
If you’re willing to throw away that functionality, then maybe you could avoid writing the rows to the WAL completely (but it seems Postgres doesn’t support that in general). But personally I would always be a little suspect of data stored in Postgres that never transited the WAL. It’s a core part of how Postgres provides its durability guarantees, and lots of tools and functionality assume that everything worth protecting gets written to it at least once.
Can we have "best of both worlds" for inserting a lot of data quickly: an unlogged table for performance and a trigger to copy data to another regular table for permanence?
I'd think so, if there is some asynchronous processing involved that requires lots of intermediate bouncing to disk: do the input queue on logged tables and the WIP state on unlogged ones, and when they are done, push to the output queue (again logged) and remove from the input in the same transaction.
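A sketch of that shape (all names made up):

    -- durable input and output queues; scratch work stays unlogged
    CREATE TABLE input_queue  (id bigserial PRIMARY KEY, payload jsonb);
    CREATE TABLE output_queue (id bigint PRIMARY KEY, result jsonb);
    CREATE UNLOGGED TABLE wip (id bigint PRIMARY KEY, state jsonb);

    -- ... heavy intermediate processing reads and writes wip ...

    -- completion is atomic: publish the result and retire the input together
    BEGIN;
    INSERT INTO output_queue SELECT id, state FROM wip WHERE id = 42;
    DELETE FROM input_queue WHERE id = 42;
    DELETE FROM wip WHERE id = 42;
    COMMIT;

If the server crashes mid-processing, wip is truncated on recovery, but the input row is still queued, so the work item simply gets redone.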