
Personally, this kind of thing actually gives me _more_ confidence in Postgres rather than less. The core team's responsiveness to this bug report was incredibly impressive.

Around June 4th, the article's author comes in with a bug report that basically says "I hammered Postgres with a whole bunch of artificial load and made something happen" [1].

By the 8th, a preliminary patch is ready for review [2]. That includes all the time to get the author's testing bootstrap up and running, reproduce the problem, diagnose the bug (which, lest we forget, is the part of all this that is actually hard), and assemble a fix. It's worth noting that it's no one's job per se on the Postgres project to fix this kind of thing — the hope is that someone will take interest, step up, and find a solution — and as unlikely as that sounds to work in most environments, amazingly, it usually does for Postgres.

Of note to the hacker types here, Peter Geoghegan was able to track the bug down through the use of rr [4] [5], which allowed an entire problematic run to be captured and then stepped through forwards _and_ backwards (the latter being the key to not having to run the simulation over and over again) until the problematic code was identified and a fix could be developed.
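
For anyone who hasn't used it, the shape of an rr session is roughly the following (the binary name and the watched expression are placeholders here, not the actual commands from this investigation):

    # Record one failing run; rr captures it deterministically.
    rr record ./test-harness

    # Replay the exact same execution under gdb, as often as needed.
    rr replay

    # In the replay session: run forward to the failure, set a
    # watchpoint on the corrupted state, then execute *backwards*
    # to land on whatever last wrote it.
    (rr) continue
    (rr) watch -l page->header
    (rr) reverse-continue

The reverse-continue step is the payoff: rather than re-running a nondeterministic workload and hoping to catch the bad write in the act, you drive the one recorded execution backwards straight to it.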

---

[1] https://www.postgresql.org/message-id/CAH2-Wzm9kNAK0cbzGAvDt...

[2] https://www.postgresql.org/message-id/CAH2-Wzk%2BFHVJvSS9VPP...

[3] https://www.postgresql.org/message-id/CAH2-WznTb6-0fjW4WPzNQ...

[4] https://en.wikipedia.org/wiki/Rr_(debugging)

[5] https://www.postgresql.org/message-id/CAH2-WznTb6-0fjW4WPzNQ...



Yeah, the PostgreSQL team really knocked it out of the park on this one. It was a pleasure working with them. :)


To be fair, you have a great batting average in identifying issues that allow for improvement. Thanks for your work.


With distributed and multicore being the path forward after the end of Moore's law, your work has been instrumental in helping open source distributed systems improve.

Because distributed systems are so difficult and complicated, salespeople and zealots are able to both deny issues and overstate capabilities.

Your work is a shining star in that darkness. Thank you.


For those not entirely sure, this is Kyle Kingsbury, the author of the main article.


Thank you for this comment that gives credit where it's due; this is a very impressive set of threads to read through.

And I agree. For me, one of the most important measures of the reliability of a system is how that system responds to information that it might be wrong. If the response is defensiveness, evasiveness, or persuasive in any way, i.e. of the "it's not that bad" variety, run for the hills. This, on the other hand, is technical, validating, and prompt.

Every system has bugs, but depending on these cultural features, not every system is capable of systematically removing those bugs. With logs like these, the pg community continues to prove capable. Kudos!


> If the response is defensiveness, evasiveness, or persuasive in any way, i.e. of the "it's not that bad" variety, run for the hills. This, on the other hand, is technical, validating, and prompt.

This resonates with my experience with teams inside the company as well.

We have a few teams that just deflect issues. Find any flaw in the bug report, be it an FQDN in a log search, and poof, it goes. Back to sender, don't care. Engineers on my team just don't care to report bugs there anymore, regardless of how simple. Usually, it's faster and less frustrating to just work around it or ignore it. You could be tilting at windmills for weeks, or just fudge around it.

Other teams are far more receptive to bugs: engineers end up curious and just poke around until they understand what's up. And then you get bug reports like "Ok, if I create these 57 things over here, and toggle thing 32 to off, and then toggle 2 things indexed by prime numbers on, then my database connection fails. I've reproduced this from empty VMs. If 32 is on, I need to toggle two perfect squares on, but not 3". And then a lot of things just get fixed.


If you reject the "it's not that bad" response then you are saying all bugs are equal in severity / urgency / priority.

Which is obviously not true.

A better response might be something like "I understand the problem is X however the fix might take Y months and the next release will deprecate that feature anyway. Can you quantify or guesstimate the impact on our product if we allow this bug to stay unfixed?"


Yeah, in retrospect, I disagree with myself, too (and agree with your rationale!). :)

I think if I were to try and salvage the quote, I'd change it to "It's probably not that bad". In this telling, I mean to imply that the responder has made an a priori value judgement about a bug without determining scope and cause.

I agree with your ideal response. The only caveat I'd personally add is that I'd recommend "planting the flag" by providing your own guesstimate of product impact, and asking the reporter for their opinion on moving it.

From my perspective, it's important to acknowledge that it has an impact, and it's especially important to convey your sense of impact concretely with some napkin math. If you're terribly wrong, the reporter will try to convince you of it [1], and you'll learn something along the way. It also conveys empathy (i.e. we're both trying to solve your problem, even if it's only a problem for you).

[1]: https://meta.wikimedia.org/wiki/Cunningham%27s_Law


Practically everyone in distributed systems knows who Aphyr is, and to get a bug reported by him is like a badge of honor. I am willing to bet that people at PostgreSQL jousted over who got to work with him on this bug, because it would be such an amazing learning opportunity. I know I would want to!


What are the storage requirements for using rr for intense or longer debugging sessions?


The exact recording in question was about 125MB, and that was after I materialized it using "rr pack".

I'd say that the storage overhead is unlikely to be a concern in almost all cases. It's just something that you need to keep an eye on.
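
A rough sketch of what keeping an eye on it looks like (this is rr's default trace location, and the trace name here is made up):

    # Traces accumulate under ~/.local/share/rr by default.
    du -sh ~/.local/share/rr/*

    # "rr pack" copies the file data a trace depends on into the
    # trace directory itself, making it self-contained (and showing
    # its true size).
    rr pack ~/.local/share/rr/postgres-0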


this paper describes rr, which for context was designed to be used on commodity hardware: https://arxiv.org/pdf/1610.02144.pdf

section 4.4 talks about disk requirements:

> Memory-mapped files are almost entirely just the executables and libraries loaded by tracees. As long as the original files don’t change and are not removed, which is usually true in practice, their clones take up no additional space and require no data writes

> Like cloned files, cloned file blocks do not consume space as long as the underlying data they’re cloned from persists.

they conclude the section with:

> In any case, in real-world usage trace storage has not been a concern

I imagine that over "longer debugging sessions" the metadata footprint would expand linearly, but probably with a constant smaller than the logs for the average program.


Indeed — it's great to see a vendor (team, in this case) that doesn't try to downplay a Jepsen result, and instead fixes the issues.

However, there is one more takeaway here. I've heard too many times "just use Postgres", repeated as an unthinking mantra. But there are no obvious solutions in the complex world of databases. And this isn't even a multi-node scenario!


> there is one more takeaway here

I don't think the "just use Postgres" mantra takes any hits at all from this. (If anything, I feel better about it).

I've used maybe a dozen (?) databases/stores over the years - graph databases, NoSQL databases, KV stores, the most boring old databases, the sexiest new databases - and my general approach is now to just use Postgres unless it really, really doesn't fit. Works great for me.


I too have been there, done that when it comes to persistent storage, but my similar conclusion is “use SQLite unless it really, really doesn’t fit, in which case, use PostgreSQL.” (It’s usually easy to figure out which of the two you’ll end up needing.)


All the answers to my post are missing the point.

I'm happy Postgres works for you. It works for me, too, in a number of setups. But one should never accept advice like "just use Postgres" without thinking and careful consideration. As the Jepsen article above shows.


I have rarely seen people give "just use PostgreSQL" as advice, but rather "just use PostgreSQL unless you have a compelling reason not to". There's a pretty big difference between the two.


"just use PostgreSQL unless you have a compelling reason not to" is just as a bad advice tbh. At work we did not even consider it because no HA support / availability. What do you do exactly when your master is down, your entire backend just stop working? PG is not a silver bulet especially not when you're looking for good managed / HA solution.

There is no such thing as "use x.y.z unless..."; that statement does not make any sense.


Setting up an HA PostgreSQL cluster might not be as easy as with some turnkey solutions where you just deploy three instances and then it (probably) works, but I'd still trust my data to a simple PostgreSQL replication setup over something fancier where the behaviour in failure scenarios is not as straightforward.

In simple cases, what people usually need is what I would call "operational" HA anyway; database servers going down unintentionally is (hopefully!) not a common occurrence, and even with a super simple 2-node primary/failover asynchronous replication setup, you can operate on the database servers with maybe a few seconds of downtime, which is about the same you'd get from "natively" clustered solutions.

In true failure scenarios, you might get longer downtime (though setting up automatic failover with e.g. Pacemaker is also possible), but in situations where that ends up truly mattering, you likely have the resources to make it work anyway.
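
As a sketch, that 2-node asynchronous setup needs very little configuration these days (the hostname, replication user, and paths below are placeholders; the PostgreSQL docs cover the full procedure):

    # Primary (postgresql.conf); modern defaults are mostly sufficient:
    wal_level = replica
    max_wal_senders = 5

    # Standby: clone the primary; -R writes standby.signal and
    # primary_conninfo so the clone starts up as a streaming replica.
    pg_basebackup -h primary.example.com -U replicator \
        -D /var/lib/pgsql/data -R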


The classic high availability setup for postgresql is to have standby instances, usually one, ready to become the new primary if the primary goes down.

https://www.postgresql.org/docs/current/high-availability.ht...
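
Promoting the standby, for reference, is a single command once you decide to fail over (the data directory path is a placeholder):

    # From the shell on the standby...
    pg_ctl promote -D /var/lib/pgsql/data

    # ...or from a SQL session on it (PostgreSQL 12+):
    SELECT pg_promote();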

I used CouchDB in a project because the high availability story was much better on paper, but I think postgresql would have been fine after all.

Good managed postgresql options with high availability are available in many public clouds.


This won't work for everyone, but if you are on the AWS stack, then using PostgreSQL on Amazon RDS (Aurora in particular) is a compelling option. There are 6 copies of your data, stored across 3 availability zones, and the replication lag between the nodes is less than 1s (usually around 50ms). I'd agree that a simple HA solution is the most obvious thing lacking for PostgreSQL, but with RDS it's not something you have to worry about anymore.


> PG is not a silver bullet

No one claims it to be, hence the "unless there's a good reason not to" part. However, it is a good default that works well for a wide range of – though hardly all – use cases.

Replication in general is not PostgreSQL's strongest point and if you require advanced replication then PostgreSQL may indeed not be the choice for you, but most projects don't need this. That being said, "no HA support" is just not true.


"Use PostGres until you have an engineering - data driven rationale not to" is my standard answer for non-blob data storage when a project starts.

Why? Because when `n` is small (tables, rows, connections), postgres works well enough, and if `n` should ever become large, we'll have an interest in funding the work to do a migration, if that's appropriate, and we'll be able to evaluate the system at scale with real data.


postgres works great for `n` well in excess of (10k tables, 10B rows, 1k connections), which not many would consider small.

The only real justification I've heard is that postgres' replication story still lags mysql's, plus the write amplification issue, which they've done a lot of work on for pg13 (see [0]).

Even for many very large companies I've seen, noSQL databases are mostly used as caches that can afford to be delayed or a bit out of sync, not as the database of record.

[0] https://eng.uber.com/postgres-to-mysql-migration/


I mean, you can go a loooong way on Postgres. Things that probably are going to start causing issues:

- the replication story

- ultra-high volume write contention

- high volume timeseries data streams

For #3 there, I was consulting on a project a few years ago where there was about 1 gigabyte of timeseries data in Postgres/RDS. Some ad hoc queries were really running into slowness; COTS RDS isn't great at speed. I want to say 1 billion rows, but that seems off. Anyway, we were talking data sharding, etc. Didn't go anywhere because the project was canceled. But we could have done a nice job if we wanted to.
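
(For what it's worth, the usual first step before full-on sharding is native time-range partitioning, which is declarative since PostgreSQL 10. The table and column names below are invented for illustration:)

    CREATE TABLE metrics (
        ts    timestamptz NOT NULL,
        value double precision
    ) PARTITION BY RANGE (ts);

    -- One partition per month; the planner skips partitions that an
    -- ad hoc query's time range cannot touch.
    CREATE TABLE metrics_2020_06 PARTITION OF metrics
        FOR VALUES FROM ('2020-06-01') TO ('2020-07-01');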

Point being, there are design limits on postgres, particularly regarding distributed systems. Which is fine. I just have this allergy to designing for scale on a Very Small Project.

Also worth noting that the anti-monolith dogma plays out here: a few well-built monoliths are probably better for a SQL database than "one bajillion microservices".


I'm totally with you that those three are reasonable limits (hell, I named two in my rebuttal)... Just that scale in itself is not the issue; it has highly specific issues. And high-update time series workloads, in my experience, are usually metrics databases that, say, Prometheus serves well, and pg extensions like timescaledb do a stellar job for when you truly need ACID semantics.
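
(The timescaledb route, for reference, is a one-time conversion of an ordinary table; hypothetical table and column names again:)

    -- Assumes the extension is installed on the server.
    CREATE EXTENSION IF NOT EXISTS timescaledb;

    -- Turn a plain table into a hypertable, auto-partitioned by time.
    SELECT create_hypertable('metrics', 'ts');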


Also, most webscale use of nosql databases I've seen basically boils down to "eventually consistent materialized views".


"I've heard too many times 'just use Postgres', repeated as an unthinking mantra."

You use "unthinking" pejoratively, but being able to skip past some decisions without over-analyzing is really important. If you are an 8-person startup, you don't have time for 3 of the people to spend weeks discussing and experimenting with alternatives for every decision.

Databases are really important, but people make tons of important decisions based on little information. If you have little information other than "I need to pick a database", then Postgres is a pretty good choice -- a sensible default, if you will.

Everyone wants to be the default, of course, so you need some criteria to choose one. It could be a big and interesting discussion, but there are relatively few contenders. If it's not free, it's probably out. If it's not SQL, it's probably out. If it's not widely used already, it's probably out. If it's tied to any specific environment/platform (e.g. Hadoop), it's probably out (or there will be a default within that specific platform). By "out", I don't mean that it's unreasonable to choose these options, just that they would not make a sensible default. It would be weird if your default was an expensive, niche, no-SQL option that runs inside of Hadoop.

So what's left? Postgres, MySQL, and MariaDB. SQL Server and MongoDB, while not meeting all of the criteria, have a case for being a default choice in some circles, as well. Apparently, out of those options, many on HN have had a good experience with Postgres, so they suggest it as default.

But if you have additional information about your needs that leads you to another choice, go for it.


"Just use Postgres" may have become meme but for good reason and is well grounded IMO.

Many immature databases without much wide use are better avoided, though. We managed to break datomic three times during development. The first two bugs were fixed in a week; the third took a month, and was described in their changelog as "Fix: Prevent a rare scenario where retracting non-existent entities could prevent future transactions from succeeding". So yeah, we went back to "just use postgres". Who wants to go through the nightmare of hitting those bugs in production, and who knows how many more? Scary situation.


Thanks for this summary. I take for granted that I have Postgres, a powerful and reliable database that I get to use for free in all my projects and work.



