
1. Automatic backups to an S3 compatible storage, out of the box.

2. Progressive automatic scalability. As load increases or storage runs out, the DB should be able to scale automatically. NewSQL databases do this already.

3. Tiered storage.

4. Support streaming, stream processing, in-memory data structures, etc. I feel like this is one of those weird things, but I keep wishing for it when I work on side projects or startups. I don't want to have to spin up mysql/postgres, kafka/pulsar, flink/whatever stream processor, redis, etc. separately, because that would be prohibitively expensive when you're just getting off the ground, unless you have VC money. So I find myself wishing I could deploy one thing that did all of those jobs, that could also scale somewhat until there's a real need to break everything out into its own infrastructure, and if it were wire compatible with the popular projects, that would be perfect. Will it happen? I doubt it, but it would be lovely, assuming it worked well.



> 3. Tiered storage.

Do you mean tablespaces? https://www.postgresql.org/docs/10/manage-ag-tablespaces.htm...

They'll allow you to keep some parts of the database on faster devices, for instance.
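A quick sketch of what that looks like (paths, table and tablespace names here are made up; assumes a running Postgres and a directory owned by the postgres user):

```shell
# Create a tablespace on a fast device, then put a hot table on it.
psql -c "CREATE TABLESPACE fast_ssd LOCATION '/mnt/ssd/pgdata';"
psql -c "CREATE TABLE recent_events (id bigserial, payload jsonb) TABLESPACE fast_ssd;"

# Existing tables can be moved too (note: this rewrites the table on disk):
psql -c "ALTER TABLE old_events SET TABLESPACE fast_ssd;"
```

The DDL is config, so there's nothing to "run" beyond pointing it at your own paths.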


That does seem neat, but I was actually thinking more like off-loading cold data to S3, or some such. Maybe even having distinctions between (ultra) hot, warm, cold data.


Very cool - I had no idea this existed. I wonder if this could be useful for using ramdisks when working with temp tables.


Very possible.

I learned about the possibility a good decade ago when working on Oracle. For high performance use we used to have two areas:

- one small SSD-backed area where new inserts happened (Server class SSD was new and relatively expensive back then.)

- one (or more) areas for permanent storage, based on ordinary disks with good storage/price

Then we had processes that read data from the SSD pool and wrote it to long-term storage asynchronously.

Edit: I should say I was very happy to find the same feature in Postgres; Oracle is a real hassle in more than one way.


yes, ramdisks are great. =) I've used that approach to replace redis with a faster alternative built on postgres. =)
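A sketch of the ramdisk idea (mount point and tablespace name are hypothetical; assumes root access, and remember tmpfs contents vanish on reboot, so only point disposable data at it):

```shell
# Mount a 2 GB tmpfs ramdisk and hand it to Postgres as a tablespace.
sudo mkdir -p /mnt/pgram
sudo mount -t tmpfs -o size=2G tmpfs /mnt/pgram
sudo chown postgres:postgres /mnt/pgram
psql -c "CREATE TABLESPACE ram LOCATION '/mnt/pgram';"

# Route temporary tables and temp files there by default.
psql -c "ALTER SYSTEM SET temp_tablespaces = 'ram';"
psql -c "SELECT pg_reload_conf();"
```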


> 4. Support streaming

I feel the same. Streaming is so lacking in most dbs today. RethinkDB's `.changes()` was really cool. I wonder if PSQL will eventually do it, or whether a new DB will take over. Everyone went to Mongo then ran back to PSQL, but maybe after lessons-learned there is room for a new db optimized for in-memory usage and with great first-class streaming support.


MongoDB has supported a streaming API since version 3.6:

https://docs.mongodb.com/manual/changeStreams/


But not relational...


Great list. For #4, you should take a look at Erlang or Elixir: they have "just enough" concurrency, with streaming primitives and a functional style that make them easy to use. They make a lot of the "bolt on" stuff you need for a typical startup (Redis, Celery, etc.) superfluous.


Could you briefly elaborate on this? Are you suggesting the right structures within Elixir/Erlang are both concurrent and safe enough to negate the need for these things at some level? (And are you referring to things like OTP, or more general than that?)

EDIT: For context, I'm familiar with the concurrency model and with how fairly bulletproof "processes" are in their context, but had never considered putting these to use in lieu of a Redis cache or certain other datastore use-cases. (My brief foray into Elixir was, however, when looking to improve reliability of a high-volume messaging system and various task queues attached to it, so that use-case I am at least aware of)


I wouldn't say Elixir is highly concurrent in the sense that you'll get something like a CRDT out of the box. You'll still have to design all of your distribution logic.

I'm saying that the tools provided in that toolbox - like processes, OTP and ETS - let you build in-memory or disk backed structures that live inside the concern of the main application and are more responsive to your specific needs.

For instance, let's say you need a scoreboard that persists in between page refreshes.

With Node.js, you're left in a "hmm - not sure" space where you need to cobble together all the different pieces. At the end of the day you'll be facing cache consistency issues, problems saving state, abstraction leaks and communication overhead.

In Elixir, it's trivial to use pubsub to collect events, a GenServer to store that state or act as a broker, and ETS to perform simple queries against it. It's like you're still given just a box of tools, but all of the tools work together better for dynamic, live, complex applications.
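A rough sketch of that scoreboard, to make it concrete (module and function names are mine, not from any library; the GenServer owns an ETS table, writes are serialized through the process, and reads go straight to ETS):

```elixir
defmodule Scoreboard do
  use GenServer

  def start_link(_opts), do: GenServer.start_link(__MODULE__, nil, name: __MODULE__)

  def init(nil) do
    # A named ETS table lets callers read without touching the GenServer.
    table = :ets.new(:scores, [:named_table, :set, read_concurrency: true])
    {:ok, table}
  end

  # Writes funnel through the process, so there's one writer and no races.
  def bump(player, points), do: GenServer.cast(__MODULE__, {:bump, player, points})

  # Reads hit ETS directly; no message round-trip needed.
  def score(player) do
    case :ets.lookup(:scores, player) do
      [{^player, score}] -> score
      [] -> 0
    end
  end

  def handle_cast({:bump, player, points}, table) do
    # Atomically increments the counter, inserting {player, 0} if absent.
    :ets.update_counter(table, player, points, {player, 0})
    {:noreply, table}
  end
end
```

State survives page refreshes because it lives in the application, not the request; if you need it to survive restarts too, you'd add persistence (e.g. DETS or a database) behind the same API.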

Your mileage may vary! Let me know if I can help.


I am actually super familiar with Erlang and Elixir and I agree with your assessment, but with the caveat that if you go that route you end up with your own hand-rolled solution that won't be wire/API compatible with the most widely used open-source projects. Or in the case of Erlang/Elixir, you'll be running it in-process! But it is great for that reason.


i mean, it took me just a couple hours to write a script to back up a postgres db to s3. not a lot of work
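Something along these lines (a sketch, assuming pg_dump and an authenticated AWS CLI; the database and bucket names are made up):

```shell
#!/bin/sh
# Dump a Postgres database and ship it to S3, keyed by timestamp.
set -eu

DB="myapp"
BUCKET="s3://my-backup-bucket"
STAMP="$(date +%Y-%m-%d_%H%M)"
DUMP="/tmp/${DB}_${STAMP}.dump"

# Custom format is compressed and restorable with pg_restore.
pg_dump --format=custom "$DB" > "$DUMP"
aws s3 cp "$DUMP" "${BUCKET}/${DB}/${STAMP}.dump"
rm "$DUMP"
```

Drop it in cron and you're most of the way there; restore is `pg_restore` pointed at a downloaded dump.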


And it's work that gets replicated again and again



