1. Automatic backups to an S3 compatible storage, out of the box.
2. Progressive automatic scalability. As load increases or storage runs out, the DB should be able to scale automatically. NewSQL databases do this already.
3. Tiered storage.
4. Support streaming, stream processing, in-memory data structures, etc. I feel like this is one of those weird wishes, but I keep running into it when I work on side projects or startups. I don't want to have to spin up mysql/postgres, kafka/pulsar, flink (or whatever for stream processing), redis, etc. separately, because that's prohibitively expensive when you're just getting off the ground, unless you have VC money. So I find myself wishing I could deploy one thing that does all of those jobs, that could also scale somewhat until there's a real need to break everything out into its own infrastructure, and if it were wire compatible with the popular projects, that would be perfect. Will it happen? I doubt it, but it would be lovely, assuming it worked well.
That does seem neat, but I was actually thinking more like off-loading cold data to S3, or some such. Maybe even having distinctions between (ultra) hot, warm, cold data.
I feel the same. Streaming is so lacking in most DBs today. RethinkDB's `.changes()` was really cool. I wonder if PSQL will eventually do it, or whether a new DB will take over. Everyone went to Mongo and then ran back to PSQL, but maybe after the lessons learned there's room for a new DB optimized for in-memory usage and with great first-class streaming support.
Great list. For #4, you should take a look at Erlang or Elixir - they have "just enough" concurrency with streaming primitives and a functional style that makes it easy to use. They make a lot of "bolt on" stuff you need for a regular startup (Redis, Celery, etc) superfluous.
Could you briefly elaborate on this? Are you suggesting the right structures within Elixir/Erlang are both concurrent and safe enough to negate the need for these things at some level? (And are you referring to things like OTP, or more general than that?)
EDIT: For context, I'm familiar with the concurrency model and with how fairly bulletproof "processes" are in their context, but had never considered putting these to use in lieu of a Redis cache or certain other datastore use-cases. (My brief foray into Elixir was, however, when looking to improve reliability of a high-volume messaging system and various task queues attached to it, so that use-case I am at least aware of)
I wouldn't say Elixir gives you distributed data structures like CRDTs out of the box. You'll still have to design all of your distribution logic yourself.
I'm saying that the tools provided in that toolbox - like processes, OTP and ETS - let you build in-memory or disk backed structures that live inside the concern of the main application and are more responsive to your specific needs.
For instance, let's say you need a scoreboard that persists in between page refreshes.
With Node.js, you're left in a "hmm - not sure" space where you need to cobble together all the different pieces. At the end of the day you'll be facing cache consistency issues, problems saving state, abstraction leaks and communication overhead.
In Elixir, it's trivial to use pubsub to collect events, a GenServer to store that state or act as a broker, and ETS to perform simple queries against it. It's like you're still given just a box of tools, but all of the tools work together better for dynamic, live, complex applications.
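To make that concrete, here's a minimal sketch of the GenServer + ETS half of that pattern (the module name `Scoreboard`, table name `:scores`, and function names are all made up for illustration; a real app would also wire in pubsub for the event feed):

```elixir
defmodule Scoreboard do
  use GenServer

  def start_link(_opts), do: GenServer.start_link(__MODULE__, nil, name: __MODULE__)

  # Writes go through the GenServer, so updates are serialized.
  def bump(player, points), do: GenServer.cast(__MODULE__, {:bump, player, points})

  # Reads hit ETS directly and never block on the GenServer's mailbox.
  def score(player) do
    case :ets.lookup(:scores, player) do
      [{^player, points}] -> points
      [] -> 0
    end
  end

  @impl true
  def init(_) do
    :ets.new(:scores, [:named_table, :public, read_concurrency: true])
    {:ok, nil}
  end

  @impl true
  def handle_cast({:bump, player, points}, state) do
    # update_counter with a default tuple inserts the row if it's missing.
    :ets.update_counter(:scores, player, points, {player, 0})
    {:noreply, state}
  end
end
```

The state lives in the app's own supervision tree and survives page refreshes (though not a node restart, unless you add disk-backed persistence like DETS or a DB write-behind), which is exactly the "no Redis needed" case being described.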
I am actually super familiar with Erlang and Elixir and I agree with your assessment, but with the caveat that if you go that route you end up with your own hand-rolled solution that won't be wire/API compatible with the most used OS projects. Or in the case of Erlang/Elixir you'll be running it in process! But it is great for that reason.