Hacker News

A blog post series I've been meaning to write for over 3 years now:

* Every database a Postgres 1: Key/Value store

* Every database a Postgres 2: Document stores

* Every database a Postgres 3: Logs (Kafka-esque)

* Every database a Postgres 4: Timeseries

* Every database a Postgres 5: Full Text Search

* Every database a Postgres 6: Message Queues

Low key, you could make almost every single type of database a modern startup needs out of Postgres, and get the benefits (and drawbacks) of Postgres everywhere.

Should you do it? Probably not. Is it good enough for a theoretical ~70% of the startups out there who really don't shuffle around too much data or need to pretend to do any hyper scaling? Maybe.

If anyone from 2ndQuadrant/Citus/EDB sees this, please do a series like this, make the solutions open source, and I bet we'd get some pretty decent performance out of Postgres compared to the purpose-built solutions (remember, TimescaleDB did amazing compared to InfluxDB, a purpose-built tool, not too long ago).

New features like custom table access methods and stuff also shift the capabilities of Postgres a ton. I'm fairly certain I could write a table access method that "just" allocated some memory and gave it to a redis subprocess (or even a compiled-in version) to use.

[EDIT] - It's not clear but the listing is in emacs org mode, those bullet points are expandable and I have tons of notes in each one of these (ex. time series has lots of activity in postgres -- TimescaleDB, native partitioning, Citus, etc). Unfortunately the first bullet point is 43 (!) bullet points down. If someone wants to fund my yak shaving reach out, otherwise someone signal boost this to 2Q/Citus/EDB so professionals can take a stab at it.

[EDIT2] - I forgot some, Postgres actually has:

- Graph support, w/ AgensGraph now known as AGE[0]

- OLAP workloads with Citus Columnar[1] (and zedstore[2]).

[0]: https://age.apache.org

[1]: https://www.citusdata.com/blog/2021/03/05/citus-10-release-o...

[2]: https://github.com/greenplum-db/postgres/tree/zedstore



> Should you do it? Probably not. Is it good enough for a theoretical ~70% of the startups out there who really don't shuffle around too much data or need to pretend to do any hyper scaling? Maybe.

It's also useful when you want to quickly build a "good enough" version of a feature like search so you can get it in front of your users fast and iterate on their feedback. Most of the time, they'd be quite happy with the results and you don't have to spend time on something like managing Elasticsearch.

I wrote a post on how you can use Postgres to add search capabilities, with support for queries like:

jaguar speed -car

ipad OR iphone

"chocolate chip" recipe

http://www.sheshbabu.com/posts/minimal-viable-search-using-p...
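Those three query styles map directly onto Postgres' `websearch_to_tsquery`, which understands quoted phrases, OR, and `-` negation. A minimal sketch, assuming a hypothetical `posts` table (the generated `tsv` column needs Postgres 12+; on 11 you'd maintain it with a trigger):

```sql
CREATE TABLE posts (
    id    serial PRIMARY KEY,
    title text,
    body  text,
    -- keep a tsvector in sync automatically and index it
    tsv   tsvector GENERATED ALWAYS AS
          (to_tsvector('english', coalesce(title, '') || ' ' || coalesce(body, ''))) STORED
);
CREATE INDEX posts_tsv_idx ON posts USING GIN (tsv);

-- "chocolate chip" recipe  /  ipad OR iphone  /  jaguar speed -car
SELECT id, title
FROM posts
WHERE tsv @@ websearch_to_tsquery('english', '"chocolate chip" recipe')
ORDER BY ts_rank(tsv, websearch_to_tsquery('english', '"chocolate chip" recipe')) DESC;
```

`ts_rank` gives you basic relevance ordering out of the box; it's cruder than Elasticsearch's scoring, but often good enough for a first version.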


Yup -- I'm a big fan of always writing the interface and the implementation, even if there's only one. You're always glad you wrote `Queue` and `PostgresQueue` when it comes time to write `KafkaQueue` or `NATSQueue`.

That said, I am an unrepentant yak shaver, and there is a lot to be said for just writing those things when you need them, but Postgres would be perfect for rapid prototyping in this way.
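For the queue case specifically, the `PostgresQueue` implementation behind that interface usually comes down to `FOR UPDATE SKIP LOCKED`, which lets concurrent workers claim jobs without blocking each other. A rough sketch with a hypothetical jobs table:

```sql
CREATE TABLE queue (
    id      bigserial PRIMARY KEY,
    payload jsonb NOT NULL,
    done    boolean NOT NULL DEFAULT false
);

-- A worker claims the oldest unprocessed job inside a transaction.
-- SKIP LOCKED makes other workers pass over rows this transaction holds,
-- so N workers can drain the queue concurrently without contention.
BEGIN;
SELECT id, payload
FROM queue
WHERE NOT done
ORDER BY id
LIMIT 1
FOR UPDATE SKIP LOCKED;

-- ...do the work in application code, then mark it finished:
UPDATE queue SET done = true WHERE id = <claimed id>;
COMMIT;
```

If the worker crashes mid-transaction, the row lock is released and the job becomes claimable again, which gives you at-least-once delivery for free.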


I do think this is a product that everyone wants: support for all the popular models (relational, KV, queue, log, etc.) in a consistent, scalable, open source, and easy-to-operate service. I'm not sure that this is actually possible, but I think if such a thing did exist it really would dominate.

In the current reality today, implementing everything in Postgres is probably going to be slower to market (i.e. for a start-up) than using off-the-shelf products. When you do need to scale, this is when you get to learn how valid the assumptions in your abstraction layer were - most likely in production. As a concrete example, Kafka isn't designed to work well with large numbers of topics. Similarly, InfluxDB isn't designed to work well with high-cardinality time series. I think it is generally wiser to "skate where the puck is going" in this situation.

Of course, everything is a trade-off. Postgres is incredibly reliable (like insane) and simple to operate. I'd say for any kind of internal line-of-business type application where scalability is less of a concern you really would be doing your ops team a service by implementing everything in Postgres.


But I don't get it: why would you use PG for all of these if specialized systems (arguably optimized for each use case) already exist?


Just repeating what others have said:

- Postgres is probably already running (it's pretty good for OLTP workloads)

- Operational ease and robustness

- Cloud support everywhere for Postgres

- People know how to backup and restore postgres

- Sometimes Postgres will beat or wholly subsume your specialized system and be a good choice

- Postgres has ACID compliance and a very good production-ready grasp on the research-level problems involved in transactions. I've never met an etcd/zookeeper cluster I didn't wish was simply a postgres table. Imagine being able to actually change your cache and your data at the same time and ensure that both changes happen or neither of them happens (this is a bit vaporware-y, because locks and access pattern discrepancies and stuff, but bear with me). You're much less likely to see Postgres fail a Jepsen test[0]

[0]: https://jepsen.io
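That "cache and data in one transaction" idea isn't vaporware if the cache is itself a Postgres table; both writes simply share a transaction. A sketch, with illustrative table names:

```sql
-- Both the source-of-truth update and the cache refresh commit
-- atomically: either both land or neither does.
BEGIN;

UPDATE accounts SET balance = balance - 100 WHERE id = 1;

INSERT INTO kv_cache (key, value)
VALUES ('account:1:balance', to_jsonb(900))
ON CONFLICT (key) DO UPDATE SET value = EXCLUDED.value;

COMMIT;
```

You lose Redis' latency profile, but you also lose the entire class of cache-invalidation bugs that come from two systems with separate failure modes.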


Because you already have a Postgres DB running probably and you know how to back it up, you know how to upgrade it, all your services can already authenticate towards it, your developers can run it locally, you know how to mock it…


I wouldn't personally use Postgres for all of these, but have done so successfully multiple times for a decent subset:

- storing relational data (duh)

- storing JSON documents with Postgres' JSONB support - it really is very good, and being able to query relational and document data in the same query is wonderful

- storing key/value type data, where I only need to store a small amount of such data - seems silly to spin up Redis for such a small requirement

- time-series data - TimescaleDB is amazing. Performance may not be on par with a highly tuned schema in a purpose-built time series database, but it's still very, very good. It's fast even with billions of rows, has data compression, and it's really nice to be able to query it just like any other Postgres table. And the TimescaleDB folks are really helpful on Slack and GitHub. I'm a huge fan of TimescaleDB, and think it's more than adequate for a lot of time-series use cases

- full text search - Postgres shines here too! It's not as powerful as the likes of Elasticsearch, but it's still very good, and very fast. And Elasticsearch is not trivial or cheap to set up and maintain
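The JSONB and key/value items above fit in a few lines of SQL. A sketch, assuming hypothetical `users`/`events` tables - the nice part is joining relational and document data in one query:

```sql
-- Document data alongside relational data
CREATE TABLE events (
    id         bigserial PRIMARY KEY,
    user_id    int REFERENCES users (id),
    attributes jsonb NOT NULL
);

-- The @> containment operator (GIN-indexable) filters on document
-- structure while joining to a plain relational table.
SELECT u.name, e.attributes->>'browser' AS browser
FROM events e
JOIN users u ON u.id = e.user_id
WHERE e.attributes @> '{"type": "signup"}';

-- Key/value: a two-column table plus upsert covers Redis-style GET/SET
CREATE TABLE kv (key text PRIMARY KEY, value jsonb NOT NULL);

INSERT INTO kv VALUES ('session:abc', '{"user_id": 42}')
ON CONFLICT (key) DO UPDATE SET value = EXCLUDED.value;
```

For small key/value workloads this really does make spinning up Redis feel like overkill.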

For queues and pub/sub, RabbitMQ is my go-to solution.


One practical thing is that consistent backups can become very difficult if you distribute your state to multiple places.


Operational ease



