Hacker News

A blog post series I've been meaning to write for over 3 years now:

* Every database a Postgres 1: Key/Value store

* Every database a Postgres 2: Document stores

* Every database a Postgres 3: Logs (Kafka-esque)

* Every database a Postgres 4: Timeseries

* Every database a Postgres 5: Full Text Search

* Every database a Postgres 6: Message Queues

Low key, you could make almost every single type of database a modern startup needs out of Postgres, and get the benefits (and drawbacks) of Postgres everywhere.

Should you do it? Probably not. Is it good enough for a theoretical ~70% of the startups out there who really don't shuffle around too much data or need to pretend to do any hyper scaling? Maybe.

If anyone from 2ndQuadrant/Citus/EDB sees this, please do a series like this, make the solutions open source, and I bet we'd get some pretty decent performance out of Postgres compared to the purpose-built solutions (remember, TimescaleDB did amazing compared to InfluxDB, a purpose-built tool, not too long ago).

New features like custom table access methods and stuff also shift the capabilities of Postgres a ton. I'm fairly certain I could write a table access method that "just" allocated some memory and gave it to a redis subprocess (or even a compiled-in version) to use.

[EDIT] - It's not clear but the listing is in emacs org mode, those bullet points are expandable and I have tons of notes in each one of these (ex. time series has lots of activity in postgres -- TimescaleDB, native partitioning, Citus, etc). Unfortunately the first bullet point is 43 (!) bullet points down. If someone wants to fund my yak shaving reach out, otherwise someone signal boost this to 2Q/Citus/EDB so professionals can take a stab at it.

[EDIT2] - I forgot some, Postgres actually has:

- Graph support, w/ AgensGraph now known as AGE[0]

- OLAP workloads with Citus Columnar[1] (and zedstore[2]).

[0]: https://age.apache.org

[1]: https://www.citusdata.com/blog/2021/03/05/citus-10-release-o...

[2]: https://github.com/greenplum-db/postgres/tree/zedstore



> Should you do it? Probably not. Is it good enough for a theoretical ~70% of the startups out there who really don't shuffle around too much data or need to pretend to do any hyper scaling? Maybe.

It's also useful when you want to quickly build a "good enough" version of a feature like search so you can get it in front of your users fast and iterate on their feedback. Most of the time, they'd be quite happy with the results and you don't have to spend time on something like managing Elasticsearch.

I wrote a post on how you can use Postgres to add search capabilities, with support for queries like:

jaguar speed -car

ipad OR iphone

"chocolate chip" recipe

http://www.sheshbabu.com/posts/minimal-viable-search-using-p...
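Those three query styles map directly onto Postgres' `websearch_to_tsquery`, which understands quoted phrases, OR, and `-` negation. A minimal sketch, assuming a hypothetical `posts` table (the generated `tsv` column needs Postgres 12+; on 11 you'd maintain it with a trigger):

```sql
CREATE TABLE posts (
    id    serial PRIMARY KEY,
    title text,
    body  text,
    -- keep a tsvector in sync automatically and index it
    tsv   tsvector GENERATED ALWAYS AS
          (to_tsvector('english', coalesce(title, '') || ' ' || coalesce(body, ''))) STORED
);
CREATE INDEX posts_tsv_idx ON posts USING GIN (tsv);

-- "chocolate chip" recipe  /  ipad OR iphone  /  jaguar speed -car
SELECT id, title
FROM posts
WHERE tsv @@ websearch_to_tsquery('english', '"chocolate chip" recipe')
ORDER BY ts_rank(tsv, websearch_to_tsquery('english', '"chocolate chip" recipe')) DESC;
```

`ts_rank` gives you basic relevance ordering out of the box; it's cruder than Elasticsearch's scoring, but often good enough for a first version.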


Yup -- I'm a big fan of always writing the interface and the implementation, even if there's only one. You're always glad you wrote `Queue` and `PostgresQueue` when it comes time to write `KafkaQueue` or `NATSQueue`.

That said, I am an unrepentant yak shaver, and there is a lot to be said for just writing those things when you need them, but Postgres would be perfect for rapid prototyping in this way.
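For the queue case specifically, the `PostgresQueue` implementation behind that interface usually comes down to `FOR UPDATE SKIP LOCKED`, which lets concurrent workers claim jobs without blocking each other. A rough sketch with a hypothetical jobs table:

```sql
CREATE TABLE queue (
    id      bigserial PRIMARY KEY,
    payload jsonb NOT NULL,
    done    boolean NOT NULL DEFAULT false
);

-- A worker claims the oldest unprocessed job inside a transaction.
-- SKIP LOCKED makes other workers pass over rows this transaction holds,
-- so N workers can drain the queue concurrently without contention.
BEGIN;
SELECT id, payload
FROM queue
WHERE NOT done
ORDER BY id
LIMIT 1
FOR UPDATE SKIP LOCKED;

-- ...do the work in application code, then mark it finished:
UPDATE queue SET done = true WHERE id = <claimed id>;
COMMIT;
```

If the worker crashes mid-transaction, the row lock is released and the job becomes claimable again, which gives you at-least-once delivery for free.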


I do think this is a product that everyone wants: support for all the popular models (relational, KV, queue, log, etc.) in a consistent, scalable, open source, and easy-to-operate service. I'm not sure that this is actually possible, but I think if such a thing did exist it really would dominate.

In the current reality today, implementing everything in Postgres is probably going to be slower to market (i.e. for a start-up) than using off-the-shelf products. When you do need to scale, this is when you get to learn how valid the assumptions in your abstraction layer were - most likely in production. As a concrete example, Kafka isn't designed to work well with large numbers of topics. Similarly, InfluxDB isn't designed to work well with high-cardinality time series. I think it is generally wiser to "skate where the puck is going" in this situation.

Of course, everything is a trade-off. Postgres is incredibly reliable (like insane) and simple to operate. I'd say for any kind of internal line-of-business type application where scalability is less of a concern you really would be doing your ops team a service by implementing everything in Postgres.


But I don't get it: why would you use PG for all of these if specialized systems (arguably optimized for each use case) already exist?


Just repeating what others have said:

- Postgres is probably already running (it's pretty good for OLTP workloads)

- Operational ease and robustness

- Cloud support everywhere for Postgres

- People know how to backup and restore postgres

- Sometimes Postgres will beat or wholly subsume your specialized system and be a good choice

- Postgres has ACID compliance and a very good production-ready grasp on the research-level problems involved in transactions. I've never met an etcd/zookeeper cluster I didn't wish was simply a postgres table. Imagine being able to actually change your cache and your data at the same time and ensure that both changes happen or neither of them happens (this is a bit vaporware-y, because locks and access pattern discrepancies and stuff, but bear with me). You're much less likely to see Postgres fail a Jepsen test[0]

[0]: https://jepsen.io
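That "cache and data in one transaction" idea isn't vaporware if the cache is itself a Postgres table; both writes simply share a transaction. A sketch, with illustrative table names:

```sql
-- Both the source-of-truth update and the cache refresh commit
-- atomically: either both land or neither does.
BEGIN;

UPDATE accounts SET balance = balance - 100 WHERE id = 1;

INSERT INTO kv_cache (key, value)
VALUES ('account:1:balance', to_jsonb(900))
ON CONFLICT (key) DO UPDATE SET value = EXCLUDED.value;

COMMIT;
```

You lose Redis' latency profile, but you also lose the entire class of cache-invalidation bugs that come from two systems with separate failure modes.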


Because you already have a Postgres DB running probably and you know how to back it up, you know how to upgrade it, all your services can already authenticate towards it, your developers can run it locally, you know how to mock it…


I wouldn't personally use Postgres for all of these, but have done so successfully multiple times for a decent subset:

- storing relational data (duh)

- storing JSON documents with Postgres' JSONB support - it really is very good, and being able to query relational and document data in the same query is wonderful

- storing key/value type data, where I only need to store a small amount of such data - seems silly to spin up Redis for such a small requirement

- time-series data - TimescaleDB is amazing. Performance may not be on par with a highly tuned schema in a purpose-built time series database, but it's still very, very good. It's fast even with billions of rows, has data compression, and it's really nice to be able to query it just like any other Postgres table. And the TimescaleDB folks are really helpful on Slack and GitHub. I'm a huge fan of TimescaleDB, and think it's more than adequate for a lot of time-series use cases

- full text search - Postgres shines here too! It's not as powerful as the likes of Elasticsearch, but it's still very good, and very fast. And Elasticsearch is not trivial or cheap to set up and maintain
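The JSONB and key/value items above fit in a few lines of SQL. A sketch, assuming hypothetical `users`/`events` tables - the nice part is joining relational and document data in one query:

```sql
-- Document data alongside relational data
CREATE TABLE events (
    id         bigserial PRIMARY KEY,
    user_id    int REFERENCES users (id),
    attributes jsonb NOT NULL
);

-- The @> containment operator (GIN-indexable) filters on document
-- structure while joining to a plain relational table.
SELECT u.name, e.attributes->>'browser' AS browser
FROM events e
JOIN users u ON u.id = e.user_id
WHERE e.attributes @> '{"type": "signup"}';

-- Key/value: a two-column table plus upsert covers Redis-style GET/SET
CREATE TABLE kv (key text PRIMARY KEY, value jsonb NOT NULL);

INSERT INTO kv VALUES ('session:abc', '{"user_id": 42}')
ON CONFLICT (key) DO UPDATE SET value = EXCLUDED.value;
```

For small key/value workloads this really does make spinning up Redis feel like overkill.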

For queues and pub/sub, RabbitMQ is my go-to solution.


One practical thing is that consistent backups can become very difficult if you distribute your state to multiple places.


Operational ease



