Does your data model change often? PostgreSQL and MySQL well be very difficult t...

traceroute66 · on Sept 5, 2021

> Does your data model change often? PostgreSQL and MySQL well be very difficult to make fundamental changes.

I hear this argument a lot and I struggle with it.

It is an argument that, at least for me, falls into the same category as "you should design your schema in a portable manner".

The "portable schema" argument is easy to disprove because if you don't design your schema in accordance with the features available in your database then you are setting yourself up for a big performance fail (e.g. for those of you familiar with the work of Tom Kyte of Oracle, a party trick of his was detailed evidence-based demonstrations of why you should use Oracle features in your schema design vs generic schemas ... but the same applies in the open-source world, e.g. Postgres[1]).

The problem I have with the "use noSQL because your data model changes often" is that you then become heavily reliant on your upstream devs who are coding the app layer to behave themselves because you are no longer in a position to enforce or validate their actions at database layer. It also potentially puts you at risk of loosing the database's position as "source of truth" - because if upstream can change your data model at a whim, it means you could easily loose visibility of data elements overnight.

To me, relational databases will always have a place in the world and I don't think people should blindly follow alternative models just because its the bandwagon of the day. ACID compliance is, AFAIK, not available anywhere else other than an RDBMS setting.

NoSQL, graph databases etc. also have a place in the world of course, but only if you understand the limitations and tradeoffs you are accepting. It is quite possible that many people would be better off with RDBMS.

[1] https://wiki.postgresql.org/wiki/Don%27t_Do_This

vaughan · on Sept 6, 2021

Wanting to make changes to your schema easier, doesn't necessarily mean you need NoSQL.

I think it's a question of having the right tooling.

It would be cool if you could dump an unstructured blob of data in a psql table. And then later you can add a schema to this defining relationships. So like `post.comments` is an array of `Comment`. And then you just run a command that runs a migration that normalizes the data into a `comment` table. And then it would map `post.comments` to a join. Although psql's jsonb support and indexing is pretty good.

The difficulty though is that if you change your mind it becomes much harder to change because now you have multiple tables and relationships. So what would be nice is if you can go back from the relational model to the unstructured model with ease.

I think what is needed is a visual tool to design your db schema and migrations that just works, and is also aware of unstructured data and that it can have a json schema.

Ideally we want to be designing a logical schema (and any unstructured data would implicitly be given a logical schema too), and the physical schema is automatically created.

hpeinar · on Sept 5, 2021

As some of the commenters here have already voiced, I also struggle a bit to understand the issue with using relational databases if the data is actually relational. Which, for probably like 95% projects it is.

The argument that "it will slow us down" seems so vague, I can't really see how defining the schemas and relationships slow you down. You need to know what data you are dealing with anyway even when prototyping, how else would you create views for the users? Tables? Are profile fields for users also open ended? I can add any number of fields with random values?

And "Your database doesn't do much for you?" So what exactly does NoSQL database do for me that PostgreSQL (or MySQL) won't? I'm really curious, I'm not attacking anyone, I just want to know, maybe I've been using the wrong database my whole life.

But I think that database as anything else in your project should be selected by best fit not by mere "it's a cool buzzword, lets use that". Most of the projects out there have relational data and should be using relational database because it fits the core data model.

I don't have anything against NoSQL databases, they obviously have their usage and place but I have against choosing them because they are "cool" and "fast to prototype on" if the project itself has 100% relational data.

Also the tools fastest to prototype with are the ones you know best, it doesn't mean that tool is best fit for the project.

vaughan · on Sept 6, 2021

I'm not sure OC is asking for NoSQL - just a better experience with SQL migrations.

The relational model always requires a judgement call about how normalized to go. If I have a deep nested json object of data, you could argue to define all the relations and normalize. But then to query the data you now have an 6-way join and the query planner starts to struggle a bit. And then if you decide to change that structure, you now have a very tricky migration script to run.

Whereas if you just kept it as a json object, you have avoided a lot of unnecessary pain, and you can just write some quick re-mapping code in the application layer to handle the older version of data, and you can actually start storing data in the new schema immediately. Then you can scan over the collection and modify the existing data of the old schema, and remove your re-mapping layer.

So yes, your data has been relational the whole time, but the query complexity and migration complexity increased dramatically. It became very difficult to change the model.

The relational model is essentially a constraint on how you need to structure your data in order to allow relational algebra to be used to help optimize query plans. When you release these constraints, you find things like Datalog that provide purer ways to represent your data and relationships, in the sense that they can map closer to the real-world - but then this comes at the cost of automated query optimization.

So the fact that relational data models are usually less representative of the real-world because of adherence to the relational model, usually denormalized to optimize the underlying database engine, and difficult to migrate, create complexity and slow down shipping some features (although they can also speed up many others greatly - like analytics).

That being said, SQL dbs are probably the best dbs we have today for solving business problems, and I think the migration and modeling problems could be solved with better tooling.

throwawayboise · on Sept 5, 2021

> You need to know what data you are dealing with anyway

Exactly. At some point, somewhere, you need to understand your data and how different attributes or objects relate to one another. You can do this in a variety of ways, but if you don't understand this you don't understand what you are building.

fny · on Sept 5, 2021

This is no longer a good reason since Postgres and MySQL support JSON with indexing.

Also, you don't actually need to define relationships. Rails for eons never did this at the database level.

smitty1e · on Sept 5, 2021

PostGreSQL provides JSON functions if your data have more of a document feel to them.

https://www.postgresql.org/docs/9.5/functions-json.html

axegon_ · on Sept 5, 2021

Postgres was first to the party but MySQL supports that too since 5.6 or 5.7, I can't remember.