
Schema-less is imho an overrated feature. ORMs like DataMapper (Ruby) and NHibernate (.NET) can generate the schema on the fly for an RDBMS, so there's no need for migrations pre-production. But when your application is in production you need migrations even with a "schema-less" db! See, rename a field and "all your data" is lost, unless you migrate the data from the old field to the new one.


"Schema-less" has the potential (if you use it properly) advantage of allowing gradual migration.

As long as your code can handle all versions of objects in current use, you can deploy new code, then migrate objects as they're updated or rewritten, slowly migrate them in the background, or both.

For certain types of schema changes in large enough data stores, this can be a killer feature. I remember one RDBMS setup I had to deal with where we were "stuck" having to do a lot of suboptimal schema changes, because the changes we actually wanted to make would (based on tests in our dev environment) slow the system to a crawl, leaving it unusable for 8+ hours, and we just couldn't afford that kind of downtime. We spent a lot of engineering time working our way around something that'd simply be a non-issue in a schema-less system.


"Schemaless" most of the time means "code based schema". Dealing with multiple schema versions at the same time is always possible, relational or not, but it causes significant bloat and complexity. When I hear gradual migration I think code decay, but I can see why it could be useful sometimes.

In my view, schemaless models are only desirable if the schema is not known until runtime, e.g. user specified fields or message structures, external file formats that you don't control but might need to query, etc.


Well, there is also the issue of highly unstructured data. In LedgerSMB, we put it in PostgreSQL along with highly structured data, and just use key-value modelling. This includes things like configuration settings for the database in question and the specifics of what a menu item does. I might migrate some of this to hstore in the future (particularly the menus).

There are many shortcomings of this approach, but when dealing with highly unstructured data (or basically where the inherent structure is that of key/value pairs) it strikes me as the correct approach, and not really different from using NoSQL, XML, or any other non-relational store.
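For illustration, a minimal sketch of that kind of key/value modelling in plain SQL (the table and column names here are hypothetical, not LedgerSMB's actual schema):

  CREATE TABLE config_settings (
      setting_key   text PRIMARY KEY,
      setting_value text
  );

  -- look up one configuration value
  SELECT setting_value FROM config_settings WHERE setting_key = 'default_currency';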


Right, but was that MySQL? Schema migrations are not a problem on quality systems like Oracle and PostgreSQL. Altering tables and such doesn't stop the database from running at all.

It's always MySQL's fault in these things.


Fair enough, but you can also have a schemaless store by using JSON fields in PostgreSQL or MySQL.
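Roughly like this in PostgreSQL (names made up for illustration; jsonb assumes a reasonably recent version, and the plain json type or MySQL's JSON column work along the same lines):

  CREATE TABLE events (
      id      bigserial PRIMARY KEY,
      payload jsonb               -- the "schemaless" part: any document shape goes here
  );

  INSERT INTO events (payload) VALUES ('{"type": "signup", "user": "alice"}');

  -- pull fields back out of the stored document
  SELECT payload->>'user' FROM events WHERE payload->>'type' = 'signup';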


Not indexably. But you can do a hideous many-tables-per-real-table thing where each field gets a tall thin table in Postgres or MySQL, do a lot of joins to get your data, and index the fields in that.

It's not as awful as it sounds, performance-wise. It is as awful as it sounds in terms of maintainability, of course.
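A rough sketch of that pattern (names are hypothetical), with one narrow table per logical field and a join per field to reassemble a record:

  -- one "tall thin" table per logical field
  CREATE TABLE person_name  (entity_id bigint PRIMARY KEY, value text);
  CREATE TABLE person_email (entity_id bigint PRIMARY KEY, value text);
  CREATE INDEX person_email_value_idx ON person_email (value);  -- each field indexed normally

  -- reassembling one logical record costs a join per field
  SELECT n.value AS name, e.value AS email
  FROM person_name n
  JOIN person_email e USING (entity_id)
  WHERE e.value = 'someone@example.com';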


You can index hstore fields in PostgreSQL.
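For instance (assuming the hstore extension is installed), a GIN index covers key-existence and containment lookups; the names below are illustrative:

  CREATE EXTENSION IF NOT EXISTS hstore;

  CREATE TABLE settings (
      id    bigserial PRIMARY KEY,
      attrs hstore
  );

  CREATE INDEX settings_attrs_idx ON settings USING gin (attrs);

  SELECT * FROM settings WHERE attrs ? 'theme';         -- key exists
  SELECT * FROM settings WHERE attrs @> 'theme=>dark';  -- key/value containment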


That's not an unfair comparison at all - indexing the data in a JSON blob is entirely possible and practical.
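For example, assuming a hypothetical table events with a json or jsonb column payload, an ordinary expression index makes lookups on a field extracted from the blob indexable:

  -- index one field pulled out of the JSON document
  CREATE INDEX events_user_idx ON events ((payload->>'user'));

  -- this predicate can now use the index
  SELECT * FROM events WHERE payload->>'user' = 'alice';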


What you want to index in a large text blob and what indexes you can actually create and use may be two different things.


Even more common is when you have a mature application with a lot of users and you need to add new fields to, e.g., the user table, and you can't, because an ALTER TABLE across a sharded db setup will take days or weeks. So you end up creating a table that's a hashtable:

key, value

and then proceed to pay the cost of joins against it. Most of my excitement around NoSQL comes from hard-earned pain, not from "oh, new shiny thing, I've got to use it".
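Roughly this pattern (the users columns and attribute name are hypothetical), with the cost showing up as an extra join for every attribute you read:

  -- sidecar key/value table, added because ALTER TABLE on the sharded users table was too expensive
  CREATE TABLE user_attributes (
      user_id bigint NOT NULL,
      key     text   NOT NULL,
      value   text,
      PRIMARY KEY (user_id, key)
  );

  -- every "new field" read now costs a join
  SELECT u.id, u.email, a.value AS referral_code
  FROM users u
  LEFT JOIN user_attributes a
         ON a.user_id = u.id AND a.key = 'referral_code';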


I'll take well-understood pain that I can patiently work around, one time, over the course of days or weeks, if the alternative is random bugs that bite you in the night for years at a time.

Joins are no fun, yes, but as you gritted your teeth and implemented those cute little table-based key-value stores, did you find yourself mentally calculating the time required to restore the whole system from backup while muttering tiny prayers? Probably not. Did your code wake up the ops team an average of once per month for several years? Did you lose data? Did you have to put up an apologetic blog post? Did anyone have to get on the phone and rescue customer accounts, one at a time, with profuse apologies and gifts? (Now that is a non-scalable process...)

But at least this argument about maintenance is a real argument. The one about wanting to save time during initial development by skipping the declaration of schemas reads like the punchline of a Dilbert cartoon that you'd find taped to the wall in the devops lunchroom.


@mechanical_fish Yes, and it was a MySQL installation. Weird things happen with all systems once you push them to the edge of performance, both of the hardware and of the interconnections between servers.

Slow interconnects between servers caused me headaches in the past with MySQL replication. Shared switches did the same. Problems with locks under high contention did the same. Problems with the client libraries, the same. In fact, all storage systems have similar problems and pain. Some are just more battle-tested than others.


(Disclaimer: I work on ChronicDB)

I second that: schema-less is misunderstood.

There's a difference between flexibility of schema definition and flexibility of schema change[1].

Flexibility of schema change, which NoSQL does not solve, is increasingly important. Not just for large data stores but also for the data development and release process. To avoid playing the suboptimal schema-change game, both the code and the data need to be updated together. Or at least be given the illusion that they have been[2].

A probably obvious question most developers must have asked by now is: if we've built great tools to version source changes, how come we haven't built great tools to version data changes?

[1] - http://chronicdb.com/blogs/nosql_is_technologically_inferior...

[2] - http://chronicdb.com/blogs/change_is_not_the_enemy


> See, rename a field and "all your data" is lost, unless you migrate the data from the old field to the new one

This is not true.

I wrote Objectify, a popular third-party Java API to App Engine's datastore. The data migration primitives worked out while building Objectify are what ScottH built into Morphia, the Java "ORM" for MongoDB. With a small number of primitives (mostly @AlsoLoad and lifecycle callbacks), it's possible to make significant structural changes on the fly with zero downtime.

This is, IMHO, the best thing about schemaless datastores. There's no longer any compelling reason (at least, in the datastore) to take down a system for "scheduled maintenance".

For more information, here is the relevant section of Objectify's documentation:

http://code.google.com/p/objectify-appengine/wiki/Introducti...


ORMs are a pain to use. In addition to knowing the domain you map from and the domain you map to, you now also have to understand the mapping process.


...in exchange for dramatically pared-down and simplified code, consistent data-access practices, and hundreds of hours of developer time saved. Driving a car is tough too - how to steer, driver's license, gas, insurance, what a PITA. Yet somehow it remains preferable to walking in many cases, despite the latter being mastered by most two-year-olds.


If you need all that code to talk to the database I suspect you are in effect using your database as the integration layer. Ouch.


The same should be said of ODMs as well. A document might be a little more straightforward to map to an object, but there is still plenty of mismatch.


I'll agree with this. Document stores don't solve the object-relational impedance mismatch, but they do help (and personally, I find they help more than "a little").


ORMs encourage bad database design and little interoperability at the db level.

In short, folks build their db around the ORM instead of vice versa.


My main problem with schema migrations was that once you reach 100 million records or so, they tend to lock down the DB server and take quite a while.


Let's see. On Pg:

  postgres=# CREATE TABLE alter_benchmark(id bigint);
  CREATE TABLE
  postgres=# explain analyze
  postgres-# insert into alter_benchmark (id) select * from generate_series(1, 200000000);
                                                               QUERY PLAN
  -----------------------------------------------------------------------------------------------------------------------------------------
   Insert  (cost=0.00..12.50 rows=1000 width=4) (actual time=1082180.877..1082180.877 rows=0 loops=1)
     ->  Function Scan on generate_series  (cost=0.00..12.50 rows=1000 width=4) (actual time=87400.737..512954.539 rows=200000000 loops=1)
   Total runtime: 1086336.466 ms
  (3 rows)

  postgres=# alter table alter_benchmark add test text;
  ALTER TABLE

The ALTER TABLE takes insignificant time (less than a second).

I feel so spoiled using PostgreSQL :-D

As I understand it, PostgreSQL doesn't rewrite the table to add the column; it might in order to change the data type of a column. EXPLAIN ANALYZE doesn't work with ALTER TABLE because there is no query plan generated, so I have no idea how quickly the statement actually executed. All I know is it completed in under a second.


You could try `time psql < alter-statement.sql`. I know, it'd not really be useful as it measures lots of overhead. But if it's fast on that, it's fast during an active session.


I could have turned on timing too (\timing in psql). All I know is it returned within one sec. Oh well, next time, I suppose.


Schemaless is awesome. Are you a DBA or a developer? If you're a developer like me, schemaless is awesome because of its flexibility. I spend less time on how to do stuff and more time on what stuff we should do.

I've been using Hibernate for 9 years and I've finally come to the conclusion that it's just not worth the pain. When working with an RDBMS, I'm using straight SQL from now on.


Schemaless also dispenses with the ability to declare, in the schema, what correct data is. For critical apps that's a high-caliber footgun. For critical apps that have to integrate with each other, it's a nice piece of artillery aimed squarely at your foot.
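Concretely, these are the kinds of declarations a schemaless store gives up (a generic SQL sketch, assuming a hypothetical accounts table; not from any particular app):

  CREATE TABLE payments (
      id           bigserial   PRIMARY KEY,
      account_id   bigint      NOT NULL REFERENCES accounts(id),   -- must point at a real account
      amount_cents bigint      NOT NULL CHECK (amount_cents > 0),  -- no zero or negative payments
      currency     char(3)     NOT NULL,
      created_at   timestamptz NOT NULL DEFAULT now()
  );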


> But when your application is in production you need migrations even with a "schema-less" db!

I disagree. The most frequent use case I come across is adding columns/fields to a table/collection, and not needing to ALTER TABLE and run a database migration as part of the deployment process to add said fields is extremely awesome.



