Your replies so far are excellent. You're pointing out things I've overlooked, t...

einhverfr · on Nov 7, 2011

> I don't follow your argument here. Could you restate it?

Sure.

Quick note, will use PostgreSQL SQL for this post.

Ok, take a simple example regarding US street addresses.

A street address contains the following important portions:

1) Street address designation (may or may not start with a digit). We will call this 'address' for relational purposes. 2) City 3) State 4) Zipcode

As for data value dependencies:

zipcode is functionally dependent on (city, state), and so for normalization purposes we might create two relations, assuming this is all the data we ever intend to store (which of course is always a bad assumption):

create table zipcode(zipcode varchar(10) not null primary key, city text not null, state text not null, id serial not null unique);

create table street( id serial not null, address text, zipcode_id int references zipcode(id), primary key(address, zipcode_id));

So far this works fine. However, suppose I need to place an additional constraint on (address) for some subset of (zipcodes), let's say all those in New York City. I can't do it declaratively, because all data constraints must be internal to a relation.

So at that point I have two options:

1) You can write a function which determines whether a zipcode_id matches the constraint and check on that, or

2) You can denormalize your schema and add the constraint declaratively.

I did some searching and determined strangely that although subqueries in check constraints are part of SQL92, the only "database" that seems to support them is MS Access. But while there are obvious issues regarding performance, I don't see why these couldn't be solved using indexes the same way foreign keys are typically addressed.

> Poking around the Oracle documentation and Ask Tom articles, it seems to be more art than science; mostly based on creating compound indices over the relevant fields. Oracle is smart enough to use an index if it's there for a recursive field, but will struggle unless there's a compound index for other fields. I don't see an obvious way to create what you might call 'recursive indices', short of having an MV.

No, there is an inherent problem here. Your index depends on other data in the database to be accurate. You can create an index over parent, etc. but you still end up having to check the hierarchy all the way down to find the path. You can't just index the path.

Consider this:

CREATE TABLE treetest (id int, parent int references treetest(id));

INSERT INTO treetest (id, parent) values (1, null), (2, 1), (3, 1), (4, 1), (5, 2), (6, 2), (7, 6);

The path to 7 is: 1,2,6,7. To find this, I have to hit 4 records in a recursive query. That means 4 scans.

So suppose we index this value, reducing this to one scan.

Then suppose we: update treetest set parent = 3 where id = 6;

and now our index doesn't match the actual path anymore.

With specialized hierachical databases, you could keep such paths indexed and make sure they are updated when any node in the path changes. There isn't a good way to do this in relational systems though because it is outside the concept of a relational index.

> My contention is that we, as a profession of people who Like Cool Things, tend to discount the value of ACID early and then painfully rediscover its value later on. The business value of ACID is not revealable in a benchmark, so nobody writes breathless blog posts where DrongoDB is 10,000x more atomic than MetaspasmCache.

No doubt about that. I think we are 100% in agreement there!

I'd also add that while RDBMS's aren't really optimal as backings for something like LDAP for a big directory, and while RDBMS's are horribly abused by dev's who don't understand them (ORM's and the like), they really are amazing, valuable tools, which are rarely valued enough or used to their fullest.

Later this week, I expect to write a bit of a blog post on http://ledgersmbdev.blogspot.com on why the intelligent database model (for RDBMS's) is usually the right model for the development of many business applications.

einhverfr · on Nov 7, 2011

In response to PostgreSQL's custom index types, taking a quick look at the API, I don't see a way of telling GiST indexes which entries need to be updated when a row's parent id is changed.

Consequently I don't believe there is a reasonable way to index this because there is no way to ensure the indexes are current and so you don't have a good way of testing that a row is in a path on the tree other than building the tree with recursive subqueries.

The thing is, unless you have a system which is aware of hierarchical relationships between the rows (which by definition is outside the relational model), you have no way of handling this gracefully. So here you have lots of reads, I really think dedicated hierarchical systems will win for hierarchical data.

Of course this wouldn't necessarily mean you couldn't store everything in the RDBMS and periodically export it to the hierarchical store.....

jacques_chester · on Nov 7, 2011

Informative replies, thank you.

> 1) You can write a function which determines whether a zipcode_id matches the constraint and check on that, or

Ah, this old chestnut. Been there, written the PL/SQL trigger, got the t-shirt. Agreed that there isn't a purely declarative approach here.

> You can't just index the path.

Postgres might be the winner here, if someone sufficiently motivated came along and wrote a custom index type for this use case.