
The HBase integration with Pig is pretty good (disclaimer: I wrote a bunch of it, and use it on a daily basis). The only catch is that you need to create the table and set up the column families yourself. The Mongo driver Russel demos automatically creates the table, which may or may not be a good thing. Also, he didn't actually say anything about scalability except for the linkbaiting in his title :).
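For context, the manual setup looks roughly like this (table, family, and field names here are hypothetical, just to illustrate the shape of it):

```
-- The 'users' table with families 'info' and 'stats' must already exist,
-- e.g. created beforehand in the hbase shell: create 'users', 'info', 'stats'
raw = LOAD 'input.tsv' AS (id:chararray, name:chararray, clicks:long);

-- First field is the row key; remaining fields map to the listed columns
STORE raw INTO 'hbase://users'
    USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('info:name stats:clicks');
```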


HBase integration is good, but having to deal with column families and the like rules it out for me in terms of solving the usability problem. I just want to push records and retrieve them as JSON; that's the most common use case when publishing data from Hadoop to a NoSQL store. Could this be fixed? Can column families be inferred? I'm highlighting Mongo's superior usability here to set an example for others.
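On whether column families can be inferred: a minimal sketch of one possible policy (this is my assumption, not anything HBaseStorage actually does) would derive one family per top-level JSON key, leaving nested keys to become family:qualifier columns:

```python
def infer_column_families(records):
    """Derive a column-family set from the top-level keys of JSON-like
    records: each top-level key becomes a family, so nested fields can
    map onto family:qualifier columns."""
    families = set()
    for rec in records:
        families.update(rec.keys())
    return sorted(families)

records = [
    {"user": {"name": "alice"}, "stats": {"clicks": 3}},
    {"user": {"name": "bob"}, "geo": {"city": "oslo"}},
]
print(infer_column_families(records))  # ['geo', 'stats', 'user']
```

A wrapper could run a pass like this over the output relation and issue the table creation before the STORE, which would close most of the usability gap being described.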


I would argue that any time you put "just" and "terabytes" next to each other, you are heading for big problems to go with your big insights :). Schema-less is great... until you can't find stuff and your data is full of inconsistencies.


I've operated this way in practice, at scale, and it works fine. You're rebuilding your entire store and swapping it out frequently, so data consistency isn't a problem. The key is to have a painless pipeline setup, so that one person can do the entire thing... thus negating the need for contracts between parts of the stack.
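The rebuild-and-swap pattern being described can be sketched generically (a file-based stand-in for a real store; all names here are mine): build a complete fresh snapshot, then atomically flip a "current" pointer, so readers never see a half-written store and stale inconsistencies get wiped on every rebuild.

```python
import json
import os
import tempfile

def publish(records, data_dir, pointer_path):
    """Rebuild-and-swap: write a complete fresh snapshot, then atomically
    repoint 'current' at it, so readers never observe a partial store."""
    os.makedirs(data_dir, exist_ok=True)
    fd, snapshot = tempfile.mkstemp(suffix=".json", dir=data_dir)
    with os.fdopen(fd, "w") as f:
        json.dump(records, f)
    tmp = pointer_path + ".tmp"
    with open(tmp, "w") as f:
        f.write(snapshot)
    os.replace(tmp, pointer_path)  # atomic rename on POSIX

def read_current(pointer_path):
    """Follow the pointer to the latest complete snapshot."""
    with open(pointer_path) as f:
        snapshot = f.read()
    with open(snapshot) as f:
        return json.load(f)
```

The same flip shows up in real stores as swapping a table alias or renaming a freshly built collection over the old one.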


You might want to take a look at Lily, a document store that runs on HBase: http://www.lilyproject.org/lily/index.html. It exposes a REST interface for CRUD and search operations, and has a very expressive object model (including record-to-record links).


Lily looks cool, thanks.



