
The HBase integration with Pig is pretty good (disclaimer: I wrote a bunch of it, and use it on a daily basis). The only catch is that you need to create the table and set up the column families yourself. The Mongo driver Russel demos automatically creates the table, which may or may not be a good thing. Also, he didn't actually say anything about scalability except for the linkbaiting in his title :).
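For context, the manual setup looks roughly like this (table, family, and field names here are hypothetical, just to illustrate the shape of it):

```
-- The 'users' table with families 'info' and 'stats' must already exist,
-- e.g. created beforehand in the hbase shell: create 'users', 'info', 'stats'
raw = LOAD 'input.tsv' AS (id:chararray, name:chararray, clicks:long);

-- First field is the row key; remaining fields map to the listed columns
STORE raw INTO 'hbase://users'
    USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('info:name stats:clicks');
```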


HBase integration is good, but having to deal with column families and the like rules it out for me in terms of solving the usability problem. I just want to push records and retrieve them as JSON; that's the most common use case when publishing data from Hadoop to a NoSQL store. Could this be fixed? Can column families be inferred? I'm highlighting Mongo's superior usability here to set an example for others.
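On whether column families can be inferred: a minimal sketch of one possible policy (this is my assumption, not anything HBaseStorage actually does) would derive one family per top-level JSON key, leaving nested keys to become family:qualifier columns:

```python
def infer_column_families(records):
    """Derive a column-family set from the top-level keys of JSON-like
    records: each top-level key becomes a family, so nested fields can
    map onto family:qualifier columns."""
    families = set()
    for rec in records:
        families.update(rec.keys())
    return sorted(families)

records = [
    {"user": {"name": "alice"}, "stats": {"clicks": 3}},
    {"user": {"name": "bob"}, "geo": {"city": "oslo"}},
]
print(infer_column_families(records))  # ['geo', 'stats', 'user']
```

A wrapper could run a pass like this over the output relation and issue the table creation before the STORE, which would close most of the usability gap being described.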


I would argue that any time you put "just" and "terabytes" next to each other, you are heading for big problems to go with your big insights :). Schema-less is great... until you can't find stuff and your data is full of inconsistencies.


I've operated this way in practice, at scale, and it works fine. You're rebuilding your entire store and swapping it out frequently, so data consistency isn't a problem. The key is to have a painless pipeline setup, so that one person can do the entire thing... thus negating the need for contracts between parts of the stack.
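The rebuild-and-swap pattern being described can be sketched generically (a file-based stand-in for a real store; all names here are mine): build a complete fresh snapshot, then atomically flip a "current" pointer, so readers never see a half-written store and stale inconsistencies get wiped on every rebuild.

```python
import json
import os
import tempfile

def publish(records, data_dir, pointer_path):
    """Rebuild-and-swap: write a complete fresh snapshot, then atomically
    repoint 'current' at it, so readers never observe a partial store."""
    os.makedirs(data_dir, exist_ok=True)
    fd, snapshot = tempfile.mkstemp(suffix=".json", dir=data_dir)
    with os.fdopen(fd, "w") as f:
        json.dump(records, f)
    tmp = pointer_path + ".tmp"
    with open(tmp, "w") as f:
        f.write(snapshot)
    os.replace(tmp, pointer_path)  # atomic rename on POSIX

def read_current(pointer_path):
    """Follow the pointer to the latest complete snapshot."""
    with open(pointer_path) as f:
        snapshot = f.read()
    with open(snapshot) as f:
        return json.load(f)
```

The same flip shows up in real stores as swapping a table alias or renaming a freshly built collection over the old one.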


You might want to take a look at Lily, a document store that runs on HBase: http://www.lilyproject.org/lily/index.html. It exposes a REST interface for CRUD and search operations, and has a very expressive object model (including record-to-record links).


Lily looks cool, thanks.



