OLAP engine working on top of HBase

mwexler · on Jan 8, 2011

http://code.google.com/p/olap4cloud/wiki/UserGuide mentions their comparison to Hive... I wonder just how tuned their Hive is to take so much longer than this layer, even without the pre-aggregation. Would row order storage make that much of a difference? Doesn't Hive now have some type of indexing?

xal · on Jan 8, 2011

AFAIK Hive has no indexing at all. That's part of why it's so simple (well, for Hadoop folks) to use. All you need to do is copy a bunch of CSV, TSV, logs, whatever into an HDFS path, tell Hive some basic info about the files, and you can join the table with any other information in the system. Really powerful stuff.

However, if you don't have any legacy data and/or don't mind a manual import into HBase, then something like the above solution may reduce the complexity of the problem enough to get much better query performance.

jhammerb · on Jan 8, 2011

There's been some work to add indexing to Hive. See http://www.slideshare.net/NikhilDeshpande/indexed-hive and https://issues.apache.org/jira/browse/HIVE-1803, for example.