How ClickHouse saved our data (2020) (mux.com)
128 points by mmcclure on Feb 18, 2021 | 52 comments


I’ve also had some pretty clear successes moving a ~2TB Postgres database into a 70GB Clickhouse instance (though, in this case, a good 2/3 of the issue was schema design on the PG side; there was still another order of magnitude of compression to be had from Clickhouse).

Queries went from a 12 hour table scan to ~2 minutes, and production queries from 45 sec to 400ms.

This is without doing anything particularly tricky on the Clickhouse side, nor even putting it on a particularly large machine. When benchmarking it, I put it on a 10-year-old laptop and was still seeing production queries in the seconds range.


I also had excellent results when I experimented with Clickhouse (in my case, the table with the most rows I ever created had 6 columns using a total of ~55 bytes per row; after inserting ~35 billion rows the table was 450MiB). My use cases involving big aggregations were excellent, and the mini-searches (searching/retrieving just a few rows) were good: not as fast as a normal DB, but certainly acceptable for me.

For whoever wants to try it out, be aware that you really should understand how "parts" and "merges" work, and your write workload should take that into account (for example, dynamically lower the workload while merges are running, budget for the I/O needed to delete pending old parts, etc.); if you don't, you're very likely to get bad performance when your workload is write-heavy. The number of columns and the sort order are extremely important as well.
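
For a rough idea of what keeping an eye on this can look like, the system tables expose the merge machinery directly. A minimal sketch, assuming nothing beyond a stock ClickHouse install:

    -- Merges currently in flight, and their progress.
    SELECT database, table, elapsed, progress, num_parts
    FROM system.merges;

    -- Active parts per table; a steadily growing count under heavy
    -- writes is a sign that merges can't keep up with inserts.
    SELECT database, table, count() AS active_parts
    FROM system.parts
    WHERE active
    GROUP BY database, table
    ORDER BY active_parts DESC;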

Last year I decided to start using Clickhouse to host collectd's metrics ( https://www.segfault.digital/how-to-directly-use-the-clickho... ) and so far it's working well. I upgraded the CH software twice and once killed its VM by mistake, but I've had no problems since ("so far", hehe).


Yeah, DoubleDelta + compression is amazing.

I have one column that's an incrementing integer per partition, and for a million rows the total storage for that field is ~20k. That's a savings of something like 10% over the whole dataset.
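
For reference, declaring such a codec is a one-liner; a sketch with hypothetical table and column names:

    -- DoubleDelta stores the change of the change between consecutive values,
    -- so a per-partition incrementing counter compresses down to almost nothing.
    ALTER TABLE events MODIFY COLUMN seq UInt64 CODEC(DoubleDelta, LZ4);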


> That's a savings of something like 10% over the whole dataset.

Sorry, I'm probably misunderstanding you - I'm not sure about that "10%"; what does it refer to?

Anyway, I really like that Clickhouse lets you chain codecs (I personally like to think of the Delta/DoubleDelta/Gorilla/T64 codecs as "encodings" and the general-purpose codecs LZ4/LZ4HC/ZSTD as "compression codecs").

I don't have a good math background, so for Delta & DoubleDelta I liked this explanation ( https://altinity.com/blog/2019/7/new-encodings-to-improve-cl... ), which describes "delta" as tracking the "distance" and "doubledelta" as tracking the "acceleration" between consecutive column values.

In the end, using relatively small datasets (different for each use case), I tested all combinations of encodings (Delta/DoubleDelta/Gorilla/T64) and ZSTD compression levels (mostly 1/3/9); I ignored LZ4*.

- "Delta"&"DoubleDelta" were often interesting (but in general for my data using "Delta"+ZSTD was already good enough compared to the rest).

- "Gorilla" somehow never gave me any benefits if compared to other codecs and/or compression algos.

- "T64" is a bit a mistery for me, anyway in some tests it delivered excellent results compared to the other combinations, therefore I'm currently using just T64 for some columns, and for some other columns as T64+ZSTD(9).

EDIT: sorry, I think I got it - you probably meant something like "just by doing that on that specific column, the overall storage needs were reduced by 10% for the whole dataset", right?


Yep. That's it.


The system tables are incredibly helpful for optimizing ClickHouse performance. The system.columns table gives exact compression ratios for every column and system.parts shows similar stats for table parts. It's possible to tune schema very effectively without a lot of deep knowledge of how codecs and table partitioning work internally.
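
A minimal example of the kind of query this enables (table name hypothetical):

    -- Per-column compressed vs. uncompressed size and the resulting ratio.
    SELECT
        name,
        formatReadableSize(data_compressed_bytes)   AS compressed,
        formatReadableSize(data_uncompressed_bytes) AS uncompressed,
        round(data_uncompressed_bytes / data_compressed_bytes, 1) AS ratio
    FROM system.columns
    WHERE database = currentDatabase() AND table = 'my_table'
    ORDER BY data_compressed_bytes DESC;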


> ~2TB Postgres database into a 70GB Clickhouse

was your PG DB compressed?


In addition to what OP mentions in the sibling comment, ClickHouse supports a whole bunch of encodings that take advantage of column-based layouts to further compress the columns in very clever ways.


Parts may have been, but it was ~2B rows at ~1KB/row in PG, with a couple of embedded JSON fields (which were probably compressed, but... that's coming from a bad starting point anyway).

It got converted to a table with ~25 Float32 columns + a couple of key columns.
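
Roughly the shape of the resulting table, as a sketch with hypothetical names:

    CREATE TABLE simulation_results
    (
        run_id    UInt32,
        zone_id   UInt32,
        metric_01 Float32,
        metric_02 Float32
        -- ... ~25 Float32 metric columns in total, plus the key columns above
    )
    ENGINE = MergeTree
    ORDER BY (run_id, zone_id);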


Yes, it's a 5-apples vs 1-orange comparison, especially with the dynamic JSON fields where keys are repeated.


Fixing the PG schema would have saved me about 2/3 of the space (from a small test run). Good, but not enough to move the needle.

Queries on the PG setup here are back to the bad old days of being limited by bulk disk performance.


> Good, but not enough to move the needle

You could reduce the gap by running PG on a compressed filesystem, so it would be a more apples-to-apples comparison.


It would be a 1-apple to 1-orange comparison. If the columnar store has an "ok" architecture, it should still provide 10x+ faster queries.


> Queries went from a 12 hour table scan to ~2 minutes, and production queries from 45 sec to 400ms.

Hmm, why do you have such large tables and then do full scans?

Edit: and if it's analytics why were you using a row store instead of a columnar database? It's like saying switching to a Ford Mustang made you 10x faster than a rusty John Deere combine.


> and if it's analytics why were you using a row store instead of a columnar database?

This point confuses me; the parent is telling us how they went from a row-based DB (pgsql) to a column-based DB (Clickhouse), and you're asking them why they didn't?


There are many, many ways to avoid a full scan: fixing a dumb query, proper indexing (partial indexes if a full index is too expensive), materialized views, and several more. It's very common that developers either resist or are not aware of how to use a database. A full scan is dumb and will retrieve all the data in this big table without any filtering or even ordering. This is highly suspicious.
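
For concreteness, hedged Postgres examples of two of the techniques mentioned (table and column names hypothetical):

    -- Partial index: only indexes the rows a hot query actually touches.
    CREATE INDEX events_recent_idx
        ON events (user_id, created_at)
        WHERE created_at > '2021-01-01';

    -- Materialized view: precomputes an aggregate instead of scanning at query time.
    CREATE MATERIALIZED VIEW daily_event_counts AS
        SELECT date_trunc('day', created_at) AS day, count(*) AS n
        FROM events
        GROUP BY 1;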

And in the very, very rare case where a full scan is absolutely necessary, Postgres might not be a good idea. So why is it being bashed? I read "Postgres was 12h slow", not "we picked the wrong tool and of course Postgres was slow". Don't blame the hammer.

IME, it's common for programmers not to be educated on what modern databases can do. This was made worse by all the NoSQL mania, which culminated in "MongoDB is web scale". We are still dealing with the fallout of that.


FWIW, this project was one where I came in at the deployment stage, and the original requirements were for something about 10x smaller. Budget gone. Shit happens.

There were many places where the right choices weren't made and things could have been done better. But even with schema optimizations speeding things up by a factor of 3, we would have been looking at something that still didn't hit the performance needs. I'm guessing that doing the reorg on the full DB would have taken O(week). Effective indexes are great, but when one optimized index is tens of GB and takes a day to create, you start to look elsewhere.


That story makes a lot of sense.

I took issue with your original comment as it was short on details and could be interpreted in a different light.


Why did I do a full scan? Because I wanted to see just how bad it was, and I wanted to see roughly what was in the database.

I ran `select count(distinct foo)` and got ~9000 distinct items over 2B rows in 12 hours. Running the same query in Clickhouse was O(minutes).

As a little extra comparison, the total size of the Clickhouse database is 1/2 the size of the indexes in Postgres.


Data is the size that it happens to be for many businesses. This could easily have been a large user view-history table or a similar online use case that folks happened to need analytic capabilities on.


In this case, it's the output from ~10k simulations, each over up to a million zones.

Production queries either select one row by primary key, or sum metrics over up to a million rows for a set of values.
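
In rough SQL terms, those two query shapes look something like this (table and column names hypothetical):

    -- Single row by primary key.
    SELECT *
    FROM simulation_results
    WHERE run_id = 42 AND zone_id = 1001;

    -- Aggregate over up to ~1M rows for a selected set of runs.
    SELECT sum(metric_01), avg(metric_02)
    FROM simulation_results
    WHERE run_id IN (17, 42, 99);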


> why were you using a row store instead of a columnar database?

According to this comment (https://news.ycombinator.com/item?id=26187645) they were using a columnar extension in Postgresql.


That's not me.


It would be interesting to learn whether other options were considered. I would have definitely tried Presto (or AWS Athena) querying ORC data, which would amount to columnar storage with a sparse index, just like ClickHouse.


I’ve used Presto and Athena and neither could really hold a candle to ClickHouse. Presto was handy, but I’d personally much rather go to the effort of getting things into a proper columnar database than put up with the quirks of the likes of Presto.

If you want, there’s a detailed analysis of various DBs’ performance on various tasks here:

https://tech.marksblogg.com/benchmarks.html

It’s one of the top performers; even the single-machine setup comfortably beats the likes of Presto, Athena, etc.


I have evaluated these different options, and by far I prefer Clickhouse when we're talking about less than a petabyte of data. It's super simple to install and set up, and it's a lot faster.


I am currently building an internal analytics platform for web event data on top of CH, and I recommend it too.

I am migrating a lot of complex Scala queries to CH SQL and I am surprised by everything I can do; it's a very deep and rich API backed by great performance. Also, Clickhouse is a bit more operationally involved than a solution like BigQuery, but at a fraction of the cost for the performance.


Does anyone know of a hosted Clickhouse service?

I'd like to use Clickhouse, but on a very small team without an ops person it feels like a liability to manage a server ourselves.


Consider using Yandex Cloud; we offer managed Clickhouse as a service. Clickhouse is an open-source DB created by Yandex.


Altinity provides one I think!


Altinity.Cloud runs on Amazon. Check out the Altinity website.

Disclaimer: I work for Altinity.


Are you guys hiring?


Interesting, but I’d be curious whether they compared against CitusDB with the cstore_fdw columnar backend.


(Mux co-founder here) The Postgres system described as the predecessor to Clickhouse here actually was cstore_fdw with a pretty heavily customized (and outdated) CitusDB. CitusDB is a great system for sharding and distributing Postgres, but we found that the compression and query performance for this specific analytics use case were much better served by Clickhouse.


Cloudflare also moved from Citus to Clickhouse


The only place Citus is mentioned in the post is in the title of a chart. It might be worth updating the post to include this; it would also clarify note #3:

"ClickHouse is still faster than the sharded Postgres setup at retrieving a single row, despite being column-oriented and using sparse indices. I'll describe how we optimized this query in a moment."


Not A-F1V3, but also a Mux co-founder.

To be transparent, we included Citus in the original versions of the post, but decided to take it out because it didn't feel like a fair representation of Citus™. As Adam mentioned, what we were using was heavily customized and based on a pretty outdated fork by the time we transitioned.


This should be clarified in the post; otherwise you lead the reader to believe you're representing stock Postgres performance instead of the performance of a forked Citus. This is important information for an informed reader.


ClickHouse has to be one of my favourite databases and pieces of technology.

It’s stupid fast, and I know it’s capable of workload sizes beyond where I’ve pushed it so far. I’ve used it in anger, and even then I was able to solve the issues I had easily. The out-of-the-box integrations (like the Kafka one) have made my life so much easier. I get excited whenever I get to use the instance I manage at work, or demo something about it to someone, because I know I’ll often blow their minds a bit.


Controversial question: can you trust ClickHouse? After the JetBrains hack I'm starting to wonder whether I should question any software coming from the east of Europe (even if the org is legit).


> After the JetBrains hack

AFAIK there was no evidence that the SolarWinds fiasco was because of JetBrains [0]. Also, ClickHouse is open source [1] under the Apache 2.0 license, so you can check the source and question it yourself.

[0] - https://www.zdnet.com/article/jetbrains-denies-being-involve...

[1] - https://github.com/ClickHouse/ClickHouse


You mean the SolarWinds hack?


What "JetBrains hack"?


They are conflating the SolarWinds hack with JetBrains because SolarWinds used TeamCity as their build tool.


I don't understand the obsession with ClickHouse. While it seems to fit this particular use case, it still has the same limitations and challenges as other columnar DBs. Your queries will be very fast for counts/averages, but there's a tradeoff with other operations: inserts are only efficient in bulk, deletes and updates are slow, there are no secondary indexes...

While Clickhouse can be lightning fast, is it really designed to be a main backend database?


Clickhouse is not something you use for a CRUD backend.

The obsession with Clickhouse is the phenomenal performance for the OLAP use case, a scenario where there were not many open source, easy to install and maintain options. For the most part you can treat it as a “normal” database, insert data into it and query it without messing about with file format conversions and so on. The fact that it is blindingly fast is a big bonus!


> For the most part you can treat it as a “normal” database

While Clickhouse is great at what it does, I would expect a "normal" database to support transactions. Don't use it to handle bank accounts.


ClickHouse supports transactional (ACID) properties for INSERT queries. It can also replicate data across geographically distributed locations with automatic support for consistency, failover and recovery. Quorum writes are supported as well.

This makes it safe to use ClickHouse for billing data.
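
As a hedged sketch of what a quorum write looks like (assuming a ReplicatedMergeTree table; the table and column names are hypothetical):

    -- Require the insert to be acknowledged by at least 2 replicas before returning.
    SET insert_quorum = 2;
    INSERT INTO billing_events (event_id, amount) VALUES (1, 9.99);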



I believe Clickhouse is competing with other analytical, OLAP databases, not something like vanilla Postgres or MariaDB or Oracle.

This is not your application backend database.


Secondary indexes are supported by ClickHouse.

https://clickhouse.tech/docs/en/engines/table-engines/merget...

They are not like indexes for point lookups; they are also sparse. Actually, they are the best you can do without blowing up the storage.
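
A minimal sketch of adding one, with hypothetical table and column names (syntax per the linked docs page):

    -- A data-skipping index: lets ClickHouse skip granules that cannot contain
    -- the searched value, rather than pointing at individual rows.
    ALTER TABLE events
        ADD INDEX user_idx user_id TYPE bloom_filter(0.01) GRANULARITY 4;
    -- Backfill it for parts written before the index existed.
    ALTER TABLE events MATERIALIZE INDEX user_idx;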


No...it's designed to be an OLAP database.



