Came here to post this. Fantastic paper, and probably the most comprehensive thing available on the internet about this.
Something else I've taken away from research about columnar and vectorized databases: there doesn't seem to be a good reason why they aren't fit for OLTP workloads.
Analytics from SaaS/line-of-business apps shows about a 90/10 read/write ratio for CRUD apps.
Pavlo et al. have a solid paper on an HTAP database that compares OLTP, OLAP, and a novel HTAP storage on varying read/write workloads:
> "Arulraj, J., Pavlo, A., & Menon, P. (2016). Bridging the Archipelago between Row-Stores and Column-Stores for Hybrid Workloads. Proceedings of the 2016 International Conference on Management of Data - SIGMOD ’16. doi:10.1145/2882903.2915231 "
See "Figure 17: Concurrent Hybrid Workloads – The impact of the storage layout on the query processing time under different concurrent hybrid workloads."
The chart contains 4 workload types, one of which is "• Read-Heavy: 90% scans, 10% inserts"
You can see that the columnar storage model outperforms the OLTP-oriented N-ary (row) storage there.
So my question is -- why don't people use columnar databases for CRUD apps?
For scan-heavy apps, columnstores will be much, much faster. For simple CRUD apps that just want to read/write a few rows at a time (with high concurrency), they have a number of inefficiencies vs. a rowstore (a toy sketch after this list makes the lookup cost concrete):
- Columnstores often don't support indexing at all (or have weak support for it). This means a scan is needed to find any row (with min/max metadata or segment elimination to avoid opening files with no matching rows at all). Even if this scan is very fast, it's still going to use more CPU than an index seek.
- Most columnstores don't use compression schemes that are incremental. To grab a single row, the columnstore likely decompresses many adjacent rows (could be millions of rows, depending on the particular columnstore).
- Most columnstores don't support fine-grained locking (row-level locking). They often lock the entire table or an entire segment (millions of rows?) whenever a single row is written. This damages concurrency.
- The row reconstruction costs are high (gluing the columns back together).
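To make the indexing point concrete, here's a toy sketch in plain Python (not modeled on any real engine; the segment size, data, and function names are made up for illustration). The rowstore side resolves a key with a single seek, while the columnstore side only has per-segment min/max metadata, so it must scan every row in whichever segments survive elimination:

    SEGMENT_SIZE = 65_536
    ids = list(range(1_000_000))  # a sorted integer id column

    # Rowstore-style: an index maps a key straight to its row.
    index = {v: i for i, v in enumerate(ids)}

    def rowstore_lookup(key):
        return index[key]  # one seek: O(1) here, O(log n) for a B-tree

    # Columnstore-style: keep only min/max per segment, then scan the
    # segments whose range could contain the key.
    segments = [ids[i:i + SEGMENT_SIZE] for i in range(0, len(ids), SEGMENT_SIZE)]
    zone_map = [(seg[0], seg[-1]) for seg in segments]  # per-segment (min, max)

    def columnstore_lookup(key):
        for seg_no, (lo, hi) in enumerate(zone_map):
            if lo <= key <= hi:  # segment elimination via min/max
                for offset, value in enumerate(segments[seg_no]):
                    if value == key:
                        return seg_no * SEGMENT_SIZE + offset
        return None

    # Both find the same row, but the columnstore burns CPU on the scan.
    assert rowstore_lookup(123_456) == columnstore_lookup(123_456) == 123_456

Even with perfect elimination, the columnstore side touches up to SEGMENT_SIZE values to return one row, which is the CPU overhead the first bullet describes.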
We have a SingleStoreDB paper in this year's SIGMOD that goes into these implementation details (not publicly available just yet). The columnstore in SingleStoreDB is reasonably good at OLTP without sacrificing its OLAP performance (see some benchmarking results here: https://www.singlestore.com/blog/tpc-benchmarking-results/).
Also, I just noticed that you used the TPC-C benchmark here.
Have you considered re-benchmarking with TPC-E? It's the updated version of the OLTP test that more accurately represents these sorts of apps:
> "In February 2007, the new TPC-E benchmark [7] became a TPC standard. It is designed to be a more realistic OLTP benchmark than TPC-C, e.g., incorporating realistic data skews and referential integrity constraints."
> "We find that (i) TPC-E is more read intensive with a 9.7:1 I/O read to write ratio, while TPC-C sees a 1.9:1 read-to-write ratio; and (ii) although TPC-E uses pseudo-realistic data, TPC-E’s I/O access pattern is as random as TPC-C."
It's the difference between roughly a 10:1 read/write ratio (about 91% reads) and a 2:1 ratio (about 66% reads). FWIW, I've never worked on a line-of-business/SaaS app with a ratio lower than 80% reads.
Yeah, TPC-E is a better (more advanced) OLTP benchmark. TPC-C is trivially scalable by sharding on warehouse id everywhere. The problem with TPC-E is that not many companies publish (official or unofficial) results for it, so it's not as useful when comparing systems, which is what we were after in our blog post.
> The problem with TPC-E is that not many companies publish (official or unofficial) results for it, so it's not as useful when comparing systems, which is what we were after in our blog post.
Really interesting, going to have a look at that paper -- thanks for sharing your insight
Let me ask this: if you have something like a GraphQL API, which often does sparse column selection from multiple different tables, would that also be a good fit for a columnar database?
In most GraphQL queries, you're grabbing a portion of the fields from one or more tables, i.e. something like:
    query JoesCompletedTodos {
      users(where: { name: { _eq: "Joe" } }) {
        id
        name
        todos(where: { completed: { _eq: true } }) {
          text
        }
      }
    }
This will get translated to something along the lines of:
    SELECT users.id, users.name, json_arrayagg(todos)
    FROM users
    INNER JOIN todos ON todos.user_id = users.id
    WHERE users.name = 'Joe' AND todos.completed = true
    GROUP BY users.id, users.name
Many times you'll see queries spanning 3-4 relation levels deep, plucking something like 2-6 columns from each table.
Curious how well a columnar DB would do with something like this?
Also, on this point:
> "Most columnstores don't use compression schemes that are incremental. To grab a single row the columnstore likely decompresses many adjacent rows (could be millions of rows - depends on the particular columnstore)."
I think this is something that could be avoided if the data were in, e.g., Arrow, and you used Arrow Flight/FlightSQL as the transport mechanism, right? But this isn't my area of expertise, for sure.
> WHERE users.name = 'Joe' AND todos.completed = true
Queries with filters like this will have most of the problems I mentioned above unless "many" rows match the filter.
Lack of ability to seek to specific rows in the data hurts.
RE: Arrow... I haven't read too much about the compression scheme(s) it uses. If it only does dictionary compression, that's reasonably easy to make incremental, so it's possible Arrow doesn't have that specific problem.
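For what it's worth, here's a minimal pyarrow sketch (assuming the pyarrow package is available; this demonstrates only the in-memory dictionary encoding, not Flight/FlightSQL as a transport) of why dictionary-encoded data keeps single-element reads cheap: each row holds an integer code into a shared dictionary, so fetching one value doesn't require decoding a block of neighbors.

    import pyarrow as pa

    # Dictionary-encode a string column: Arrow stores one integer code per
    # row plus a shared dictionary of the distinct values.
    statuses = pa.array(["done", "pending", "done", "pending", "done"])
    encoded = statuses.dictionary_encode()

    print(encoded.type)        # dictionary<values=string, indices=int32, ...>
    print(encoded.dictionary)  # the shared values: ["done", "pending"]
    print(encoded.indices)     # per-row codes: [0, 1, 0, 1, 0]
    print(encoded[3])          # single-element random access, no block decode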
Making one like that (with a hybrid model like PAX), along with other things I consider important for improving the situation (algebraic type support, a fully relational model, etc.), would be desirable, but funding it is another thing. Building a database is a major effort, and it's not easy to get proper support for it.
But I'd kill for the chance, and I think the market is ready to go beyond the current crop of NoSQL/NewSQL...