Isn't the typical big data sql task IO bound? Vectorization only works when you ...

jandrewrogers · on April 27, 2022

In a well-designed system, you will typically be limited by effective bandwidth, often memory bandwidth or efficient use thereof which is an area where vectorization can help. Modern servers have tremendous storage bandwidth if you have an I/O scheduler capable of using it. Some newer database engines explicitly reject the assumption that storage throughput is precious as a design constraint, since it has become much less true over time due to advances in hardware.

Use of page layouts highly-optimized for vectorized evaluation is common now even if the implementation isn't vectorized. You lose nothing on modern hardware (they are good layouts regardless) and it allows you to easily do vector optimizations later. As a semantic distinction, columnar and vector layouts are organized differently and optimize for somewhat different things even though they have superficially similar appearance. Classic DSM-style columnar is largely obsolete.

Vectorization, first and foremost, is about optimizing selection operations in a database, but it can provide assists in other areas like joins, sorts, and aggregates. Most queries are a composed from these primitives, so many parts of the query plan may benefit. As a heuristic, operations that GPU databases excel at are the same kinds of operations that benefit from vectorization.

Obviously you can't just throw vectorization at an arbitrary database and expect major benefits, they need to be intentionally designed for it.

alterneesh · on April 29, 2022

I can't seem to understand why vectorization wouldn't help, say if you read after a sort. Irrespective of whether it fits in memory, or you perform some sort of an external sort, any operation that you want to perform on top of that sorted vector, be it an aggregation to reduce it, or an arithmetic operation with another column, you could still leverage vectorization and would end up using fewer CPU cycles, no?