Hacker News
Ask HN: Database for efficiently storing many numpy arrays
4 points by malux85 on Nov 12, 2017 | 2 comments
I'm looking for a database that can store dense N-dimensional numpy arrays. Query performance is not as important as storage efficiency; the arrays are dense but mostly 0s and 1s, so they compress well.

I tried pg's array support (very slow) and storing them in pg's JSON field (also quite slow, but more acceptable).

I should also add, we're going to be writing a lot more than reading -- potentially from many processes -- so anything with ACID-like support would be great.

But there must be dedicated databases out there for dense numerical data like this?
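For concreteness, here's roughly what I mean by "compress well" -- a quick sketch using numpy's own compressed format on a stand-in array (the size and density are illustrative, not our real data):

```python
import io
import numpy as np

# Illustrative stand-in for the arrays described above:
# dense uint8, roughly 95% zeros.
arr = (np.random.rand(256, 256, 16) < 0.05).astype(np.uint8)

# Write to an in-memory buffer as a zlib-compressed .npz archive.
buf = io.BytesIO()
np.savez_compressed(buf, arr=arr)

raw_bytes = arr.nbytes
packed_bytes = buf.getbuffer().nbytes
print(f"raw: {raw_bytes} B, compressed: {packed_bytes} B, "
      f"ratio: {raw_bytes / packed_bytes:.1f}x")
```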



pyarrow and parquet may fit your requirements. There are a number of analysis engines (Spark, Drill, Hive, etc.) that support parquet. For more detail see:

https://tech.blue-yonder.com/efficient-dataframe-storage-wit...

https://www.slideshare.net/julienledem/strata-ny-2017-parque...

If ACID guarantees are a must, you could put Kafka in front as a message broker between your writer processes and storage, but this comes at the cost of added complexity. For more info:

https://eng-staging.verizondigitalmedia.com/2017/04/28/Kafka...

http://activisiongamescience.github.io/2016/06/15/Kafka-Clie...


Not databases as you asked for, but I stumbled on this benchmark while looking into persisting numpy arrays. You might find it interesting: https://github.com/mverleg/array_storage_benchmark/blob/mast...



