andytwigg's comments

andytwigg · on Dec 12, 2015

What is the advantage of an order preserving encryption scheme over choosing an arbitrary mapping for each data column and applying that to the data?

andytwigg · on Aug 10, 2011

I've read most of the messages on this list with interest. I've done some benchmarking on LevelDB with billions of entries, and on commodity flash SSDs, and thought that summarising it might be of interest to folks on this list:

http://www.acunu.com/blogs/andy-twigg/benchmarking-leveldb/

Comments most welcome.

-Andy

-- CTO, Acunu Inc. www.acunu.com | http://www.cs.ox.ac.uk/people/andy.twigg/

andytwigg · on May 9, 2011

Hi all, Acunu (http://www.acunu.com/jobs) will be here - if you're interested in algorithms and file systems or just a great hacker, come and find us! Say hello at [email protected]

andytwigg · on April 19, 2011

Slides from a recent talk http://www.acunu.com/2011/04/algorithms-at-cassandra-meetup/

andytwigg · on April 12, 2011

SSDs are typically great at random reads, but not so good at random writes, particularly when you start rewriting the device (see http://www.acunu.com/2011/04/log-file-systems-and-ssds-made-...). The attraction of append-only structures is in alleviating the random writes; but you still need to be careful. For example, CoW trees and log file systems often have a big space blowup, which translates into a slowdown in insert rate, and potentially random IO in the log cleaner/GC. That's a scenario in which the Stratified B-tree does a lot better.

andytwigg · on April 12, 2011

In general, I agree - as you'd expect, practical implementations of this structure will naturally deviate quite far from the paper's presentation, and indeed we do make many optimizations for SSDs and rotating disks. But, in theory, disk vs RAM is no different to RAM vs L2. A central trick is the 'stratification' procedure that operates on arrays - that could be helpful elsewhere.

andytwigg · on April 12, 2011

Thanks for that. The second paper you reference contains more details of the data structure. I have submitted an updated version of the second paper - it will probably appear on arxiv in 2 days. We have a much improved version of it, though, which I hope to post sometime next week.

For more information about append-only B-trees and SSDs, see http://www.acunu.com/2011/04/log-file-systems-and-ssds-made-...

crux_ · on April 12, 2011

I don't suppose there's a chance of seeing the O'Caml implementation released as well, is there? It's neat to see it being used for what (IMO) is a really underutilized sweet spot for the language.

(That & the fact that a code release is an invaluable road map for anyone looking to follow in your footsteps.)

andytwigg · on April 20, 2011

The OCaml implementation is what we use internally to test implementations of complicated new data structures, and as such, it's quite unreadable to anyone else! We will probably talk at CUFP about our experiences of OCaml. It definitely has some upsides, but is not without some fairly major downsides (concurrency, serialization, lack of consistently-named library functions, ...) . As such, we're rewriting our basic data structures in Scala - this seems like it keeps most of the benefits but without some of the major problems. Now we know how to implement the structures, we might be able to do a clean implementation which can be released!

leef · on April 12, 2011

Thanks. The paper mentioned an open-source project for this. Does that exist yet?

andytwigg · on April 20, 2011

The core kernel code will be released in the next few weeks, along with a community site, probably code.acunu.com. We hope to post an announcement when that happens.