Hacker Newsnew | past | comments | ask | show | jobs | submitlogin
Show HN: sqlite-memory-vfs - Open a SQLite db in Python without hitting disk (github.com/michalc)
17 points by michalc on Dec 27, 2023 | hide | past | favorite | 14 comments


Recent SQLite builds include the memdb [1] VFS (and the associated sqlite3_serialize and sqlite3_deserialize APIs) by default. How does this compare?

I'm also curious what use cases you had in mind. My SQLite wrapper reimplements the memdb [2] VFS mostly as an exercise of the VFS API, but it provides a few additional niceties. The backing memory doesn't need to be contiguous, nor is it copied when the DB grows/shrinks. This makes it possible to have significantly larger in memory DBs. I also make it easy to bootstrap the database from (e.g.) a file on disk, or data embed in the binary. It's not as easy, however, to access the DB bytes after the DB goes “live.”

[1] https://www3.sqlite.org/src/file?name=src/memdb.c

[2] https://pkg.go.dev/github.com/ncruces/go-sqlite3/vfs/memdb#s...


From a cursory look at your source code…

It looks like memory is also not contiguous in your implementation.

You don't support concurrency at all (xLock/xUnlock/xCheckReservedLock are noops).

I'm not sure how APSW handles journaling (you usually want the journal_mode to be memory in these cases, but to force that, you must return SQLITE_OPEN_MEMORY as an OutFlag from xOpen). Otherwise, you need to deal with files (like journals, etc) for which reads/writes are not page aligned.

There is one caveat, though: if you want to “deserialize” a WAL database with journal_mode memory, you must convert it to rollback before opening (but you can do that by editing bytes 18 and 19).


> and the associated sqlite3_serialize and sqlite3_deserialize APIs

So... I have to admit I missed the existence of these. So for many cases, using sqlite3_deserialize is probably a very reasonable choice over sqlite-memory-vfs

Still, as you say, sqlite-memory-vfs doesn't require the memory to be contiguous, unlike sqlite3_deserialize

> You don't support concurrency at all (xLock/xUnlock/xCheckReservedLock are noops).

Well, you can read concurrently, just not write


If you're interested, both SQLite's and my memdb VFSes implement safe locking.

Depending on your familiarity with Go, mine maybe easier to follow, or not.

https://github.com/ncruces/go-sqlite3/blob/f1b00a9944730eaa9...


Ah thanks for this (maybe not so much for the code itself admittedly, but for the push to implement locking)

So after a few attempts, I _think_ I have something reasonable: https://github.com/michalc/sqlite-memory-vfs/blob/64a55c3491...

(I'm still a but stuck on PENDING I have to admit... not really sure how that fits in)


Pending exists to prevent writer starvation: if there's always at least one reader, no one can ever write.

You take pending before exclusive (which should always be possible because you have reserved by then), and you should check pending before taking a shared lock (no need to check exclusive, because they got pending before exclusive).

Shared forces writers to wait; Reserved prevents new readers from upgrading to write; Pending prevents new readers from starting; and exclusive forces everyone else to wait.


Have now implemented a version of this at https://github.com/michalc/sqlite-memory-vfs/blob/74a24ce567...

I think different to yours in two ways:

- If doesn't go via PENDING if it doesn't need to. Not sure what's best really - but since RESERVED can already be skipped in some cases (recovering from a hot-journal) then so far happy with "skipping is a thing", and PENDING is private to the VFS anyway from what I can tell, so nothing else should care I think

- It doesn't do the blocking in the lock function, but depends on SQLite's core code doing it via the busy handler/timeout


Do you have benchmarks and how does this differ from pointing SQLite to /dev/shm ?


No benchmarks right now. And in fact, I'm not even really sure what would make good ones, so suggestions welcome.

In terms of the difference from pointing SQLite to /dev/shm, my answer is somewhere between "I don't know", and "I suspect the VFS is more portable because not all systems have /dev/shm?"


It is funny people are doing SQL everything after the NoSQL hype. But why?


Because SQL works really well for a lot of what people need a database for?


import sqlite3

conn = sqlite3.connect('%MEMORY%')

Is there more to it than that?


Second sentence in the linked GitHub:

While SQLite supports the special filename :memory: that allows the creation of databases in memory, there is no built-in way to populate such a database using raw bytes without hitting disk as an intermediate step. This virtual filesystem overcomes that limitation.


I'm surprised writing to %MEMORY% hits the disk. Anyway ramdisks like tmpfs are a thing.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: