
Don't forget to stop writing to the database while doing the backup, otherwise you can end up in an infinite loop: if you write faster than sqlite3's .backup can copy, the backup keeps restarting and never finishes :D

Learned that the hard way when implementing SQLite backups in Gladys Assistant, an open-source home automation platform ( https://github.com/GladysAssistant/Gladys ).



Can you elaborate? A cron-based solution can't synchronize with your application code to know when to stop writing, so more background on your setup would be of interest.

As I understand it, while the backup runs, other writes should go to the WAL and only be applied to the main database file after the backup finishes?


I did some reading, and there are two ways to use the SQLite backup API:

1. Call backup_init, then backup_step with a step size of -1, then backup_finish. This locks the database for the entire backup and copies the whole thing in one pass.

2. Call backup_init, then backup_step in a loop with a positive step size (the number of pages to copy per call) until it returns SQLITE_DONE, then backup_finish.

With method 2, no db lock is held between backup_step calls. If a write occurs between backup_step calls, the backup API automagically detects this and restarts. I don't know if it looks at the commit count and restarts the backup from the beginning or is smart enough to know the first changed page and restarts from there. Because the lock is released, a continuous stream of writes could prevent the backup from completing.
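Python's sqlite3 module happens to wrap this same API, which makes the two methods easy to sketch: `Connection.backup` with a non-positive `pages` is method 1, and a positive `pages` is method 2. A minimal illustration (data and names are made up):

```python
import sqlite3

def backup_all_at_once(src, dest):
    # Method 1: pages <= 0 copies the whole database in a single
    # backup_step() call, holding the read lock for the full duration.
    src.backup(dest, pages=-1)

def backup_incrementally(src, dest, pages=100):
    # Method 2: copy `pages` pages per backup_step() call. The lock is
    # released between steps, so concurrent writes can restart the copy.
    src.backup(dest, pages=pages)

# Example with illustrative data:
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE t (x)")
src.executemany("INSERT INTO t VALUES (?)", [(1,), (2,)])
src.commit()
dest = sqlite3.connect(":memory:")
backup_incrementally(src, dest)
```

With no concurrent writer, both methods produce an identical copy; the difference only shows up under write load.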

I looked in the sqlite3 shell command source, and it uses method 2. So if using the .backup command with continuous concurrent writes, you have to take a read lock on the db before .backup to ensure it finishes. It would be nice if the .backup command took a -step option. That would enable the -1 step size feature of method 1. The sqlite3 shell uses a step size of 100.

Another option would be to check backup_remaining() and backup_pagecount() after each step, and if the backup isn't making progress, increase the step size. Once the step size is equal to backup_pagecount() it will succeed, though it may have to lock out concurrent writes for a long time on a large db. There's really no other choice unless you get into managing db logs.
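The C API would let you tune the step size between backup_step calls; Python's wrapper doesn't expose that mid-backup, so a rough sketch of the same idea is to abort a stalled backup from the progress callback (an exception raised there propagates out of `Connection.backup`) and retry with a doubled step size. This is a sketch under those assumptions, not a vetted recipe:

```python
import sqlite3

class StalledBackup(Exception):
    pass

def backup_with_backoff(src, dest, start_pages=100, max_attempts=8):
    """Retry the backup with a doubling step size until it completes."""
    pages = start_pages
    for _ in range(max_attempts):
        last = None
        def progress(status, remaining, total):
            # `remaining` should shrink on every step; if it doesn't,
            # a concurrent write restarted the copy, so abort and retry.
            nonlocal last
            if last is not None and remaining >= last:
                raise StalledBackup
            last = remaining
        try:
            src.backup(dest, pages=pages, progress=progress)
            return pages  # the step size that finally succeeded
        except StalledBackup:
            pages *= 2  # bigger steps mean fewer lock releases
    raise RuntimeError("backup kept restarting")

src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE t (x)")
src.execute("INSERT INTO t VALUES (42)")
src.commit()
dest = sqlite3.connect(":memory:")
used_pages = backup_with_backoff(src, dest)
```

Once `pages` grows past the database's page count, the copy happens in one step and can no longer be interrupted.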


That's not what I've experienced!

In my experience, as soon as new data comes into the DB, the .backup command starts over, and if the writes never stop, the backup never finishes either :D

In Gladys's case, we added a blocking transaction in the application logic to lock out writes during the backup. I haven't found any other way to avoid never-ending backups on write-heavy databases.
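One way to sketch that blocking-transaction approach in Python: take the write lock up front with BEGIN IMMEDIATE so all other writers block until the copy is done, then do a one-shot backup. Whether this matches Gladys's exact implementation is an assumption, and the names and paths are illustrative:

```python
import os
import sqlite3
import tempfile

def backup_quiesced(db_path, backup_path):
    # BEGIN IMMEDIATE takes the write lock immediately, so every other
    # writer blocks until we commit, while readers keep working. With
    # writes stopped, the one-shot backup cannot be restarted.
    src = sqlite3.connect(db_path, isolation_level=None)
    dest = sqlite3.connect(backup_path)
    try:
        src.execute("BEGIN IMMEDIATE")
        try:
            src.backup(dest)
        finally:
            src.execute("COMMIT")
    finally:
        dest.close()
        src.close()

# Usage with illustrative paths:
tmp = tempfile.mkdtemp()
db = os.path.join(tmp, "app.db")
bak = os.path.join(tmp, "app.backup.db")
con = sqlite3.connect(db)
con.execute("CREATE TABLE t (x)")
con.execute("INSERT INTO t VALUES (1)")
con.commit()
con.close()
backup_quiesced(db, bak)
```

The trade-off is the same one discussed above: on a large database, writers are locked out for the full duration of the copy.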


I'm using VACUUM INTO, which does basically that: https://sqlite.org/lang_vacuum.html#vacuuminto

> The VACUUM command with an INTO clause is an alternative to the backup API for generating backup copies of a live database....The VACUUM INTO command is transactional in the sense that the generated output database is a consistent snapshot of the original database.

EDIT: Litestream docs will also recommend that: https://github.com/benbjohnson/litestream.io/issues/56
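From Python, VACUUM INTO is a one-liner; a minimal sketch with illustrative paths (per the SQLite docs, the target file must not already exist):

```python
import os
import sqlite3
import tempfile

def vacuum_backup(db_path, backup_path):
    # VACUUM INTO writes a compacted, transactionally consistent
    # snapshot of the live database to a new file.
    con = sqlite3.connect(db_path, isolation_level=None)
    try:
        con.execute("VACUUM main INTO ?", (backup_path,))
    finally:
        con.close()

# Usage with illustrative paths:
tmp = tempfile.mkdtemp()
db = os.path.join(tmp, "app.db")
con = sqlite3.connect(db)
con.execute("CREATE TABLE t (x)")
con.execute("INSERT INTO t VALUES (7)")
con.commit()
con.close()
bak = os.path.join(tmp, "app.backup.db")
vacuum_backup(db, bak)
```

As a bonus, the snapshot is also defragmented, so it is often smaller than the original file.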


I ran into the case of needing to back up a write-heavy database without blocking anything, and came up with a solution: writing a VFS ( https://www.sqlite.org/vfs.html ) that makes the application move the SQLite journal files to a directory for processing instead of deleting them. Another process reads them to see which pages were touched and can very quickly apply those changes to the backup.


To be fair, this sounds like a situation where Litestream is the more appropriate solution!


This surprises me. I was under the impression that SQLite backups run inside a dedicated transaction in order to avoid this.


See the SQLite online backup API documentation [1].

You can finish the backup in one step, but a read-lock would be held during the entire duration, preventing writes. If you do the backup several pages at a time, then

> If another thread or process writes to the source database while this function is sleeping, then SQLite detects this and usually restarts the backup process when sqlite3_backup_step() is next called. ...

> Whether or not the backup process is restarted as a result of writes to the source database mid-backup, the user can be sure that when the backup operation is completed the backup database contains a consistent and up-to-date snapshot of the original. However: ...

> If the backup process is restarted frequently enough it may never run to completion and the backupDb() function may never return.

The CLI .backup command does a non-blocking backup IIRC, so it is subject to restarts.

[1] https://www.sqlite.org/backup.html


This comment cleared that up for me: VACUUM INTO backups are transactional, but the .backup mechanism is not: https://news.ycombinator.com/item?id=31387556

Ben, I suggest updating that cron backups documentation page to recommend VACUUM INTO instead!



Good catch. I went ahead and added an issue on the docs. https://github.com/benbjohnson/litestream.io/issues/56



