Hacker News

Hey, this is Aaron from GitHub. We're using devicemapper w/ LVM backed pools. Would love to hear about your experience there. We definitely see this problem during periods of high container churn.


That's funny, we have an internal bug open right now about kernel panics that happen with devicemapper (with XFS as the base filesystem). We found that the issue was exacerbated by loopback devices, but on paper it should still happen in non-loopback mode (the current theory is that it's a bug in XFS). Our kernel team is still investigating, but they cannot seem to reproduce the issue with direct-lvm (and loop-lvm reproduces it only inconsistently).

If you can consistently reproduce the issue, would you mind providing the backtrace and/or coredump? Is it possible for you to reproduce the issue on a machine without needing to be hit by GitHub-levels of traffic, and if so can you provide said reproducer?

For reference, our backtraces show that the kernel dies in xfs_vm_writepage. Though of course different kernel versions may produce varying backtraces.

You can reach me on the email in my profile, or asarai(at)suse.com.


My schroot tool, used for building Debian packages, could reliably panic a kernel in under five minutes when it was rapidly creating and destroying LVM snapshots in parallel (24 parallel jobs, with lifetimes ranging from seconds to hours, median around a minute).

This was due in part to udev races (udev likes to open and poke around in LVs in response to a creation trigger, which races with deletion if the LV is very short-lived). I've seen undeletable LVs and snapshots, oopses, and full lockups of the kernel with no panic. This stuff appears not to have been stress tested.
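To give a feel for the workload, a minimal stress loop along these lines (hypothetical volume group "vg0" and origin LV "base"; needs root and a scratch VG, so run it on a disposable test box only) reproduces the create/destroy churn described above:

```shell
#!/bin/sh
# Sketch: rapidly create and destroy LVM snapshots in parallel to
# provoke udev/LVM races. VG and LV names here are made up.
for i in $(seq 1 24); do
  (
    while true; do
      snap="snap-$i-$$"
      lvcreate --snapshot --size 512M --name "$snap" vg0/base >/dev/null
      # An immediate removal maximizes the window in which udev's
      # post-creation scan can race with the delete.
      lvremove --force "vg0/$snap" >/dev/null
    done
  ) &
done
wait
```

The short snapshot lifetime is the important part; with long-lived snapshots the udev trigger has usually finished before the delete arrives.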

I switched to Btrfs snapshots, which were more reliable, but the rapid snapshot churn would unbalance the filesystem into a read-only state in just 18 hours or so. Overlays worked, but with caveats. We ended up going back to unpacking tarballs for reliability. I'm currently writing ZFS snapshot support; I should have done that years ago instead of bothering with Btrfs.
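For the curious, the ZFS equivalent of the snapshot-per-build workflow is pleasantly simple. A sketch, with made-up pool and dataset names:

```shell
# Hypothetical per-build lifecycle using ZFS snapshots and clones.
zfs snapshot tank/chroots/sid@clean            # one-off pristine snapshot
zfs clone tank/chroots/sid@clean tank/build42  # instant writable copy per build
# ... run the build inside the clone's mountpoint ...
zfs destroy tank/build42                       # discard the clone afterwards
```

Clones are cheap copy-on-write children of the snapshot, so creating and destroying one per build avoids the unpack-a-tarball cost entirely.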


In my work identity, we saw a similar problem in our testing, where blkid would cause undesired IO on fresh devices. Eventually, we stopped blkid from scanning our device-mapper devices on state changes with a file /etc/udev/rules.d/59-no-scanning-our-devices.rules containing:

    ENV{DM_NAME}=="ourdevice", OPTIONS:="nowatch"

Alternatively, you could call 'udevadm settle' after device creation, before doing anything else, which will let blkid finish its desired IO, I think.
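For completeness, a sketch of the rules file approach (the DM_NAME match is obviously specific to your own device names, and the "*" glob is my addition to cover numbered devices):

```
# /etc/udev/rules.d/59-no-scanning-our-devices.rules
# "nowatch" stops udev from re-running blkid and friends every time
# a matching device-mapper device is closed after being written.
ENV{DM_NAME}=="ourdevice*", OPTIONS:="nowatch"
```

Rules files are read in lexical order, so the 59- prefix just has to sort before whatever default rule sets the watch option on your distro.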


Yes, we did something similar to disable the triggers. Unfortunately, while this resolved some issues such as being unable to delete LVs which were erroneously in use, it didn't resolve the oopses and kernel freezes which were presumably locking problems or similar inside the kernel.


A known (and now fixed) kernel issue affects the scheduler and cgroups subsystems, triggering crashes under Kubernetes load (fixed by commits 754bd598be9bbc9 and 094f469172e00d). The fix was merged in Linux 4.7 (and backported to -stable in 4.4.70). So if you run an older kernel, maybe you are being hit by this?


Any particular reason you didn't choose something like overlay2?


Assuming they are using Red Hat: overlay2 was only recently announced as fully supported, in the newest release, RHEL 7.4.
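For anyone switching: on a new enough kernel and distro, selecting overlay2 is just a daemon configuration setting (sketch; the file conventionally lives at /etc/docker/daemon.json, and note that existing images and containers do not carry over between storage drivers):

```
{
  "storage-driver": "overlay2"
}
```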



