On Thu, Nov 13, 2014 at 02:52:02PM +0100, Mathijs Kwik wrote:
> Hi all,
>
> Today, I lost most of my data (don't worry, got backups) after the cache
> got corrupted somehow. I suspected a recent suspend-to-disk to be the
> cause. I checked how my distribution (NixOS) handles suspend/resume and
> I have some concerns about how bcache fits into this.

Augh :(

> Normally, the kernel and initrd get loaded. The initrd loads required
> kernel modules, waits for udev to settle, activates luks&lvm, then
> finally asks the kernel to resume from the resume device.
>
> The kernel documentation on suspend is VERY clear you should NOT touch
> anything on disk between suspend and resume. So activating luks and LVM
> is probably risky already, but it appears both luks and LVM do not make
> any on-disk changes when activated and any in-memory state (within the
> resumed image) is still valid. The benefit of activating luks and LVM
> before resume seems to be that it allows resuming from encrypted/lvm
> volumes.

Yeah, this is handled for in-kernel stuff with the freezing mechanism,
which bcache uses.

> Now, with bcache added, things probably get a bit hairy. NixOS supports
> bcache inside the initrd and uses udev rules to activate/attach. I
> suspect this is probably unsafe. Probably bcache starts to see if any
> dirty pages exist, to write them to the backing store. Even without
> writeback caching, the activation of lvm will read some sectors, which
> might trigger the cache to update. Then after resuming the image, the
> in-memory state is corrupted and further damage occurs.
>
> - Does this sound plausible?
> - Is there any way to tell bcache to make absolutely no changes to
>   either the backing device or the cache?
>   Basically like a readaround+writearound which can be triggered on
>   hibernate and switched off on resume.

So, userspace shouldn't have to do anything to tell bcache about hibernation.
The dev branch is getting a true read-only mode (still in progress), but this
isn't relevant to hibernation.

bcache kernel threads (allocator thread, gc thread) should be correct w.r.t.
hibernation, but maybe the workqueue usage isn't.

I'm probably not going to be able to get to this in the next couple of days,
but this is a pretty serious issue. Can you ping me again every couple of days
until I get a fix out for this, and maybe file a bug somewhere? (I think
bugzilla.kernel.org has been used for bcache bugs before...)
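
For a bug report, it may help to record the cache mode and dirty state of the
bcache device just before hibernating and again after resume. The following is
only a rough sketch: it assumes the backing-device sysfs attributes described
in the kernel's bcache documentation (cache_mode, state, dirty_data), and the
device name bcache0 and both function names are hypothetical. It only reads
sysfs and changes nothing on disk.

```python
from pathlib import Path


def active_mode(cache_mode_text):
    """Return the bracketed entry of a cache_mode string, or None.

    The cache_mode attribute lists all modes and brackets the active one,
    e.g. "writethrough [writeback] writearound none".
    """
    for word in cache_mode_text.split():
        if word.startswith("[") and word.endswith("]"):
            return word[1:-1]
    return None


def bcache_report(sysfs_dir="/sys/block/bcache0/bcache"):
    """Collect a few read-only attributes; bcache0 is an assumed device name."""
    base = Path(sysfs_dir)
    report = {}
    for name in ("cache_mode", "state", "dirty_data"):
        try:
            report[name] = (base / name).read_text().strip()
        except OSError:
            # Attribute missing or unreadable (no such device, no permission).
            report[name] = None
    mode_text = report["cache_mode"]
    report["active_mode"] = active_mode(mode_text) if mode_text else None
    return report


if __name__ == "__main__":
    for key, value in bcache_report().items():
        print(f"{key}: {value}")
```

Running it once before suspend-to-disk and once after resume, and attaching
both outputs to the bug, would show whether dirty data was present in the
cache across the hibernation.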