Re: bcache and hibernation

Mathijs Kwik <mathijs@xxxxxxxxxxxxxxxx> · Mon, 01 Dec 2014 09:48:28 +0100

Kent Overstreet <kmo@xxxxxxxxxxxxx> writes:
>
> BTW - it sounds like you're ahead of me on how this is put together - could you
> point me at the userspace side of hibernate that you're using (those initramfs
> scripts, and in particular whatever device mapper does)? that'll help a lot.

Please see
https://github.com/NixOS/nixpkgs/blob/master/nixos/modules/system/boot/stage-1-init.sh 
lines 128 onward. I'm just using the in-kernel suspend/hibernate
functionality (swsusp) but the same probably applies to tuxonice and
other solutions as well.

As far as I understand the events leading up to hibernation are very
similar to suspend. The kernel will notify processes and kernel threads
they will be frozen. Then, when everything has prepared for suspension,
instead of just putting the system to suspend, the kernel will write the
full contents of the system RAM to a swap device. 

I'm pretty sure that's all still OK and things are in a consistent
state. However, when resuming the system, some basic initialization is
performed in initrd. At least SCSI/SATA controller modules and other
stuff needed to find the hibernated RAM image are needed of course, but
most distributions will include more stuff and use udev to find hardware
and load the appropriate modules.

This is where things might get nasty though. On nixos, we normally
initialize bcache using udev rules. Modern versions of udev will do a
quick scan of block devices when found to find their labels/types. So
while waiting for stuff like disks/usb/whatever to appear, my bcache
partitions get found and activated. I'm pretty sure bcache will take
over from here and do some bookkeeping / flush dirty buckets, whatever.
Even without writeback, things might change on-disk: udev and tools like
vgscan (lvm) and "btrfs scan" might probe some magic sectors inside the
newly-activated bcache device. If those aren't in the cache, they will
be put there, once again changing the on-disk state.

Then finally (line 190) the kernel gets instructed to check if the swap
device contains a hibernated RAM image and restore that. For everything
running "inside the RAM image", it's just like waking up from a normal
suspend.

>From this explanation, it should be clear that it is vital that no
on-disk state is changed in the brief period that the initrd is setting
up the system, or bcache's in-memory state (inside the resumed RAM
image) will be corrupted, probably leading to disasters. Either that, or
bcache should assume nothing on resume and make sure to reassemble its
entire in-memory state from disk.

The temporary solution I found was to not include the udev rules in the
initrd so it will not get found and activated before resume. Then for
normal booting (I have my root on bcache) I manually load and activate
bcache _after_ seeing there is no resume image. However, this solution
is ugly, because I need to repeat all other initialization steps
(lvm/btrfs) afterwards. 

Hope this helps,
Mathijs
--
To unsubscribe from this list: send the line "unsubscribe linux-bcache" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html