Re: keeping a backup ESP partition in sync

Lennart Poettering <lennart@xxxxxxxxxxxxxx> · Wed, 22 May 2024 15:08:33 +0200

On Fr, 17.05.24 11:03, Alexander Gordeev (alex@xxxxxxxxxxx) wrote:

> Hi,
>
> I've tried systemd-boot recently, I like it a lot. Thanks!
> There is still one concern. I'd like to have a backup EFI partition
> because you know things can happen and my rootfs is on a mirror
> anyway. There is a popular approach with setting up a mdraid version
> 1.0 to sync the ESPs. I don't like it because (1) FAT32 is not super
> reliable and (2) if there is a power outage when a partial state is
> written, then issues can happen, I think.

Yeah, RAID on ESP is a *bad* idea if implemented by the OS. UEFI has
write access to the ESP, and this is *actively* used by both firmware
stuff and by sd-boot/sd-stub to maintain try counters, random seeds
and so on. Thus the fs is written to whenever you boot, and since
those writes hit only one member, behind the back of the OS-level
RAID, your RAID array will come up dirty on every single boot.

If you have some firmware that does RAID natively you could probably
do ESP-on-RAID, but without it it's a recipe for disaster, not a
recipe for robustness.

> I think it is better to have them mounted as e.g. /boot/efi and
> /boot/eficopy and make changes like this:
> 1. update /boot/efi
> 2. make sure the update is actually written to the device
> 3. update /boot/eficopy
>
> Right now I do this manually with rsync. I'm thinking about adding
> kernel/initramfs/dpkg hooks. Maybe there are easier ways to do it?
> Otherwise maybe this feature is desirable in systemd-boot?

I don't see why systemd-boot would care about multiple disks – however
I do agree that for systems with many disks it might make sense to
teach *bootctl* some limited support for an ESP that exists in
multiple copies on multiple devices.

Hence, if somebody sends a patch that teaches "bootctl install" and
"bootctl update" and the others to deal with multiple ESPs then I
guess I'd be on board with that.

That said, the intended semantics for that are not clear to me at
all. There are some options:

1. mount the current ("primary") ESP to /efi/, and operate exclusively
   on that, except that at the very end after syncing the ESP is dd'ed
   on the block level onto a set of matching partitions on other HDDs
   without any consideration of their current contents.

2. mount the current ("primary") ESP to /efi/, and expect that
   "secondary" ESPs are mounted to /efi.mirror/$DEVNAME/ or so, and
   then first operate on the primary ESP, and then only sync a very
   specific subset of dirs from the primary to the secondary ESPs,
   i.e. /loader/, /efi/Linux and /efi/systemd. Syncing would be
   "dumb", i.e. stupidly copy over, and remove directory entries that
   do not exist in the source.

   This is far from trivial to implement, because how would we even
   decide what to mount to /efi.mirror/$DEVNAME/, how would we expect
   users to mark the set of partitions? Probably would require some
   udev rule, but that creates messy problems around waiting for these
   mirrors on boot (because we do update the ESP automatically at
   boot, for updating the random seed automatically, and more). After
   all it should be OK if mirrors go missing, but that means we cannot
   really delay booting waiting for them anymore (because we cannot
   distinguish the case "device is just slow to pop up" from the case
   "device is dead").

2b. same as 2, but try to be "smart" with syncing, i.e. look at file
    mtimes, and let the newer versions win. Probably doomed to fail,
    due to clock/timezone unreliability in early boot and in
    particular firmware writes.

3. some scheme where there's no primary nor secondary, but just an
   equal set of partitions. This is harder than it sounds, since it
   raises the question of what to do if updating some partitions works
   but things fail on others: do we undo the first change again, or do
   we just continue? If we declared one of the copies as "primary" (as
   suggested above) this problem goes away somewhat, since it would
   mean we could have strict success rules on the "primary" copy, and
   lax rules on the "secondary" copies. This also would have the
   problem that 3rd party tools are generally not ready to deal with
   the fact that there's more than one equivalent ESP.
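
To make the "dumb" sync of option 2 a bit more concrete, here is a
rough Python sketch. It is not something bootctl does today, just an
illustration under assumptions: the function names, the mount point
arguments and the directory list (spelled the way it usually appears
on an ESP) are all made up for the example. It copies unconditionally,
removes entries missing from the source, touches nothing outside the
listed directories, and treats a failing or absent mirror as non-fatal
(the "lax rules" for secondaries).

```python
import os
import shutil
from pathlib import Path

# Assumption: the subset of directories to mirror, relative to the ESP root.
SYNC_DIRS = ("loader", "EFI/Linux", "EFI/systemd")


def mirror_dir(src: Path, dst: Path) -> None:
    """Make dst an exact copy of src: copy everything, delete extras."""
    dst.mkdir(parents=True, exist_ok=True)
    src_names = {p.name for p in src.iterdir()}
    # Remove entries that do not exist in the source.
    for entry in dst.iterdir():
        if entry.name not in src_names:
            if entry.is_dir():
                shutil.rmtree(entry)
            else:
                entry.unlink()
    # Copy over unconditionally: no mtime comparison, the source always wins.
    for entry in src.iterdir():
        target = dst / entry.name
        if entry.is_dir():
            if target.exists() and not target.is_dir():
                target.unlink()
            mirror_dir(entry, target)
        else:
            if target.is_dir():
                shutil.rmtree(target)
            shutil.copyfile(entry, target)


def sync_mirrors(primary: Path, mirrors: list[Path], subdirs=SYNC_DIRS) -> dict:
    """Sync the selected dirs from primary to each mirror.

    Returns {mirror: exception} for mirrors that failed; a failure on one
    mirror does not stop the others.
    """
    failures = {}
    for mirror in mirrors:
        try:
            # A real tool would check that this is an actual mount point;
            # here we only refuse to create a missing mirror directory.
            if not mirror.is_dir():
                raise FileNotFoundError(f"{mirror} is not available")
            for sub in subdirs:
                src = primary / sub
                if src.is_dir():
                    mirror_dir(src, mirror / sub)
                elif (mirror / sub).exists():
                    shutil.rmtree(mirror / sub)
        except OSError as exc:
            failures[mirror] = exc
    if hasattr(os, "sync"):
        os.sync()  # ask the kernel to flush everything to the devices
    return failures
```

Usage would be something like sync_mirrors(Path("/efi"),
[Path("/efi.mirror/sdb1")]), run only after the update of the primary
ESP has succeeded and been flushed.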

Hence, approach 2 is probably the best, but the waiting issue is a
major headache. It would probably mean we store away the list of
primary+secondary ESPs we have seen so far in a file in the ESP (which
is then sync'ed to all). And then add "bootctl wait-secondary-esps" or
so as a new tool that waits for them to show up, with some time-out
applied. But, uh, this gets so involved so quickly. (as you then
probably also need "bootctl add-secondary-esp" and "bootctl
remove-secondary-esp")
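
The waiting part could look roughly like the following Python sketch.
Note that "bootctl wait-secondary-esps" does not exist; the function
name, the defaults and the idea that the stored list holds device
node paths (e.g. below /dev/disk/by-partuuid/) are assumptions made
for illustration. It polls until every listed device has shown up or
the time-out hits, and then reports which ones are still missing so
that the caller can carry on without them.

```python
import os
import time


def wait_for_devices(paths, timeout=10.0, interval=0.2, exists=os.path.exists):
    """Wait up to `timeout` seconds for all paths to appear.

    Returns (present, missing), both in the order given. Never raises on
    a missing device: a mirror going away must not block booting.
    """
    paths = list(paths)
    deadline = time.monotonic() + timeout
    pending = set(paths)
    while True:
        # Drop everything that has shown up since the last round.
        pending = {p for p in pending if not exists(p)}
        if not pending or time.monotonic() >= deadline:
            break
        time.sleep(interval)
    present = [p for p in paths if p not in pending]
    missing = [p for p in paths if p in pending]
    return present, missing
```

The time-out is what turns "slow to pop up" versus "dead" into a
policy decision: after it expires the device is simply treated as
missing for this boot, and syncing proceeds with whatever is present.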

But anyway, if this matters to you, feel free to send a patch for
this, but it's not really a job for a day or two, it's much more
involved than one might think.

Lennart

--
Lennart Poettering, Berlin