Hi Lennart,

Thanks for the detailed feedback.

On Wed, May 22, 2024 at 3:08 PM Lennart Poettering <lennart@xxxxxxxxxxxxxx> wrote:
>
> On Fr, 17.05.24 11:03, Alexander Gordeev (alex@xxxxxxxxxxx) wrote:
>
> > Hi,
> >
> > I've tried systemd-boot recently, I like it a lot. Thanks!
> > There is still one concern. I'd like to have a backup EFI partition
> > because you know things can happen and my rootfs is on a mirror
> > anyway. There is a popular approach with setting up a mdraid version
> > 1.0 to sync the ESPs. I don't like it because (1) FAT32 is not super
> > reliable and (2) if there is a power outage when a partial state is
> > written, then issues can happen, I think.
>
> Yeah, RAID on ESP is a *bad* idea if implemented by the OS. UEFI has
> write access to the ESP, and this is *actively* used by both firmware
> stuff and by sd-boot/sd-stub to maintain try counters, random seeds
> and so on. Thus, whenever you boot the fs is written to, and that
> hence means on every single boot our RAID array will come up dirty.
>
> If you have some firmware that does RAID natively you could probably
> do ESP-on-RAID, but without it it's a recipe for disaster, not a
> recipe for robustness.

Ok, I see, wow. It's a pity that the mdraid approach still seems to be
popular.

> > I think it is better to have them mounted as e.g. /boot/efi and
> > /boot/eficopy and make changes like this:
> > 1. update /boot/efi
> > 2. make sure the update is actually written to the device
> > 3. update /boot/eficopy
> >
> > Right now I do this manually with rsync. I'm thinking about adding
> > kernel/initramfs/dpkg hooks. Maybe there are easier ways to do it?
> > Otherwise maybe this feature is desirable in systemd-boot?
>
> I don't see why systemd-boot would care about multiple disks – however
> I do agree that for systems with many disks it might make sense to
> teach *bootctl* some limited support for an ESP that exists in
> multiple copies on multiple devices.
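
For reference, steps 1-3 from my original mail boil down to something
like the sketch below. It is not what I literally run (that is plain
rsync), just a minimal Python illustration of the ordering, assuming a
Linux system where fsync() on a directory works. The function names
copy_durably() and update_esps() are made up for this example, and the
demo uses throwaway directories instead of the real /boot/efi and
/boot/eficopy.

```python
import os
import shutil
import tempfile
from pathlib import Path


def copy_durably(src: Path, dst: Path) -> None:
    """Copy src to dst, then flush the file and its directory to the device."""
    dst.parent.mkdir(parents=True, exist_ok=True)
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
        fout.flush()
        os.fsync(fout.fileno())  # flush the file contents
    dir_fd = os.open(dst.parent, os.O_RDONLY)
    try:
        os.fsync(dir_fd)  # flush the directory entry (assumes Linux)
    finally:
        os.close(dir_fd)


def update_esps(new_file: Path, rel_path: str, primary: Path, backup: Path) -> None:
    """Update the primary copy first, flush it, and only then the backup."""
    # Steps 1 and 2: update the primary ESP and make sure it reached the device.
    copy_durably(new_file, primary / rel_path)
    # Step 3: only now touch the backup, so the two copies are never
    # being rewritten at the same time.
    copy_durably(new_file, backup / rel_path)


if __name__ == "__main__":
    # Demo on temporary directories standing in for the two mount points.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        image = root / "linux.efi"
        image.write_bytes(b"dummy image")
        update_esps(image, "EFI/Linux/linux.efi", root / "efi", root / "eficopy")
        print(sorted(str(p.relative_to(root)) for p in root.rglob("*.efi")))
```

The point of the ordering is that a power cut hits at most one of the
two copies mid-write, which is what I do not get from the mdraid setup.
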
>
> Hence, if somebody sends a patch that teaches "bootctl install" and
> "bootctl update" and the others to deal with multiple ESPs then I
> guess I'd be on board with that.

Great, this is indeed the thing that I wanted to do.

> That said, the intended semantics for that are not clear to me at
> all. i.e. there are some options:
>
> 1. mount the current ("primary") ESP to /efi/, and operate exclusively
>    on that, except that at the very end after syncing the ESP is dd'ed
>    on the block level onto a set of matching partitions on other HDDs
>    without any consideration of their current contents.

Well, this means that the FAT filesystem IDs are going to be identical.
This can be quite confusing, I think, since at the moment these IDs
are the primary method to distinguish the filesystems when mounting
them, right?

> 2. mount the current ("primary") ESP to /efi/, and expect that
>    "secondary" ESPs are mounted to /efi.mirror/$DEVNAME/ or so, and
>    then first operate on the primary ESP, and then only sync a very
>    specific subset of dirs from the primary to the secondary ESPs,
>    i.e. /loader/, /efi/Linux and /efi/systemd. Syncing would be
>    "dumb", i.e. stupidly copy over, and remove dentries not existing in
>    the source.
>
>    This is far from trivial to implement, because how would we even
>    decide what to mount to /efi.mirror/$DEVNAME/, how would we expect
>    users to mark the set of partitions? probably would require some
>    udev rule, but that creates messy problems around waiting for these
>    mirrors on boot (because we do update the ESP automatically at
>    boot, for updating the random seed automatically, and more). After
>    all it should be OK if mirrors go missing, but that means we cannot
>    really delay booting waiting for them anymore (because we cannot
>    distinguish the case "device is just slow to pop up" from the case
>    "device is dead").
>
> 2b. same as 2, but try to be "smart" with syncing, i.e. look at file
>     mtimes, and let the newer versions win.
>     Probably doomed to fail, due to clock/timezone unreliability in
>     early boot and in particular firmware writes.
>
> 3. some scheme where there's no primary nor secondary, but just an
>    equal set of partitions. This is harder than it sounds, since it
>    raises questions what to do if updating some partitions works but
>    things fail on others: do we undo the first change again, or do we
>    just continue? if we declared one of the copies as "primary" (as
>    suggested above) this problem goes away somewhat, since it would
>    mean we could have strict success rules on the "primary" copy, and
>    lax rules on the "secondary" copies. This also would have the
>    problem that 3rd party tools are generally not ready to deal with
>    the fact that there's more than one equivalent ESP.
>
> Hence, approach 2 is probably the best, but the waiting issue is a
> major headache. it would probably mean we store away the list of
> primary+secondary ESPs we have seen so far in a file in the ESP (which
> is then sync'ed to all). And then add "bootctl wait-secondary-esps" or
> so as a new tool that waits for them to show up, with some time-out
> applied. But, uh, this gets so involved so quickly. (as you then
> probably also need "bootctl add-secondary-esp" and "bootctl
> remove-secondary-esp")
>
> But anyway, if this matters to you, feel free to send a patch for
> this, but it's not really a job for a day or two, it's much more
> involved than one might think.

Well, my initial idea was to add a file, e.g.
/etc/systemd/bootparts.conf, listing the UUIDs or even mount points of
the filesystems. 'bootctl install' and 'bootctl update' could then go
through the list and repeat exactly the same steps when called from
package/initramfs/kernel hooks. Does the config file have to be kept on
the ESP? Probably for some dual-boot scenarios?

So you are saying that the secondary ESPs will no longer be bootable
after the next boot because of the writes done only to the primary ESP
by firmware/sd-boot/sd-stub, right?
If so, maybe this is indeed going to be very fragile...

Yes, I thought this feature would be easy to do. I'm not sure I'd have
time to dive really deep into all this and do the complete thing right
away... Maybe doable with some guidance. :)

Kind regards,
Alexander Gordeev
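
P.S. To make the discussion a bit more concrete, here is a rough Python
sketch of how I understand the "dumb" sync from option 2, driven by a
list of mount points as in my config file idea. This is only an
illustration, not a proposed patch and not how bootctl works today.
Assumptions: the file format (one mount point per line, '#' for
comments) is invented here, the function names are made up, and the
directory list is my reading of the subset named above (loader,
EFI/Linux, EFI/systemd). Failures on a secondary are only recorded,
following the "strict on primary, lax on secondary" idea.

```python
import shutil
from pathlib import Path

# Assumption: the subset of ESP directories that gets mirrored.
SYNC_DIRS = ("loader", "EFI/Linux", "EFI/systemd")


def read_mount_points(conf_text: str) -> list[Path]:
    """Parse a hypothetical bootparts.conf: one mount point per line."""
    paths = []
    for line in conf_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blanks
        if line:
            paths.append(Path(line))
    return paths


def mirror_dir(src: Path, dst: Path) -> None:
    """One-way 'dumb' sync: copy everything over, delete what src lacks."""
    dst.mkdir(parents=True, exist_ok=True)
    wanted = {p.name for p in src.iterdir()}
    for entry in dst.iterdir():
        if entry.name not in wanted:
            if entry.is_dir():
                shutil.rmtree(entry)
            else:
                entry.unlink()
    for entry in src.iterdir():
        target = dst / entry.name
        if entry.is_dir():
            if target.exists() and not target.is_dir():
                target.unlink()
            mirror_dir(entry, target)
        else:
            if target.is_dir():
                shutil.rmtree(target)
            # copyfile() copies data only; FAT has no Unix permissions.
            shutil.copyfile(entry, target)


def sync_secondaries(primary: Path, secondaries: list[Path]) -> list[Path]:
    """Mirror SYNC_DIRS to every secondary; return the ones that failed."""
    failed = []
    for esp in secondaries:
        try:
            if not esp.is_dir():
                raise FileNotFoundError(esp)
            for rel in SYNC_DIRS:
                src, dst = primary / rel, esp / rel
                if src.is_dir():
                    mirror_dir(src, dst)
                elif dst.is_dir():
                    shutil.rmtree(dst)  # gone on the primary, so remove it
        except OSError:
            # Lax rule: a missing or broken mirror must not abort the update.
            failed.append(esp)
    return failed
```

A hook would call sync_secondaries() only after the update of the
primary ESP has succeeded. Directories outside SYNC_DIRS are left
alone on the secondaries. None of this addresses the problem of
firmware and sd-boot writing only to the primary at boot time, or of
waiting for slow devices.
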