Re: the need for a discoverable sub-volumes specification

Topi Miettinen <toiwoton@xxxxxxxxx> · Tue, 9 Nov 2021 19:48:43 +0200

On 8.11.2021 17.32, Lennart Poettering wrote:
Besides the GPT auto-discovery where versioning is implemented the way
I mentioned, there's also the sd-boot boot loader which does roughly
the same kind of OS versioning with the boot entries it discovers. So
right now, you can already choose whether:

1. you want to do OS versioning on the boot loader entry level: name
    your EFI binary fooos-0.1.efi (or fooos-0.1.conf, as defined by the
    boot loader spec) and similar and the boot loader automatically
    picks it up, makes sense of it and boots the newest version
    installed.

2. you want to do OS versioning on the GPT partition table level: name
    your partitions "fooos-0.1" and similar, with the right GPT type,
    and tools such as systemd-nspawn, systemd-dissect, portable
    services, RootImage= in service unit files all will be able to
    automatically pick the newest version of the OS among the ones in
    the image.

and now:

3. If we implement what I propose above then you could do OS versioning
    on the file system level too.

(Or you could do a combination of the above, if you want — which is
highly desirable I think in case you want a universal image that can
boot on bare metal and in nspawn in a nice versioned way.)

Now, in sd-boot's versioning logic we implement an automatic boot
assessment logic on top of the OS versioning: if you add a "+x-y"
string into the boot entry name we use it as x=tries-left and
y=tries-done counters. i.e. fooos-0.1+3-0.efi is semantically the same
as fooos-0.1.efi, except that there are 3 attempts left and 0 done
yet. On each boot attempt the boot loader decreases x and increases
y. i.e. fooos-0.1+3-0.efi → fooos-0.1+2-1.efi → fooos-0.1+1-2.efi →
fooos-0.1+0-3.efi. If a boot succeeds the two counters are dropped
from the filename, i.e. → fooos-0.1.efi.

For details see: https://systemd.io/AUTOMATIC_BOOT_ASSESSMENT.

Now, why am I mentioning all this? Right now this assessment counter
logic is only implemented for the OS versioning as implemented by
sd-boot. But I think it would make a ton of sense to implement the
same scheme for the GPT partition table OS versioning, and then also
for the fs-level OS versioning as proposed in this thread.

Or to say this explicitly: we could define the spec to say that if
we encounter:

    /@auto/root-x86-64:fedora_36.0+3-0

on first boot attempt we'd rename it:

    /@auto/root-x86-64:fedora_36.0+2-1

and so on. Until boot succeeds in which case we'd rename it:

    /@auto/root-x86-64:fedora_36.0

i.e. we'd drop the counting suffix.
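
To make the counter handling concrete, here is a minimal Python sketch of the renaming as described above. It assumes the suffix always has the full "+x-y" form, optionally followed by a file extension; the function names are made up for illustration and this is not how sd-boot itself is implemented.

```python
import re

# "<stem>+<tries-left>-<tries-done>[.ext]", e.g. "fooos-0.1+3-0.efi"
# (assumed format, taken from the examples in this thread)
_COUNTER_RE = re.compile(
    r"^(?P<stem>.+)\+(?P<left>\d+)-(?P<done>\d+)(?P<ext>\.[A-Za-z]+)?$"
)


def next_attempt_name(name):
    """Name after one more boot attempt; None if no tries are left."""
    m = _COUNTER_RE.match(name)
    if m is None:
        return name  # no counters present, nothing to count
    left, done = int(m["left"]), int(m["done"])
    if left == 0:
        return None  # tries exhausted, caller should fall back
    return f"{m['stem']}+{left - 1}-{done + 1}{m['ext'] or ''}"


def mark_good(name):
    """Name after a successful boot: the counting suffix is dropped."""
    m = _COUNTER_RE.match(name)
    if m is None:
        return name
    return f"{m['stem']}{m['ext'] or ''}"
```

The same two helpers cover both the boot entry names (fooos-0.1+3-0.efi) and the proposed /@auto/ names, since only the suffix is touched.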

Could this automatic versioning scheme also be extended to the
RootImage= and RootDirectory= settings of services? Then we could do
A/B testing for RootImages too, with automatic fallback to the last
known good version.

In my setup, all services use either a RootImage= or RootDirectory= (for 
early boot services). Most of them don't care about kernel version, so 
the services use a shared drop-in (LVM logical volume 'levy'):

[Service]
RootImage=/dev/levy/%p-all.squashfs

The device path will then be for example 
/dev/levy/systemd-networkd-all.squashfs.

For udev and systemd-modules, kernel version is used 
(/usr/local/lib/rootimages/systemd-udevd-5.14.0-2-amd64.dir), so the 
services use this drop-in:

[Service]
RootDirectory=/usr/local/lib/rootimages/%p-%v.dir
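
For illustration, this is roughly how the two specifiers in these drop-ins resolve. The helper below is a hypothetical Python sketch that handles only %p (unit name prefix) and %v (kernel release); it ignores all other specifiers and "%%" escaping, and it is not systemd's implementation.

```python
import os


def expand_specifiers(template, unit_name, kernel_release=None):
    """Expand only %p and %v in a path template (simplified sketch)."""
    # %p: unit name without the type suffix, cut at the first "@"
    # for instantiated units.
    stem = unit_name.rsplit(".", 1)[0]
    prefix = stem.split("@", 1)[0]
    # %v: kernel release, as reported by "uname -r".
    if kernel_release is None:
        kernel_release = os.uname().release
    return template.replace("%p", prefix).replace("%v", kernel_release)


print(expand_specifiers("/dev/levy/%p-all.squashfs",
                        "systemd-networkd.service"))
print(expand_specifiers("/usr/local/lib/rootimages/%p-%v.dir",
                        "systemd-udevd.service", "5.14.0-2-amd64"))
```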

Instead of (or in addition to) /@auto/ paths inside the RootImage= /
RootDirectory=, the version could be exposed as a modifier within the
device or directory path name, for example:

[Service]
RootImage=/dev/levy/%p-all-@auto.squashfs

or

[Service]
RootImage=/usr/local/lib/rootimages/%p-%v-@auto.squashfs

Maybe %a instead of @auto.

This would then match 
/dev/levy/systemd-networkd-all-2021-11.09.0.squashfs as the highest 
version, but if that refuses to start, PID1 would try to start 
/dev/levy/systemd-networkd-all-2021-11.08.2.squashfs instead.
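
A rough Python sketch of the selection and fallback I have in mind is below. Everything in it is an assumption for illustration: the helper names, the "@auto" placeholder handling, and especially the version ordering, which here just compares the runs of digits in the version string and is much cruder than a real version comparison.

```python
import re


def version_key(version):
    # Compare by the integer runs only: "2021-11.09.0" -> (2021, 11, 9, 0).
    return tuple(int(n) for n in re.findall(r"\d+", version))


def candidates(pattern, names):
    """Names matching a pattern with one "@auto" in it, newest first."""
    head, tail = pattern.split("@auto", 1)
    found = []
    for name in names:
        if (name.startswith(head) and name.endswith(tail)
                and len(name) > len(head) + len(tail)):
            version = name[len(head):len(name) - len(tail)]
            found.append((version_key(version), name))
    return [name for _, name in sorted(found, reverse=True)]


def start_with_fallback(pattern, names, try_start):
    """Try each candidate, newest first; return the first that starts."""
    for name in candidates(pattern, names):
        if try_start(name):
            return name
    return None
```

Here try_start stands in for whatever PID1 would do to start the service with the given image and report success or failure.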

-Topi