Re: the need for a discoverable sub-volumes specification

On Mo, 08.11.21 14:24, Ludwig Nussel (ludwig.nussel@xxxxxxx) wrote:

> Lennart Poettering wrote:
> > [...]
> > 3. Inside the "@auto" dir of the "super-root" fs, have dirs named
> >    <type>[:<namewithversion>]. The type should have a similar vocabulary
> >    as the GPT spec type UUIDs, but probably use textual identifiers
> >    rather than UUIDs, simply because naming dirs by uuids is
> >    weird. Examples:
> >
> >    /@auto/root-x86-64:fedora_36.0/
> >    /@auto/root-x86-64:fedora_36.1/
> >    /@auto/root-x86-64:fedora_37.1/
> >    /@auto/home/
> >    /@auto/srv/
> >    /@auto/tmp/
> >
> >    Which would be assembled by the initrd into the following via bind
> >    mounts:
> >
> >    /         → /@auto/root-x86-64:fedora_37.1/
> >    /home/    → /@auto/home/
> >    /srv/     → /@auto/srv/
> >    /var/tmp/ → /@auto/tmp/
> >
> > If we do this, then we should also leave the door open so that maybe
> > ostree can be hooked up with this, i.e. if we allow the dirs in
> > /@auto/ to actually be symlinks, then they could put their ostree
> > checkouts wherever they want and then create a symlink
> > /@auto/root-x86-64:myostreeos pointing to it, and their image would be
> > spec conformant: we'd boot into that automatically, and so would
> > nspawn and similar things. Thus they could switch their default OS to
> > boot into without patching kernel cmdlines or such, simply by updating
> > that symlink, and vanilla systemd would know how to rearrange things.
>
> MicroOS has a similar situation. It edits /etc/fstab.

MicroOS is a SUSE thing?

> Anyway in the above example I guess if you install some updates you'd
> get eg root-x86-64:fedora_37.2, .3, .4 etc?

Well, the spec wouldn't mandate that. But yeah, the idea is that you
could do it like that if you want. What's important is to define the
vocabulary to make this easy and possible, but of course, whether
people follow such an update scheme is up to them. I mean, it's the
same as with the GPT auto discovery logic: it already implements such
a versioning scheme because it's easy to implement, but if you don't
want to take advantage of the versioning, then don't, it's fine
regardless. The logic we'd define here is about *consuming* available
OS root filesystems, not about *installing* them, after all.

The GPT auto-discovery thing basically does an strverscmp() on the
full GPT partition label string, i.e. it does not attempt to split a
name from a version, but assumes strverscmp() will handle a common
prefix nicely anyway. I'd do it the exact same way here: if there are
multiple options, then pick the newest as per strverscmp(), but that
also means it's totally fine to not version your stuff and instead of
calling it "root-x86-64:fedora_37.3" you could also just name it
"root-x86-64:fedora" if you like, and then not have any versioning.
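To make the "pick the newest" rule concrete, here is a rough sketch in
Python. It is only an approximation of strverscmp(): it compares digit
runs numerically and everything else as plain text, and it ignores
glibc's special handling of leading zeros. The names version_key() and
pick_newest() are made up for illustration, nothing like them exists in
systemd.

```python
import re

def version_key(name):
    # Split the name into alternating digit and non-digit runs, e.g.
    # "fedora_37.1" -> ["fedora_", "37", ".", "1"]. Digit runs compare as
    # integers, other runs as text. The leading 0/1 tag keeps the two
    # kinds from ever being compared with each other directly.
    key = []
    for chunk in re.findall(r"[0-9]+|[^0-9]+", name):
        if chunk[0] in "0123456789":
            key.append((1, int(chunk)))
        else:
            key.append((0, chunk))
    return key

def pick_newest(names, type_name):
    # Consider "<type>" and "<type>:<anything>" entries, then take the
    # one that sorts last. Returns None if there is no candidate.
    candidates = [n for n in names
                  if n == type_name or n.startswith(type_name + ":")]
    if not candidates:
        return None
    return max(candidates, key=version_key)
```

With the directory listing from the example above,
pick_newest(names, "root-x86-64") would select
"root-x86-64:fedora_37.1", and an unversioned "root-x86-64:fedora"
works too if it is the only candidate.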

> I suppose the autodetection is meant to boot the one sorted last. What
> if that one turns out to be bad though? How to express rollback in that
> model?

Besides the GPT auto-discovery where versioning is implemented the way
I mentioned, there's also the sd-boot boot loader which does roughly
the same kind of OS versioning with the boot entries it discovers. So
right now, you can already choose whether:

1. you want to do OS versioning on the boot loader entry level: name
   your EFI binary fooos-0.1.efi (or fooos-0.1.conf, as defined by the
   boot loader spec) and similar and the boot loader automatically
   picks it up, makes sense of it and boots the newest version
   installed.

2. you want to do OS versioning on the GPT partition table level: name
   your partitions "fooos-0.1" and similar, with the right GPT type,
   and tools such as systemd-nspawn, systemd-dissect, portable
   services, RootImage= in service unit files all will be able to
   automatically pick the newest version of the OS among the ones in
   the image.

and now:

3. If we implement what I propose above then you could do OS versioning
   on the file system level too.

(Or you could do a combination of the above, if you want, which is
highly desirable I think in case you want a universal image that can
boot on bare metal and in nspawn in a nice versioned way.)

Now, in sd-boot's versioning logic we implement an automatic boot
assessment logic on top of the OS versioning: if you add a "+x-y"
string into the boot entry name we use it as x=tries-left and
y=tries-done counters. i.e. fooos-0.1+3-0.efi is semantically the same
as fooos-0.1.efi, except that there are 3 attempts left and 0 done
yet. On each boot attempt the boot loader decreases x and increases
y. i.e. fooos-0.1+3-0.efi → fooos-0.1+2-1.efi → fooos-0.1+1-2.efi →
fooos-0.1+0-3.efi. If a boot succeeds the two counters are dropped
from the filename, i.e. → fooos-0.1.efi.

For details see: https://systemd.io/AUTOMATIC_BOOT_ASSESSMENT.
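
The counting described above can be sketched as plain string
manipulation. This is a simplified illustration in Python, not
sd-boot's actual code: it only handles the two-counter "+x-y" form
shown in this mail, and the function names count_attempt() and
mark_good() are hypothetical.

```python
import re

# "<stem>+<left>-<done>" followed by an optional extension such as ".efi".
_COUNTER_RE = re.compile(
    r"^(?P<stem>.+)\+(?P<left>[0-9]+)-(?P<done>[0-9]+)(?P<ext>\.[A-Za-z0-9]+)?$"
)

def count_attempt(name):
    # Return the name after one boot attempt: tries-left goes down by
    # one, tries-done goes up by one.
    m = _COUNTER_RE.match(name)
    if m is None:
        return name  # no counters in the name, nothing to count
    left, done = int(m["left"]), int(m["done"])
    if left == 0:
        return name  # assumption of this sketch: leave exhausted entries as-is
    return f"{m['stem']}+{left - 1}-{done + 1}{m['ext'] or ''}"

def mark_good(name):
    # Return the name with the counters dropped, as after a successful boot.
    m = _COUNTER_RE.match(name)
    if m is None:
        return name
    return f"{m['stem']}{m['ext'] or ''}"
```

Here count_attempt("fooos-0.1+3-0.efi") gives "fooos-0.1+2-1.efi", and
mark_good() on any of the counted names gives "fooos-0.1.efi".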

Now, why am I mentioning all this? Right now this assessment counter
logic is only implemented for the OS versioning as implemented by
sd-boot. But I think it would make a ton of sense to implement the
same scheme for the GPT partition table OS versioning, and then also
for the fs-level OS versioning as proposed in this thread.

Or to say this explicitly: we could define the spec to say that if
we encounter:

   /@auto/root-x86-64:fedora_36.0+3-0

on first boot attempt we'd rename it:

   /@auto/root-x86-64:fedora_36.0+2-1

and so on. Until boot succeeds in which case we'd rename it:

   /@auto/root-x86-64:fedora_36.0

i.e. we'd drop the counting suffix.

That would be a very simple, yet powerful mechanism for assessment +
versioning that is a nice "add on": an entry assessed as "good" is
identical to one where assessment was never enabled. Thus people can
ignore assessment if they want, the spec would cover that nicely. They
could also ignore versioning if they want, the spec would cover that
very nicely and naturally too.

Why stick the counting into the dirname? Robustness mostly: if we put
it there it's so closely attached to the entry that it never can get
out of sync: we don't have to be afraid of metadata that gets out of
sync, or not deleted when the entry is and so on. The object and the
metadata are glued together in the tightest way possible. Moreover
file renames are conceptually (if not necessarily in the
implementation) "atomic" operations: you can count down in a single
syscall, that can either work or fail, but typically not fail
partially.
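
As a rough illustration of the directory renames proposed above, each
step being a single rename call, something like the following Python
sketch would do. It assumes the "super-root" fs is already mounted and
that auto_dir points at its "@auto" directory. The function names
begin_boot_attempt() and confirm_boot() are invented for this example
and are not part of any existing tool.

```python
import os
import re

# "<entry>+<left>-<done>" at the end of a directory name.
_SUFFIX_RE = re.compile(r"^(.+)\+([0-9]+)-([0-9]+)$")

def _rename_entry(auto_dir, old, new):
    # One rename() call: the entry and its counters change together.
    os.rename(os.path.join(auto_dir, old), os.path.join(auto_dir, new))
    return new

def begin_boot_attempt(auto_dir, name):
    # Count one attempt by renaming the entry; returns the new name.
    m = _SUFFIX_RE.match(name)
    if m is None or int(m.group(2)) == 0:
        return name  # no counters, or no tries left: nothing to rename
    left, done = int(m.group(2)), int(m.group(3))
    return _rename_entry(auto_dir, name, f"{m.group(1)}+{left - 1}-{done + 1}")

def confirm_boot(auto_dir, name):
    # Drop the counting suffix after a successful boot.
    m = _SUFFIX_RE.match(name)
    if m is None:
        return name
    return _rename_entry(auto_dir, name, m.group(1))
```

Since the counters live in the name itself, there is no second file
that could be left behind if a rename fails or the entry is deleted.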

Lennart

--
Lennart Poettering, Berlin


