On Mon, 08.11.21 14:24, Ludwig Nussel (ludwig.nussel@xxxxxxx) wrote:

> Lennart Poettering wrote:
> > [...]
> > 3. Inside the "@auto" dir of the "super-root" fs, have dirs named
> >    <type>[:<namewithversion>]. The type should have a similar vocabulary
> >    as the GPT spec type UUIDs, but probably use textual identifiers
> >    rather than UUIDs, simply because naming dirs by UUIDs is
> >    weird. Examples:
> >
> >        /@auto/root-x86-64:fedora_36.0/
> >        /@auto/root-x86-64:fedora_36.1/
> >        /@auto/root-x86-64:fedora_37.1/
> >        /@auto/home/
> >        /@auto/srv/
> >        /@auto/tmp/
> >
> >    These would be assembled by the initrd into the following via bind
> >    mounts:
> >
> >        /         → /@auto/root-x86-64:fedora_37.1/
> >        /home/    → /@auto/home/
> >        /srv/     → /@auto/srv/
> >        /var/tmp/ → /@auto/tmp/
> >
> >    If we do this, then we should also leave the door open so that maybe
> >    ostree can be hooked up with this, i.e. if we allow the dirs in
> >    /@auto/ to actually be symlinks, then they could put their ostree
> >    checkouts wherever they want and then create a symlink
> >    /@auto/root-x86-64:myostreeos pointing to it, and their image would be
> >    spec conformant: we'd boot into that automatically, and so would
> >    nspawn and similar things. Thus they could switch their default OS to
> >    boot into without patching kernel cmdlines or such, simply by updating
> >    that symlink, and vanilla systemd would know how to rearrange things.
>
> MicroOS has a similar situation. It edits /etc/fstab.

MicroOS is a SUSE thing?

> Anyway, in the above example I guess if you install some updates you'd
> get e.g. root-x86-64:fedora_37.2, .3, .4 etc?

Well, the spec wouldn't mandate that. But yeah, the idea is that you
could do it like that if you want. What's important is to define the
vocabulary to make this easy and possible, but of course, whether people
follow such an update scheme is up to them.
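
A minimal Python sketch of how an initrd might turn the quoted /@auto/ layout into a bind-mount plan. It is illustrative only: MOUNT_POINTS, parse_entry(), version_cmp() and plan_mounts() are hypothetical names, the type-to-mount-point table is copied from the example rather than from any spec, version_cmp() is a simplified stand-in for strverscmp() (the comparison the GPT auto-discovery logic applies to full partition labels; here digit runs compare numerically, and glibc's leading-zero rules are not reproduced), and nothing is actually mounted.

```python
import re

# Hypothetical type -> mount point table, copied from the example layout
# quoted above. Only the native architecture's root type is listed, so
# entries such as "root-arm64:..." are ignored on an x86-64 machine.
MOUNT_POINTS = {
    "root-x86-64": "/",
    "home": "/home/",
    "srv": "/srv/",
    "tmp": "/var/tmp/",
}


def parse_entry(dirname):
    """Split "<type>[:<namewithversion>]" into (type, name or None)."""
    typ, sep, name = dirname.partition(":")
    return typ, (name if sep else None)


def version_cmp(a, b):
    """Simplified strverscmp()-like comparison: runs of digits compare
    numerically, everything else compares as plain strings. Unlike glibc's
    strverscmp(), leading zeros get no special treatment."""
    ra = re.findall(r"\d+|\D+", a)
    rb = re.findall(r"\d+|\D+", b)
    for x, y in zip(ra, rb):
        if x.isdecimal() and y.isdecimal():
            if int(x) != int(y):
                return -1 if int(x) < int(y) else 1
        elif x != y:
            return -1 if x < y else 1
    # All shared runs are equal: the string with fewer runs sorts first.
    if len(ra) != len(rb):
        return -1 if len(ra) < len(rb) else 1
    return 0


def plan_mounts(entries):
    """Return {mount point: source dir}, picking the newest entry per type."""
    best = {}
    for d in entries:
        typ, _ = parse_entry(d)
        if typ not in MOUNT_POINTS:
            continue  # unknown type or foreign architecture
        # Like the GPT logic, compare the full name, not a split-out version.
        if typ not in best or version_cmp(d, best[typ]) > 0:
            best[typ] = d
    return {MOUNT_POINTS[t]: f"/@auto/{d}/" for t, d in best.items()}


if __name__ == "__main__":
    entries = ["root-x86-64:fedora_36.0", "root-x86-64:fedora_36.1",
               "root-x86-64:fedora_37.1", "home", "srv", "tmp"]
    for target, source in sorted(plan_mounts(entries).items()):
        print(f"{target} -> {source}")
```

Run as a script, it prints the mapping from the quoted example, with fedora_37.1 chosen for / because it compares as the newest:

    / -> /@auto/root-x86-64:fedora_37.1/
    /home/ -> /@auto/home/
    /srv/ -> /@auto/srv/
    /var/tmp/ -> /@auto/tmp/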

I mean, it's the same as with the GPT auto-discovery logic: it already
implements such a versioning scheme because it's easy to implement, but
if you don't want to take advantage of the versioning, then don't; it's
fine regardless. The logic we'd define here is about *consuming*
available OS root file systems, not about *installing* them, after all.

The GPT auto-discovery logic basically does an strverscmp() on the full
GPT partition label string, i.e. it does not attempt to split a name
from a version, but assumes strverscmp() will handle a common prefix
nicely anyway. I'd do it the exact same way here: if there are multiple
options, pick the newest as per strverscmp(). That also means it's
totally fine not to version your stuff: instead of calling it
"root-x86-64:fedora_37.3" you could also just name it
"root-x86-64:fedora" if you like, and then not have any versioning.

> I suppose the autodetection is meant to boot the one sorted last. What
> if that one turns out to be bad though? How to express rollback in that
> model?

Besides the GPT auto-discovery, where versioning is implemented the way
I mentioned, there's also the sd-boot boot loader, which does roughly
the same kind of OS versioning with the boot entries it discovers. So
right now, you can already choose whether:

1. you want to do OS versioning on the boot loader entry level: name
   your EFI binary fooos-0.1.efi (or fooos-0.1.conf, as defined by the
   boot loader spec) and similar, and the boot loader automatically
   picks it up, makes sense of it and boots the newest version
   installed.

2. you want to do OS versioning on the GPT partition table level: name
   your partitions "fooos-0.1" and similar, with the right GPT type, and
   tools such as systemd-nspawn, systemd-dissect, portable services, and
   RootImage= in service unit files will all be able to automatically
   pick the newest version of the OS among the ones in the image.

and now:

3. you want to do OS versioning on the file system level: if we
   implement what I propose above, you could do that too.

(Or you could do a combination of the above if you want, which I think
is highly desirable in case you want a universal image that can boot
both on bare metal and in nspawn in a nicely versioned way.)

Now, in sd-boot's versioning logic we implement automatic boot
assessment on top of the OS versioning: if you add a "+x-y" string to
the boot entry name, we use it as x=tries-left and y=tries-done
counters. I.e. fooos-0.1+3-0.efi is semantically the same as
fooos-0.1.efi, except that there are 3 attempts left and 0 done so far.
On each boot attempt the boot loader decreases x and increases y, i.e.
fooos-0.1+3-0.efi → fooos-0.1+2-1.efi → fooos-0.1+1-2.efi →
fooos-0.1+0-3.efi. If a boot succeeds, the two counters are dropped from
the filename, i.e. → fooos-0.1.efi. For details see:

https://systemd.io/AUTOMATIC_BOOT_ASSESSMENT

Now, why am I mentioning all this? Right now this assessment counter
logic is only implemented for the OS versioning as implemented by
sd-boot. But I think it would make a ton of sense to implement the same
scheme for the GPT partition table OS versioning, and then also for the
fs-level OS versioning as proposed in this thread.

Or to say this explicitly: we could define the spec to say that if we
encounter:

    /@auto/root-x86-64:fedora_36.0+3-0

on the first boot attempt we'd rename it to:

    /@auto/root-x86-64:fedora_36.0+2-1

and so on, until boot succeeds, in which case we'd rename it to:

    /@auto/root-x86-64:fedora_36.0

i.e. we'd drop the counting suffix.

That would be a very simple, yet powerful mechanism for assessment +
versioning that is a nice "add-on": an entry assessed as "good" is
identical to one where assessment was never enabled. Thus people can
ignore assessment if they want; the spec would cover that nicely. They
could also ignore versioning if they want; the spec would cover that
very nicely and naturally too.

Why stick the counting into the dirname?

Robustness, mostly: if we put it there, it's so closely attached to the
entry that it can never get out of sync. We don't have to worry about
metadata that gets out of sync, or isn't deleted when the entry is, and
so on. The object and the metadata are glued together in the tightest
way possible. Moreover, file renames are conceptually (if not
necessarily in the implementation) "atomic" operations: you can count
down in a single syscall, which either works or fails, but typically
doesn't fail partially.

Lennart

--
Lennart Poettering, Berlin
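
The "+x-y" counting scheme proposed above for /@auto/ entries can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not systemd code: parse_counter(), next_attempt_name(), good_name(), start_boot_attempt() and mark_good() are hypothetical names, only the "+left-done" form mentioned in this mail is recognized, and the policy for an entry that has run out of tries is left open. Each transition is one os.rename() call, matching the argument that a rename either works or fails as a whole.

```python
import os
import re
import tempfile

# "<base>+<tries-left>-<tries-done>", the suffix form described above for
# sd-boot entries. Applying it to /@auto/ directory names is only the
# proposal made in this mail, not an implemented feature.
_COUNTER = re.compile(r"^(?P<base>.*)\+(?P<left>\d+)-(?P<done>\d+)$")


def parse_counter(name):
    """Return (base, tries_left, tries_done); counters are None if absent."""
    m = _COUNTER.match(name)
    if m is None:
        return name, None, None
    return m["base"], int(m["left"]), int(m["done"])


def next_attempt_name(name):
    """Name for a new boot attempt, or None if there is nothing to count:
    the entry was never under assessment, is already good, or has no
    tries left (what to do with such an entry is not covered here)."""
    base, left, done = parse_counter(name)
    if left is None or left == 0:
        return None
    return f"{base}+{left - 1}-{done + 1}"


def good_name(name):
    """Name after a successful boot: the counting suffix is simply dropped."""
    return parse_counter(name)[0]


def _rename(auto_dir, old, new):
    # One rename() syscall per transition; no separate metadata to update.
    if new != old:
        os.rename(os.path.join(auto_dir, old), os.path.join(auto_dir, new))
    return new


def start_boot_attempt(auto_dir, name):
    """Count down before booting; returns the entry's new name."""
    new = next_attempt_name(name)
    return name if new is None else _rename(auto_dir, name, new)


def mark_good(auto_dir, name):
    """Called once the boot is deemed successful; returns the new name."""
    return _rename(auto_dir, name, good_name(name))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as auto:
        name = "root-x86-64:fedora_36.0+3-0"
        os.mkdir(os.path.join(auto, name))
        name = start_boot_attempt(auto, name)  # -> ...fedora_36.0+2-1
        print(os.listdir(auto))
        name = mark_good(auto, name)           # -> ...fedora_36.0
        print(os.listdir(auto))
```

Since mark_good() only strips the suffix, an entry that passed assessment ends up with exactly the name it would have had if assessment had never been enabled. A real implementation would additionally have to decide how entries that still carry a counter sort against each other during version selection; the sketch does not address that.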