Re: Btrfs in Silverblue

On Mo, 13.07.20 19:07, Chris Murphy (lists@xxxxxxxxxxxxxxxxx) wrote:

> On Mon, Jul 13, 2020 at 12:14 PM Lennart Poettering
> <mzerqung@xxxxxxxxxxx> wrote:
>
> > Quite frankly, I don't see why the boot loader should care about the
> > btrfs subvolume the initrd later picks at all.
>
> As far as I'm aware, rootflags= is a kernel boot parameter, and it
> informs the kernel of mount options for the file system defined by the
> root= boot parameter. Neither are initrd related. None of Btrfs
> options or assembly are done by initrd or dracut magic.

No, this is not how this works on Fedora or any other modern distro.

The initrd parses root= and rootflags=. It then waits, via udev, until
the device specified with root= shows up (much of the syntax that
root= accepts is actually defined by libblkid/udev, not the
kernel). As soon as the device shows up, the initrd mounts the file
system, passing the mount options from rootflags= to the kernel's
mount() call.
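
To illustrate the userspace parsing step described above, here is a
minimal Python sketch. It is not the actual dracut or systemd code;
parse_root_params() and the example command line are made up for
illustration, and it assumes the last occurrence of an option wins.

```python
import shlex


def parse_root_params(cmdline: str) -> dict:
    """Extract root= and rootflags= from a kernel command line string."""
    params = {"root": None, "rootflags": []}
    for word in shlex.split(cmdline):
        key, sep, value = word.partition("=")
        if not sep:
            # Bare switches such as "ro" or "quiet" carry no value.
            continue
        if key == "root":
            # Everything after the first "=" is kept, e.g. "UUID=...".
            params["root"] = value
        elif key == "rootflags":
            # Mount options are comma-separated, as for mount -o.
            params["rootflags"] = value.split(",")
    return params


# Hypothetical command line, for illustration only.
example = "ro root=UUID=1234-abcd rootflags=subvol=root,compress=zstd quiet"
result = parse_root_params(example)
print(result)
```

The resulting device specification would then be resolved by
libblkid/udev, and the options handed to mount() once the device
appears.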

On Fedora/dracut this is mostly implemented in systemd itself,
i.e. the "systemd-fstab-generator" parses root=/rootflags= and a
couple of other things and then generates .mount units from that.

IIRC SUSE also uses dracut, hence it's the same there.

Yes, the kernel also supports an initrd-less mode, where it's the
kernel itself that parses root=/rootflags=, but we don't use that in
Fedora (and because btrfs.ko is a module on Fedora, it can't be done
at all if your rootfs is btrfs). In this mode the syntax of root= is a
lot simpler, as the kernel itself doesn't grok all the syntaxes we
support in libblkid/udev.

So yes, root=/rootflags= is territory of udev/dracut/systemd on
Fedora. On Fedora the kernel does *not* parse that itself. The kernel
will ultimately get the parsed data passed back in, via the mount()
syscall, but that's all.

> In the single existing example of a distribution using btrfs default
> subvolumes, (open)SUSE, the bootloader automatically discovers the
> read only snapshots, and understands how to do rollback by specifying
> the proper /boot and / snapshot pairing.

Are you sure?

> Why at boot time? Well if your default subvolume contains a recent
> update that for some reason renders it unbootable, it might be nice to
> be able to pick a prior snapshot. That's how they do this. It isn't
> how we have to do it, but that's the example that we know works
> because it's actually designed, planned, implemented and maintained.

Nah, this kind of selection you do in the initrd, not in the boot loader.

> > > And it is currently. I don't see how it's any more safe and robust by
> > > eliminating only rootflags, but not the root parameter.
> >
> > You don't need the root parameter, actually. systemd automatically
> > finds the root fs for you, it can do that if the root fs is properly
> > advertised via the GPT partition type, according to the Discoverable
> > Partition Spec.
> >
> > https://systemd.io/DISCOVERABLE_PARTITIONS
>
> Ok but my home, server and variable data are all on the same volume,
> each in their own subvolume. Why does this file system get the Root
> partition type GUID and not the home, server, or var partition type
> GUIDs?

There's a hierarchy here: You put the root of the file system tree in
some partition, and then if you decide to split off some subtree of
it, that subtree will get its own fs with its own partition type
UUID, but only then.

i.e. if /home is split out into its own fs, then you give it the home
partition type UUID, but if not, then it is just part of the root
partition (as that is the immediate parent directory) and thus sits in
the same partition as the root fs.

That all said, it might make sense to define a new partition type uuid
for "btrfs-all-in-one" file systems, because for that it might make
sense to even allow multiple OS archs in the same btrfs file system,
e.g. have a subvol /_root.fedora32.x86_64 for the x86_64 version of
the OS, but /_root.fedora32.arm64 for the ARM version and then have a
single fs that can be booted on either arch.

The current GPT partition types for the root fs are defined for each
arch separately, so that you can have a single GPT disk image with one
root fs partition for each arch, but that doesn't fit anymore if
suddenly one of the root file systems can hold the data for multiple
archs, because it's btrfs with multiple subvols.
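
The per-arch subvolume naming floated above could be handled along
these lines. This is only a sketch of the idea, not an existing spec
or tool; roots_by_arch() is a hypothetical helper and the
"/_root.<os>.<arch>" pattern is taken from the example names above.

```python
def roots_by_arch(subvols):
    """Group subvolumes named /_root.<os>.<arch> by their arch suffix."""
    roots = {}
    for name in subvols:
        if not name.startswith("/_root."):
            continue
        rest = name[len("/_root."):]
        # Split "fedora32.x86_64" into OS identifier and architecture.
        os_id, sep, arch = rest.partition(".")
        if sep and os_id and arch:
            roots.setdefault(arch, []).append(name)
    return roots


# Example listing, using the names from the text above.
listing = ["/_root.fedora32.x86_64", "/_root.fedora32.arm64", "/_home.1"]
grouped = roots_by_arch(listing)
print(grouped)
```

A boot-time component would then pick the entry matching the
architecture it is running on.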

> I regularly install multiple Fedora's, each in their own subvolume, on
> a single Btrfs file system. That's way easier and much more space
> efficient to deal with than using separate partitions and file systems
> for each one. So you're saying, that's not a valid enough use case,
> just use separate file systems for that?

Well, as a matter of fact OSes like to fully own stuff, i.e. the boot
loader, a file system and so on. If you want them to cooperate extra
work needs to be done, i.e. specs written so that they don't fight for
ownership but cooperate. For boot loaders the Boot Loader Spec can do
that. But if you actually want to go as far as sharing the same fs
between multiple OSes, then you better have a very good spec in place
that clarifies how they are supposed to cooperate and name stuff so
that they don't constantly step on each other's toes.

> > xattrs? That sounds unnecessary. I think the easiest would be to just
> > operate on subvolumes that are named a certain way. For example, we
> > could say, if the generator finds a set of subvolumes called
> > "/_home.<something>" on the root fs, then it would sort them by name, and
> > pick the last one of it, and automatically synthesize a .mount unit
> > that mounts it to /home. And similar for other relevant dirs. That
> > way, if you want to opt into this simple logic, just name your subvols
> > /_home.xyz and there you go. The suffix you can then use for versioning
> > or so, if you like, but we wouldn't care in the generator how you
> > actually make use of it.
>
> To discover the names of subvolumes you have to mount the file system
> first. So you're thinking:
>
> 1. The specific file system to be mounted as sysroot is the one with
> the Root partition type GUID as defined in discoverable partitions
> spec.

Yes.

> 2. The specific subvolume that is mounted when doing (1) is the
> default subvolume defined by 'btrfs subvolume set-default'

Yes.

> 3. Mount can thus happen without root= or rootflags= and can happen on
> the first mount attempt

Yes. (You could use rootflags= however to mount a different subvol
than the default one if you like, for example to switch to a different version)

> 4. Upon mount a full subvolume listing is possible by
> BTRFS_IOC_TREE_SEARCH + BTRFS_IOC_INO_LOOKUP

Yes.

> 5. Now you can mount home, var, srv, subvolumes per a schema

Yes.
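
The naming scheme discussed above (sort the "/_home.<something>"
subvolumes by name, pick the last one, synthesize a .mount unit) could
look roughly like this. A minimal Python sketch, not how any existing
systemd generator works; pick_latest(), mount_unit() and the device
path are hypothetical, and the sort is plain lexicographic.

```python
def pick_latest(subvols, directory):
    """Return the last subvolume named /_<directory>.<suffix>, or None."""
    prefix = f"/_{directory}."
    # Plain string sort: "/_home.10" sorts before "/_home.9", so a real
    # implementation might want a version-aware comparison instead.
    candidates = sorted(name for name in subvols if name.startswith(prefix))
    return candidates[-1] if candidates else None


def mount_unit(device, subvol, where):
    """Render the [Mount] section of a .mount unit for one subvolume."""
    return (
        "[Mount]\n"
        f"What={device}\n"
        f"Where={where}\n"
        "Type=btrfs\n"
        f"Options=subvol={subvol}\n"
    )


# Example subvolume listing, as it might be read after the first mount.
subvols = ["/_home.1", "/_home.3", "/_home.2", "/_var.7"]
latest = pick_latest(subvols, "home")
# "/dev/sda3" is a placeholder device for illustration only.
unit = mount_unit("/dev/sda3", latest, "/home")
print(unit)
```

The suffix stays opaque to the generator; it only determines the sort
order.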

> This is different in detail, but not altogether different from
> http://0pointer.net/blog/revisiting-how-we-put-together-linux-systems.html

Well, that's very old, I wouldn't do it like that anymore. But yeah,
there are ideas there that are related.

> And that's a fine idea. But all of this directly involves a design and
> plan for a snapshot and rollback regime. That's not happening for
> Fedora 33. There's enough on our plate right now.

All I am asking for is to make this simple, robust and forward-looking
enough so that we can later add something like the generator I
proposed without having to rearrange anything. i.e. make the most basic
stuff self-describing now, even if the automatic discovery/mounting
of other subvols doesn't happen today, or even automatic snapshotting.

By doing that correctly now, you can easily extend things later
incrementally without breaking stuff, just by *adding* stuff. And you
gain immediate compat with "systemd-nspawn --image=" right away as the
basic minimum, which already is great.

Lennart

--
Lennart Poettering, Berlin
_______________________________________________
devel mailing list -- devel@xxxxxxxxxxxxxxxxxxxxxxx
To unsubscribe send an email to devel-leave@xxxxxxxxxxxxxxxxxxxxxxx
Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://lists.fedoraproject.org/archives/list/devel@xxxxxxxxxxxxxxxxxxxxxxx