Re: Btrfs in Silverblue

On Tue, Jul 14, 2020 at 3:39 AM Lennart Poettering <mzerqung@xxxxxxxxxxx> wrote:
>
> On Mo, 13.07.20 19:07, Chris Murphy (lists@xxxxxxxxxxxxxxxxx) wrote:
>
> > On Mon, Jul 13, 2020 at 12:14 PM Lennart Poettering
> > <mzerqung@xxxxxxxxxxx> wrote:
> >
> > > Quite frankly, I don't see why the boot loader should care about the
> > > btrfs subvolume the initrd later picks at all.
> >
> > As far as I'm aware, rootflags= is a kernel boot parameter, and it
> > informs the kernel of mount options for the file system defined by the
> > root= boot parameter. Neither are initrd related. None of Btrfs
> > options or assembly are done by initrd or dracut magic.
>
> No, this is not how this works on Fedora or any other modern distro.
>
> The initrd parses root= and rootflags=. Then waits with udev until the
> device that is specified with root= shows up (much of the syntax that
> root= accepts is actually defined by libblkid/udev, not the
> kernel). As soon as the device shows up the initrd mounts the file
> system, passing the mount opts from rootflags= to the kernel's mount()
> call.
>
> On Fedora/dracut this is mostly implemented in systemd itself,
> i.e. the "systemd-fstab-generator" parses root=/rootflags= and a
> couple of other things and then generates .mount units from that.
>
> IIRC suse also uses dracut, hence there it's the same.
>
> Yes, the kernel also supports an initrd-less mode, where it's the
> kernel itself that parses root=/rootflags=, but we don't use that in
> Fedora (and because btrfs.ko is a module on Fedora, it can't be done
> at all if your rootfs is btrfs). In this mode the syntax of root= is a
> lot simpler, as the kernel itself doesn't grok all syntaxes we support
> in libblkid/udev.
>
> So yes, root=/rootflags= is territory of udev/dracut/systemd on
> Fedora. On Fedora the kernel does *not* parse that itself. The kernel
> will ultimately get the parsed data passed back in, via the mount()
> syscall, but that's all.
>
> > In the single existing example of a distribution using btrfs default
> > subvolumes, (open)SUSE, the bootloader automatically discovers the
> > read only snapshots, and understands how to do rollback by specifying
> > the proper /boot and / snapshot pairing.
>
> Are you sure?
>
> > Why at boot time? Well if your default subvolume contains a recent
> > update that for some reason renders it unbootable, it might be nice to
> > be able to pick a prior snapshot. That's how they do this. It isn't
> > how we have to do it, but that's the example that we know works
> > because it's actually designed, planned, implemented and maintained.
>
> Nah, this kind of selection you do in the initrd, not in the boot loader.
>

Yes. SUSE does this at bootloader time. A *huge* part of the code
they added to GRUB exists to automatically discover all this and
populate the menu and boot selection. None of this code exists in the
upstream GRUB codebase. It is done so that when snapper automatically
creates snapshots, they are created and stored in a structure that
GRUB knows how to traverse and enumerate for the menu. It *is* GRUB
dependent, but in practice this is fine since they only use GRUB. And
for that matter, that's mostly true for us too...

I am coming to understand that this is fundamentally incompatible
with the direction Red Hat has been taking. In the Red Hat world with
the Boot Loader Spec, this *has* to be managed at the initramfs/OS
level and explicitly generated and configured. Things like Boom
(remember that?) are supposed to generate bootloader entry
configuration snippets when OS snapshots are made.

Part of the reason I'm *not* pushing for automatic snapshots for all
this right now is that I seriously have no idea yet how I'm going to
do this in a BLS-friendly way.
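
To make the "generate a snippet per snapshot" idea concrete, here is a
minimal sketch of what such a tool would have to emit. It is not how
Boom or any existing tool does it. The SnapshotEntry class, its field
names and the render_bls_entry helper are hypothetical; only the entry
keys (title, version, linux, initrd, options) come from the Boot Loader
Spec.

```python
from dataclasses import dataclass


@dataclass
class SnapshotEntry:
    """Hypothetical description of one bootable snapshot."""
    title: str
    version: str
    kernel: str      # kernel image path, relative to the boot partition
    initrd: str      # initrd path, relative to the boot partition
    root_uuid: str   # file system UUID handed over via root=
    subvol: str      # btrfs subvolume handed over via rootflags=


def render_bls_entry(entry: SnapshotEntry) -> str:
    """Render one Boot Loader Spec style entry as text."""
    options = f"root=UUID={entry.root_uuid} rootflags=subvol={entry.subvol} ro"
    lines = [
        f"title {entry.title}",
        f"version {entry.version}",
        f"linux {entry.kernel}",
        f"initrd {entry.initrd}",
        f"options {options}",
    ]
    return "\n".join(lines) + "\n"
```

The hard part is not rendering the text but deciding who calls this and
when, and how stale entries get cleaned up once a snapshot is deleted.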

> > > > And it is currently. I don't see how it's any more safe and robust by
> > > > eliminating only rootflags, but not the root parameter.
> > >
> > > You don't need the root parameter, actually. systemd automatically
> > > finds the root fs for you, it can do that if the root fs is properly
> > > advertised via the GPT partition type, according to the Discoverable
> > > Partition Spec.
> > >
> > > https://systemd.io/DISCOVERABLE_PARTITIONS
> >
> > Ok but my home, server and variable data are all on the same volume,
> > each in their own subvolume. Why does this file system get the Root
> > partition type GUID and not the home, server, or var partition type
> > GUIDs?
>
> There's a hierarchy here: You put the root of the file system tree in
> some partition, and then if you decide to split off some subtree of
> it, that subtree will get its own fs with its own partition type
> UUID, but only then.
>
> i.e. if /home is split out into its own fs, then you give it the home
> partition uuid type, but if not, then it is just part of the root
> partition (as that is the immediate parent directory) and thus sits in
> the same partition as the root fs.
>
> That all said, it might make sense to define a new partition type uuid
> for "btrfs-all-in-one" file systems, because for that it might make
> sense to even allow multiple OS archs in the same btrfs file system,
> e.g. have a subvol /_root.fedora32.x86_64 for the x86_64 version of
> the OS, but /_root.fedora32.arm64 for the ARM version and then have a
> single fs that can be booted on either arch.
>
> The current GPT partition types for the root fs are defined for each
> arch separately, so that you can have a single GPT disk image with one
> root fs partition for each arch, but that doesn't fit anymore if
> suddenly one of the root fs can hold the data for multiple archs,
> because it's btrfs with multiple subvols.
>

It would be good to develop a scheme for this sort of thing and see if
we can get openSUSE on board too, especially in time for SLE 16.
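
As a starting point for discussion, a rough sketch of how a tool could
interpret the /_root.<os>.<arch> naming from the example above. Nothing
here is an agreed scheme: the three-field layout is only inferred from
the two example names, and parse_root_subvol and roots_for_arch are
made-up helper names.

```python
from typing import Iterable, List, NamedTuple, Optional


class RootSubvol(NamedTuple):
    name: str    # subvolume name as given
    os_id: str   # e.g. "fedora32"
    arch: str    # e.g. "x86_64" or "arm64"


def parse_root_subvol(name: str) -> Optional[RootSubvol]:
    """Parse '_root.<os>.<arch>'; return None for names outside the scheme."""
    parts = name.lstrip("/").split(".")
    if len(parts) != 3 or parts[0] != "_root" or not all(parts[1:]):
        return None
    return RootSubvol(name, parts[1], parts[2])


def roots_for_arch(names: Iterable[str], arch: str) -> List[RootSubvol]:
    """Return the root subvolumes matching arch, sorted by name."""
    parsed = (parse_root_subvol(n) for n in names)
    return sorted(p for p in parsed if p is not None and p.arch == arch)
```

The caller has to pass the arch string in whatever spelling the scheme
settles on, which is exactly the kind of detail a shared spec would
need to pin down.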

> > I regularly install multiple Fedoras, each in their own subvolume, on
> > a single Btrfs file system. That's way easier and much more space
> > efficient to deal with than using separate partitions and file systems
> > for each one. So you're saying, that's not a valid enough use case,
> > just use separate file systems for that?
>
> Well, as a matter of fact OSes like to fully own stuff, i.e. the boot
> loader, a file system and so on. If you want them to cooperate extra
> work needs to be done, i.e. specs written so that they don't fight for
> ownership but cooperate. For boot loaders the Boot Loader Spec can do
> that. But if you actually want to go as far as sharing the same fs
> between multiple OSes, then you better have a very good spec in place
> that clarifies how they are supposed to cooperate and name stuff so
> that they don't constantly step on each other's toes.
>
> > > xattrs? That sounds unnecessary. I think the easiest would be to just
> > > operate on subvolumes that are named a certain way. For example, we
> > > could say, if the generator finds a set of subvolumes called
> > > "/_home.<something>" on the root fs, then it would sort them by name, and
> > > pick the last one of it, and automatically synthesize a .mount unit
> > > that mounts it to /home. And similar for other relevant dirs. That
> > > way, if you want to opt into this simple logic, just name your subvols
> > > /_home.xyz and there you go. The suffix you can then use for versioning
> > > or so, if you like, but we wouldn't care in the generator how you
> > > actually make use of it.
> >
> > To discover the names of subvolumes you have to mount the file system
> > first. So you're thinking:
> >
> > 1. The specific file system to be mounted as sysroot is the one with
> > the Root partition type GUID as defined in discoverable partitions
> > spec.
>
> Yes.
>
> > 2. The specific subvolume that is mounted when doing (1) is the
> > default subvolume defined by 'btrfs subvolume set-default'
>
> Yes.
>
> > 3. Mount can thus happen without root= or rootflags= and can happen on
> > the first mount attempt
>
> Yes. (You could use rootflags= however to mount a different subvol
> than the default one if you like, for example to switch to a different version)
>
> > 4. Upon mount a full subvolume listing is possible by
> > BTRFS_IOC_TREE_SEARCH + BTRFS_IOC_INO_LOOKUP
>
> Yes.
>
> > 5. Now you can mount home, var, srv, subvolumes per a schema
>
> Yes.
>
> > This is different in detail, but not altogether different from
> > http://0pointer.net/blog/revisiting-how-we-put-together-linux-systems.html
>
> Well, that's very old, I wouldn't do it like that anymore. But yeah,
> there are ideas there that are related.
>
> > And that's a fine idea. But all of this directly involves a design and
> > plan for a snapshot and rollback regime. That's not happening for
> > Fedora 33. There's enough on our plate right now.
>
> All I am asking for is to make this simple and robust and forward
> looking enough so that we can later add something like the generator I
> proposed without having to rearrange anything. i.e. make the most basic
> stuff self-describing now, even if the automatic discovering/mounting
> of other subvols doesn't happen today, or even automatic snapshotting.
>
> By doing that correctly now, you can easily extend things later
> incrementally without breaking stuff, just by *adding* stuff. And you
> gain immediate compat with "systemd-nspawn --image=" right away as the
> basic minimum, which already is great.
>

I would love to do that, but right now I want to make sure everything
*works* before we rearrange the scheme we use to set up the
subvolumes.
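
For reference, the generator logic quoted earlier in this thread (find
the "/_home.<something>" subvolumes, sort them by name, pick the last
one, synthesize a .mount unit for /home) could look roughly like the
sketch below. This is not systemd code. pick_latest and
render_mount_unit are hypothetical names, the subvolume list is assumed
to have been obtained already, and plain string sorting is an
assumption; a real implementation might prefer a version-aware
comparison.

```python
from typing import Iterable, Optional


def pick_latest(subvols: Iterable[str], prefix: str) -> Optional[str]:
    """Return the last subvolume, sorted by name, that starts with prefix."""
    matches = sorted(s for s in subvols if s.startswith(prefix))
    return matches[-1] if matches else None


def render_mount_unit(device: str, subvol: str, where: str) -> str:
    """Render the [Mount] section of a unit mounting subvol at where."""
    return (
        "[Mount]\n"
        f"What={device}\n"
        f"Where={where}\n"
        "Type=btrfs\n"
        f"Options=subvol={subvol}\n"
    )
```

With plain string sorting, "_home.10" sorts before "_home.9", so
suffixes would need to be zero-padded or date-like for "pick the last
one" to mean "pick the newest".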



-- 
真実はいつも一つ!/ Always, there's only one truth!
_______________________________________________
devel mailing list -- devel@xxxxxxxxxxxxxxxxxxxxxxx
To unsubscribe send an email to devel-leave@xxxxxxxxxxxxxxxxxxxxxxx
Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://lists.fedoraproject.org/archives/list/devel@xxxxxxxxxxxxxxxxxxxxxxx



