Re: [PATCH 0/3] pre-generated initrd and unified kernels

Daniel P. Berrangé <berrange@xxxxxxxxxx> · Thu, 1 Sep 2022 15:21:42 +0100

On Wed, Aug 31, 2022 at 02:46:15PM +0200, Gerd Hoffmann wrote:
>   Hi,
> 
> Here is a little patch series to kick off a discussion on pre-generated
> initrd images and unified kernels.  Let's start with a description of the
> patches:
> 
>   Patch #1 adds a dracut config file, targeting virtual machines.  Given
>   that most physical machines have either SATA or NVMe disks these days
>   it probably boots most physical systems too.
> 
>   Patch #2 adds a sub-package with an initrd image.
> 
>   Patch #3 adds a sub-package with a unified kernel.

I was going to open a merge request in Pagure which just has one
patch doing #1 & #3 at the same time, but since you've started the
discussion here, I'll just point to my branch:

  https://src.fedoraproject.org/fork/berrange/rpms/kernel/commits/unified-kernel

with latest commit at time of writing being:

  https://src.fedoraproject.org/fork/berrange/rpms/kernel/c/b055ea3932e48fff0dc73647cb0a62e26db13482?branch=unified-kernel

And builds available at

  https://koji.fedoraproject.org/koji/taskinfo?taskID=91495327
  https://copr.fedorainfracloud.org/coprs/berrange/efi-unified-kernel/builds/

Compared to the patch #3 Gerd included, I've made the following changes:

  - Added support for signing the EFI images created

  - Create both vmlinuz-virt.efi and vmlinuz-virt-verbose.efi,
    the latter with a cmdline tailored for debugging

  - Install %ghost files at /boot/efi/EFI/Linux, similar to
    how existing kernels %ghost their files in /boot/, so that RPM
    can validate disk space availability prior to install

> The goal is to move away from initrd images being generated on the
> installed machine.  They are generated while building the kernel package
> instead.  Main motivation for this move is to make the distro more
> robust and more secure.

More specifically, with public clouds, most of the big vendors
have a so-called "Trusted VM" option which exposes EFI and vTPM,
and allows for remote attestation of the VM. Azure at least has
the attestation service integrated into their portal.

The RHEL images we provide for cloud today can be launched in a
"Trusted VM" setup, but there's no usable trust provided because
the dynamically generated initrd and cmdline are impractical to
attest to. This is further compounded by grub's practice of
writing every single grub.conf statement into the PCRs, which
effectively requires a grub simulator to validate.

More recently clouds have started to work on "Confidential VMs",
which have a fairly high level of overlap with "Trusted VMs" in
terms of what needs to be done to attest the confidentiality of
the boot process. So again we need to measure and attest the
initrd + cmdline, and any bootloader config.
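
The reason a fixed initrd + cmdline helps so much is that a PCR can
only be extended, never set: new = H(old || H(event)). If every
measured input is known when the kernel is built, a verifier can
compute the expected PCR value offline. A minimal Python sketch,
assuming a SHA-256 bank that starts at all zeros; the event byte
strings are hypothetical stand-ins, and it does not model which PCR
index firmware, shim or the EFI stub actually use, nor exactly which
bytes they hash:

```python
import hashlib

PCR_SIZE = 32  # SHA-256 bank: 32-byte PCR values


def extend(pcr: bytes, data: bytes) -> bytes:
    """TPM-style extend: new = SHA256(old || SHA256(data))."""
    digest = hashlib.sha256(data).digest()
    return hashlib.sha256(pcr + digest).digest()


def replay(events: list[bytes]) -> bytes:
    """Replay measured events, in order, starting from the reset value (all zeros)."""
    pcr = bytes(PCR_SIZE)
    for event in events:
        pcr = extend(pcr, event)
    return pcr


# Hypothetical stand-ins for the measured components of one kernel build.
build_time = [b"kernel-image", b"initrd-image", b"console=ttyS0 console=tty0 quiet rhgb"]
expected = replay(build_time)

# A host that regenerated its initrd locally measures different bytes,
# so its PCR no longer matches the value computed at build time.
local = [b"kernel-image", b"initrd-regenerated-on-host", b"console=ttyS0 console=tty0 quiet rhgb"]
print(expected.hex())
print(replay(local) == expected)  # False
```

The same replay logic is what makes per-statement grub.conf
measurements painful: the verifier has to reproduce every event, in
the same order.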

This is a long-winded way of saying that the need for EFI unified
kernel images is just one piece of the puzzle we're working on.
To complement this, we'll be looking at either having shim directly
launch the kernel image, or requesting that sd-boot be signed and
using that, in both cases eliminating use of grub to simplify the
measurement+attestation problem significantly.

It is also likely that we'll be looking to make use of things like
the kernel IMA framework to measure+attest to various aspects of
the cloud disk and OS state.

> When shipping the initrd as rpm it is possible to check it with the
> usual tools ('rpm --verify' for example).  TPM measurements are much
> more useful because it is possible to pre-calculate the PCR values for a
> given kernel version.
> 
> When shipping a unified kernel image (containing kernel, initrd, cmdline
> and signature) we get the additional benefit that the initrd is covered
> by the signature so secure boot will actually be secure.
> 
> So, while unified kernels are clearly the better approach, that is also
> the approach which needs some changes in various packages.  For an initrd
> image the hooks needed are in place thanks to CoreOS shipping initrd
> images today.  Opt in by installing the sub-rpm and everything JustWorks[tm].
> 
> To make unified kernels work smoothly a number of changes are needed
> (beside the kernel rpm changes):
> 
> (1) Add support for unified kernels to the kernel update scripts.
>     (/usr/lib/kernel/install.d/*).
> 
> (2) Add boot loader support for unified kernel images:
>     (a) either switch to sd-boot which already supports this.
>     (b) or add support to grub2 (improve blscfg downstream patch).
> 
> (3) Support /boot being vfat (depending on #2, sd-boot needs this).
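
For item (1), kernel-install runs each plugin as "add KERNEL-VERSION
ENTRY-DIR KERNEL-IMAGE [INITRD...]" or "remove KERNEL-VERSION
ENTRY-DIR". A minimal Python sketch of a plugin that copies pre-built
UKIs into the ESP; the source location under /lib/modules, the ESP
mount point and the versioned file names are all hypothetical, not
something the patches define:

```python
import shutil
import sys
from pathlib import Path

# Hypothetical locations, not defined by the patches: where the kernel RPM
# might place its pre-built UKIs, and where the ESP is mounted.
MODULES_ROOT = Path("/lib/modules")
ESP = Path("/boot/efi")
UKI_NAMES = ("vmlinuz-virt.efi", "vmlinuz-virt-verbose.efi")


def plan(argv, modules_root=MODULES_ROOT, esp=ESP):
    """Map a kernel-install plugin call to (action, src, dst) operations.

    kernel-install runs each plugin as
      add KERNEL-VERSION ENTRY-DIR KERNEL-IMAGE [INITRD...]
      remove KERNEL-VERSION ENTRY-DIR
    Unknown commands and short argument lists yield no operations.
    """
    if len(argv) < 3:
        return []
    command, kver = argv[0], argv[1]
    target_dir = esp / "EFI" / "Linux"
    ops = []
    for name in UKI_NAMES:
        # Append the kernel version so several installed kernels can coexist.
        dst = target_dir / name.replace(".efi", f"-{kver}.efi")
        if command == "add":
            ops.append(("copy", modules_root / kver / name, dst))
        elif command == "remove":
            ops.append(("delete", None, dst))
    return ops


def main(argv):
    """Carry out the planned operations on the real filesystem."""
    for action, src, dst in plan(argv):
        if action == "copy" and src.exists():
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dst)
        elif action == "delete":
            dst.unlink(missing_ok=True)


if __name__ == "__main__":
    main(sys.argv[1:])
```

Splitting planning from execution keeps the argument handling testable
without touching a real ESP.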

Technically, in the cloud image scenario, we don't especially need to
care about /boot being a dedicated partition. We could do everything
exclusively in the /boot/efi partition, which is already vfat, and not
bother creating any /boot partition, since we can ensure /boot/efi is
large enough.

If we foresee the unified EFI kernels being useful for bare metal,
however, then use of /boot as vfat becomes more important, as we
can't assume the hardware vendor's pre-created /boot/efi is
sufficiently large.
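
A rough way to reason about "large enough": each installed kernel
carries two UKIs (normal + verbose), dnf keeps installonly_limit
kernels (commonly 3 on Fedora), and an upgrade briefly holds one extra
version before the oldest is removed. A minimal Python sizing sketch
under those assumptions; the image sizes in the usage lines are
hypothetical inputs, not measurements:

```python
MIB = 1024 * 1024


def esp_space_needed(uki_bytes: int, verbose_uki_bytes: int, kernels_kept: int = 3) -> int:
    """Bytes the ESP must hold for unified images.

    Assumes two UKIs per kernel, `kernels_kept` retained versions
    (dnf's installonly_limit) plus one transient version during upgrade.
    """
    per_kernel = uki_bytes + verbose_uki_bytes
    return per_kernel * (kernels_kept + 1)


def esp_fits(esp_free_bytes: int, uki_bytes: int, verbose_uki_bytes: int, kernels_kept: int = 3) -> bool:
    """True if the required space fits in the free space available."""
    return esp_space_needed(uki_bytes, verbose_uki_bytes, kernels_kept) <= esp_free_bytes


# Hypothetical sizes, for illustration only.
print(esp_space_needed(60 * MIB, 60 * MIB) // MIB)  # 480
print(esp_fits(200 * MIB, 60 * MIB, 60 * MIB))      # False
```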

> (4) Remove configuration information (and secrets) from initrd images
>     and kernel command line.
> 
>     Most important item here is the root filesystem location, which
>     should be doable using https://systemd.io/DISCOVERABLE_PARTITIONS/
>     for many use cases.
> 
>     Can initially be handled in anaconda kickstart %post scripts.
>     Long-term we need proper support in anaconda (and any other tool
>     used to install or generate cloud images), especially if we want
>     make unified kernel images the default some day.
> 
> (5) There might be more ...

In the kernel.spec changes I linked to earlier, I've actually proposed
creating two distinct EFI images. Under SecureBoot, users won't
have the option to edit the cmdline, since it is embedded in the
EFI image and measured. Furthermore, with Confidential VMs, the
emulated keyboard, serial port and VGA output are not trustworthy,
so care has to be taken with any interactive process during boot,
including any interaction with a boot loader. Selecting between
multiple pre-defined kernel entries is fine; editing the cmdline is
not viable.

In normal operation this isn't a big issue, as it's fine to just
hardcode a quiet, graphical boot. When things go wrong, however,
it is nice to be able to boot with 'debug' and with rhgb turned off.

Thus my patch proposed two images, to be distributed in the same
'kernel-virt-unified' sub-RPM.

 * vmlinuz-virt.efi  created using dracut arg

     --kernel-cmdline 'console=ttyS0 console=tty0 quiet rhgb'

 * vmlinuz-virt-verbose.efi created using dracut arg

     --kernel-cmdline 'console=ttyS0 console=tty0'
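
The two variants differ only in the embedded cmdline. A minimal Python
sketch that builds the two dracut invocations; only --kernel-cmdline
comes from the description above, while --no-hostonly, --uefi and
--kver are assumed for illustration (the branch linked earlier is the
authoritative version), and the kernel version string is hypothetical:

```python
import shlex

BASE_CMDLINE = "console=ttyS0 console=tty0"

# Output file name -> kernel command line embedded in that image.
VARIANTS = {
    "vmlinuz-virt.efi": BASE_CMDLINE + " quiet rhgb",
    "vmlinuz-virt-verbose.efi": BASE_CMDLINE,
}


def dracut_commands(kver: str, outdir: str = ".") -> list[list[str]]:
    """Return one dracut argument list per image variant (not executed)."""
    cmds = []
    for name, cmdline in VARIANTS.items():
        cmds.append([
            "dracut",
            "--no-hostonly",  # generic image, not tailored to the build host
            "--uefi",         # emit an EFI unified kernel image
            "--kernel-cmdline", cmdline,
            "--kver", kver,
            f"{outdir}/{name}",
        ])
    return cmds


for cmd in dracut_commands("6.0.0-1.fc37.x86_64"):
    print(shlex.join(cmd))
```

Keeping the variants in one table makes it hard for the two images to
drift apart in anything other than the cmdline.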

Even when following the systemd Discoverable Partitions spec,
we need a mechanism to attest that the root filesystem that
was discovered and mounted matches the one we expect. Our
current thought is that this is likely to involve TPM PCR
measurements being logged by systemd and libcryptsetup, as
well as some of the kernel IMA providers. This is being
discussed with systemd upstream at

  https://github.com/systemd/systemd/issues/24503
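
To make the gap concrete: discovery by GPT type GUID only says "a
partition of the right type", not "the partition we built". A minimal
Python sketch, assuming the x86-64 root type GUID from the
Discoverable Partitions Specification and simplifying selection to the
first match; comparing a partition UUID against an expected value is a
hypothetical placeholder, since a plain UUID is trivially copied and
is not itself a trustworthy measurement:

```python
from dataclasses import dataclass
from typing import Optional

# Root partition type GUID for x86-64 (Discoverable Partitions Specification).
ROOT_X86_64 = "4f68bce3-e8cd-4db1-96e7-fbcaf984b709"


@dataclass(frozen=True)
class Partition:
    type_guid: str
    part_uuid: str


def discover_root(parts: list[Partition]) -> Optional[Partition]:
    """Pick the first partition whose GPT type GUID marks it as the x86-64 root."""
    for p in parts:
        if p.type_guid.lower() == ROOT_X86_64:
            return p
    return None


def matches_expected(root: Optional[Partition], expected_uuid: str) -> bool:
    """Hypothetical check against a value the verifier expects.

    Discovery alone accepts *any* partition with the right type GUID, so a
    substituted disk would still be found; some measured identity is needed.
    """
    return root is not None and root.part_uuid.lower() == expected_uuid.lower()
```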

I raise this because for the kernel IMA support, we're likely
to need to add further kernel cmdline parameters beyond those
currently shown in these patches (i.e. ima=on at the very least),
as well as bundling an /etc/ima/ima-policy file into the initrd.
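
A minimal Python sketch of the two pieces involved: appending extra
parameters to an image's cmdline without duplicating any, and rendering
a policy file to bundle. The rules shown are a hypothetical,
illustrative subset in IMA policy syntax, not a vetted policy; IMA
applies the first rule that matches, so the exclusion comes first:

```python
def extend_cmdline(cmdline: str, extra: list[str]) -> str:
    """Append kernel parameters that are not already present."""
    params = cmdline.split()
    for p in extra:
        if p not in params:
            params.append(p)
    return " ".join(params)


# Hypothetical policy to bundle as /etc/ima/ima-policy; illustrative only.
IMA_RULES = [
    "dont_measure fsmagic=0x9fa0",            # procfs (PROC_SUPER_MAGIC)
    "measure func=BPRM_CHECK mask=MAY_EXEC",  # files executed via execve()
    "measure func=MODULE_CHECK",              # kernel modules being loaded
]


def render_policy(rules: list[str]) -> str:
    """One rule per line, newline-terminated."""
    return "\n".join(rules) + "\n"


print(extend_cmdline("console=ttyS0 console=tty0 quiet rhgb", ["ima=on"]))
print(render_policy(IMA_RULES), end="")
```

Any such parameter becomes part of the measured cmdline, so it has to
be decided at kernel build time rather than on the installed host.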

> I think the best way forward is to skip the initrd image interim step
> and try to go straight to unified kernel image support, starting with
> virtual machines & cloud images, and when things are working smoothly there
> expand to cover more use cases.  I think it makes sense to start with
> the kernel changes.
> 
> Comments?  Reviews?  Suggestions?

In terms of the distro maintenance burden, shipping these pre-built EFI
images introduces new dependencies on the kernel build process:

  BuildRequires: dracut
  BuildRequires: binutils
  BuildRequires: lvm2

In theory, any time one of those packages (or any existing kernel
dependency) has a change, it might impact the content that is bundled
in the initrd that is pre-built into the EFI image. That in turn could
mean that extra kernel RPM rebuilds are needed, simply because a 3rd
party dependency had an important change that we need to get into the
initrd.
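
Deciding whether such a rebuild is warranted could be as simple as
comparing the versions a kernel was built against with what the repo
now offers. A minimal Python sketch; the package list is just the three
BuildRequires above (a real check would need every package dracut
pulls files from), and the version-release strings are placeholders:

```python
# Build dependencies that feed the pre-built EFI image (from the list above).
INITRD_INPUTS = ("dracut", "binutils", "lvm2")


def changed_inputs(built_with: dict[str, str], available: dict[str, str]) -> list[str]:
    """Return image-relevant packages whose version-release differs from the last kernel build."""
    return [name for name in INITRD_INPUTS
            if built_with.get(name) != available.get(name)]


# Placeholder version-release strings, purely illustrative.
last_build = {"dracut": "A-1", "binutils": "B-1", "lvm2": "C-1"}
repo_now = {"dracut": "A-2", "binutils": "B-1", "lvm2": "C-1"}
print(changed_inputs(last_build, repo_now))  # ['dracut']
```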

In practice, my impression is that the kernel gets rebuilt frequently
enough that the content bundled into the initrds will already be
kept sufficiently up to date. Even if some package change did impact
the initrd content, it would almost always be possible for users to
wait until the next normal kernel rebuild to get that into the
initrds. High-priority CVEs feel like the only scenario that might
force an extra kernel RPM rebuild that would not otherwise have been
needed.

So overall, I feel like this addition ought not to have a notable
negative impact on kernel RPM maintenance, while at the same time it
enables us to close the SecureBoot measurement hole that our distro
(and essentially every other Linux distro) has suffered from for years.

FWIW, as a reference/comparison point, in testing out Azure's support
for AMD SEV-SNP confidential virtualization, we've learnt that Ubuntu
have also chosen to go down the route of using an EFI unified kernel
image, in their case directly booted by shim with neither grub nor
sd-boot involved. Their image looked like a one-off special, but it
is likely that this approach will become the norm going forward, since
there are few other practical ways to deal with the measurement &
attestation needs in a seamless way.

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
_______________________________________________
kernel mailing list -- kernel@xxxxxxxxxxxxxxxxxxxxxxx