On Wed, Aug 31, 2022 at 02:46:15PM +0200, Gerd Hoffmann wrote:
>   Hi,
>
> Here is a little patch series to kick off a discussion on pre-generated
> initrd images and unified kernels. Lets start with a description of the
> patches:
>
> Patch #1 adds a dracut config file, targeting virtual machines. Given
> that most physical machines have either sata or nvme disks these days
> it probably boots most physical systems too.
>
> Patch #2 adds a sub-package with an initrd image.
>
> Patch #3 adds a sub-package with an unified kernel.

I was going to open a merge request in Pagure which just has one patch
doing #1 & #3 at the same time, but since you've started the discussion
here, I'll just point to my branch:

  https://src.fedoraproject.org/fork/berrange/rpms/kernel/commits/unified-kernel

with the latest commit at time of writing being:

  https://src.fedoraproject.org/fork/berrange/rpms/kernel/c/b055ea3932e48fff0dc73647cb0a62e26db13482?branch=unified-kernel

And builds available at

  https://koji.fedoraproject.org/koji/taskinfo?taskID=91495327
  https://copr.fedorainfracloud.org/coprs/berrange/efi-unified-kernel/builds/

Compared to the patch #3 Gerd included, I've made the following changes:

 - Added support for signing of the EFI images created

 - Create both vmlinuz-virt.efi and vmlinuz-virt-verbose.efi,
   the latter with a cmdline tailored for debugging

 - Install %ghost files at /boot/efi/EFI/Linux, similar to how
   existing kernels %ghost /boot/, so that RPM can validate
   disk space availability prior to install

> The goal is to move away from initrd images being generated on the
> installed machine. They are generated while building the kernel package
> instead. Main motivation for this move is to make the distro more
> robust and more secure.

More specifically, with public clouds most of the big vendors have a
so-called "Trusted VM" option which exposes EFI and vTPM, and allows for
remote attestation of the VM. Azure at least has the attestation service
integrated into their portal.
The RHEL images we provide for cloud today can be launched in a
"Trusted VM" setup, but there's no usable trust provided because the
dynamically generated initrd and cmdline are impractical to attest to.
This is further compounded by grub's practice of writing every single
grub.conf statement into the PCRs, which effectively requires a grub
simulator to validate.

More recently clouds have started to work on "Confidential VMs", which
have a fairly high level of overlap with "Trusted VMs" in terms of what
needs to be done to attest the confidentiality of the boot process. So
again we need to measure and attest the initrd + cmdline, and any
bootloader config.

This is a long-winded way of saying that the need for EFI unified kernel
images is just one piece of the puzzle we're working on. To complement
this we'll be looking at either having shim directly launch the kernel
image, or requesting that sd-boot be signed and using that, in both cases
eliminating use of grub to simplify the measurement+attestation problem
significantly.

It is also likely that we'll be looking to make use of things like the
kernel IMA framework to measure+attest to various aspects of the cloud
disk and OS state.

> When shipping the initrd as rpm it is possible to check it with the
> usual tools ('rpm --verify' for example). TPM measurements are much
> more useful because it is possible to pre-calculate the PCR values for a
> given kernel version.
>
> When shipping a unified kernel image (containing kernel, initrd, cmdline
> and signature) we get the additional benefit that the initrd is covered
> by the signature so secure boot will actually be secure.
>
> So, while unified kernels are clearly the better approach it is also the
> one which needs some changes in various packages. For an initrd image
> the hooks needed are in place thanks to CoreOS shipping initrd images
> today. Opt-in by install the sub-rpm and everything JustWorks[tm].
>
> To make unified kernels work smoothly a number of changes are needed
> (beside the kernel rpm changes):
>
>  (1) Add support for unified kernels to the kernel update scripts.
>      (/usr/lib/kernel/install.d/*).
>
>  (2) Add boot loader support for unified kernel images:
>      (a) either switch to sd-boot which already supports this.
>      (b) or add support to grub2 (improve blscfg downstream patch).
>
>  (3) Support /boot being vfat (depending on #2, sd-boot needs this).

Technically in the cloud image scenario we don't especially need to care
about /boot being a dedicated partition. We could do everything
exclusively in the /boot/efi partition, which is already vfat, and not
bother creating any /boot partition, since we can ensure /boot/efi is
large enough.

If we foresee the unified EFI kernels being useful for bare metal,
however, then use of /boot as vfat becomes more important, as we can't
assume the hardware vendor's pre-created /boot/efi is sufficiently large.

>  (4) Remove configuration information (and secrets) from initrd images
>      and kernel command line.
>
>      Most important item here is root the filesystem location, which
>      should be doable using https://systemd.io/DISCOVERABLE_PARTITIONS/
>      for many use cases.
>
>      Can initially be handled in anaconda kickstart %post scripts.
>      Long-term we need proper support in anaconda (and any other tool
>      used to install or generate cloud images), especially if we want
>      make unified kernel images the default some day.
>
>  (5) There might be more ...

In the kernel.spec changes I linked to earlier, I've actually proposed
creating two distinct EFI images.

Under SecureBoot, users won't have the option to edit the cmdline, since
it is embedded in the EFI image and measured. Furthermore, with
Confidential VMs, the emulated keyboard, serial port and VGA output are
not trustworthy, so care has to be taken with any interactive process
during boot, including any interaction with a boot loader.
Selecting between multiple pre-defined kernel entries is fine; editing
the cmdline is not viable.

In normal operation this isn't a big issue, as it's fine to just hardcode
a quiet, graphical boot. When things go wrong, however, it is nice to be
able to boot with 'debug' and rhgb turned off. Thus my patch proposes two
images, to be distributed in the same 'kernel-virt-unified' sub-RPM.

 * vmlinuz-virt.efi created using dracut arg
     --kernel-cmdline 'console=ttyS0 console=tty0 quiet rhgb'

 * vmlinuz-virt-verbose.efi created using dracut arg
     --kernel-cmdline 'console=ttyS0 console=tty0'

Even when following the systemd discoverable partitions spec, we need a
mechanism to attest that the root filesystem that was discovered and
mounted matches the one we expect. Our current thought is that this is
likely to involve TPM PCR measurements being logged by systemd and
libcryptsetup, as well as some of the kernel IMA providers. This is being
discussed with systemd upstream at

  https://github.com/systemd/systemd/issues/24503

I raise this because for the kernel IMA support, we're likely to need to
add further kernel cmdline parameters beyond those currently shown in
these patches (i.e. ima=on at the very least), as well as bundling an
/etc/ima/ima-policy file into the initrd.

> I think the best way forward is to skip the initrd image interim step
> and try go straight to unified kernel image support, starting with
> virtual machines & cloud images, when things are working smoothly there
> go expand to cover more use cases. I think it makes sense to start with
> the kernel changes.
>
> Comments? Reviews? Suggestions?

In terms of the distro maint burden, shipping these pre-built EFI images
introduces new deps on the kernel build process:

  BuildRequires: dracut
  BuildRequires: binutils
  BuildRequires: lvm2

In theory, any time one of those packages (or an existing kernel dep) has
a change, it might impact the content that is bundled in the initrd that
is prebuilt / bundled with the EFI image.
That in turn could mean that extra kernel RPM re-builds are needed,
simply because a 3rd party dep had an important change that we need to
get into the initrd.

In practice, my impression is that the kernel gets rebuilt frequently
enough that the content bundled into the initrds is already going to be
kept sufficiently up to date. Even if some package change did impact the
initrd content, it would almost always be possible for users to wait
until the next normal kernel rebuild point to get that into the initrds.
High priority CVEs feel like the only scenario that might force an extra
kernel RPM rebuild that would not otherwise have been needed.

So overall, I feel like this addition ought not to introduce a notable
negative impact on kernel RPM maint, while at the same time it enables us
to close the SecureBoot measurement hole our distro (and essentially
every other Linux distro) has suffered with for years.

FWIW, as a reference/comparison point, in testing out Azure's support for
AMD SEV-SNP confidential virtualization, we've learnt that Ubuntu have
also chosen to go down the route of using an EFI unified kernel image, in
their case directly booted by shim with neither grub nor sd-boot
involved. Their image looked like a one-off special, but it is likely
that approach will apply going forwards, since there are few other
practical ways to deal with the measurement & attestation needs in a
seamless way.
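
To illustrate why a pre-built image makes the measurement side tractable,
here is a minimal Python sketch of replaying TPM extend operations to
predict a PCR value. It assumes a SHA-256 PCR bank starting from all
zeros and models each event as a hash over raw bytes. The function names
and the placeholder image bytes are hypothetical, and a real event log
contains more entries than this (firmware also hashes PE images using
Authenticode rather than the raw file), so treat it as a model of the
idea and not of the exact values a vTPM would report.

```python
import hashlib
from typing import Iterable

PCR_SIZE = 32  # bytes in a SHA-256 PCR


def pcr_extend(pcr: bytes, measured: bytes) -> bytes:
    """One TPM extend step: new = SHA256(old || SHA256(measured))."""
    digest = hashlib.sha256(measured).digest()
    return hashlib.sha256(pcr + digest).digest()


def predict_pcr(events: Iterable[bytes]) -> bytes:
    """Replay a sequence of measured blobs, starting from an all-zero PCR."""
    pcr = bytes(PCR_SIZE)
    for blob in events:
        pcr = pcr_extend(pcr, blob)
    return pcr


if __name__ == "__main__":
    # Placeholder inputs (assumptions): stand-ins for the unified image
    # contents and the cmdline embedded at kernel build time.
    events = [
        b"<unified kernel image bytes>",
        b"console=ttyS0 console=tty0 quiet rhgb",
    ]
    print(predict_pcr(events).hex())
```

Because the inputs are fixed at kernel build time, the expected value can
be computed once per kernel version and compared with what the VM
reports. With a locally generated initrd, or a bootloader that measures
each config statement, the list of inputs is not known in advance.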

With regards,
Daniel
-- 
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|