On Fri, Dec 20, 2019 at 10:11:00AM +0200, Ard Biesheuvel wrote: > On Fri, 20 Dec 2019 at 09:17, Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote: > > > > > > > > > On Dec 20, 2019, at 3:07 PM, Ard Biesheuvel <ard.biesheuvel@xxxxxxxxxx> wrote: > > > > > > On Thu, 19 Dec 2019 at 22:05, Matthew Garrett <mjg59@xxxxxxxxxx> wrote: > > >> > > >>> On Wed, Dec 18, 2019 at 9:03 AM Ard Biesheuvel <ardb@xxxxxxxxxx> wrote: > > >>> + status = efi_call_early(create_event, EVT_SIGNAL_EXIT_BOOT_SERVICES, > > >>> + TPL_CALLBACK, handle_exit_boot_services_event, > > >>> + NULL, &exit_boot_services_event); > > >>> + if (status != EFI_SUCCESS) { > > >>> + pr_efi_err("Failed to register for EBS() event\n"); > > >>> + goto free_handle; > > >>> + } > > >> > > >> OVMF's SEV implementation appears to tear down AMD's IOMMU mappings at > > >> EVT_SIGNAL_EXIT_BOOT_SERVICES. How are we ensuring that this happens > > >> first? > > > > > > It doesn't, and that is kind of the point. The only guarantee you have > > > is that this runs before ExitBootServices() returns, but after any > > > other callbacks that have been registered. I know this is not 100% > > > what you're after, but it is the only way we can avoid poking devices > > > behind the backs of their drivers. > > > Could we add a comment to describe why we have the two-step event notification, i.e. to ensure the ordering? Regarding SEV, from what Laszlo had said [1] I had understood that the SEV driver left everything blacklisted -- is this necessary at all for SEV or did I misunderstand his comment? [1] https://lore.kernel.org/lkml/9c58f2d2-5712-0972-6ea7-092500f37cf9@xxxxxxxxxx/ > > > > Can you clarify (in the changelog or a comment perhaps) why you’re doing this instead of turning off busmastering before calling ExitBootServices()? Maybe this was covered in this thread, but I missed it. > > > > Sure. The problem is that EBS() is the place where drivers tear down > rings etc and gracefully take down the device. So killing DMA for all > of them by clearing the BM bit of every bridge is likely to cause > problems, because the teardown code wasn't written with the idea in > mind that DMA is no longer possible. On arm64 at least, it may result > in the kernel being entered with a pending SError which will kill the > OS as soon as they are unmasked. But the UEFI drivers themselves may > simply hang or timeout on some DMA access. > As I understand it, the order in which we want the bus-mastering disable to happen is - all PCI device drivers are stopped - bus mastering is disabled - depending on firmware, iommu might get disabled (probably out of our control) Instead of using the event notifier, could we not explicitly call DisconnectController() for all the PCI devices, then disable bus-mastering, and only then call ExitBootServices()? > > Also, surely this whole mess is a a design error in EFI, at least when SEV is involved, and there should be an EFI extension to keep IOMMU enabled. Or a specified way to *guarantee* that DMA is off when we exit boot services without hackery. > > The UEFI spec requires that all devices stop doing DMA at > ExitBootServices() time, which is why they register for this event and > disable DMA in their drivers. So from this pov, the state of the IOMMU > is irrelevant since no device should be doing DMA anyway. The UEFI > spec does not reason at all about IOMMUs. > > But indeed, the whole idea that the IOMMU is a 'feature' that you can > ignore if you want since it will be in passthrough mode otherwise is > misguided, but I'm not sure this is a firmware problem. The desire to > be able to run yesterday's OS on today's hardware, especially in the > Windows world, resulted in a situation where many security and > hardening features are opt-in rather than opt-out, with the adoption > you might expect. Looking at the intel iommu driver [2] there is a PCD entry to control whether to leave the IOMMU enabled. Is it possible to check its value to see whether the bus-master disabling is necessary -- ie IOMMU was enabled during boot and its going to get turned off during ExitBootServices, or even check whether it's settable and if so set it to leave the IOMMU enabled? [2] https://github.com/tianocore/edk2-platforms/blob/0d4d661b5a7cf3114a7d81e1c59e5cb57ceaf139/Silicon/Intel/IntelSiliconPkg/Feature/VTd/IntelVTdDxe/DmaProtection.c#L601