On 12/20/19 20:41, Arvind Sankar wrote: > On Fri, Dec 20, 2019 at 10:11:00AM +0200, Ard Biesheuvel wrote: >> On Fri, 20 Dec 2019 at 09:17, Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote: >>> >>> >>> >>>> On Dec 20, 2019, at 3:07 PM, Ard Biesheuvel <ard.biesheuvel@xxxxxxxxxx> wrote: >>>> >>>> On Thu, 19 Dec 2019 at 22:05, Matthew Garrett <mjg59@xxxxxxxxxx> wrote: >>>>> >>>>>> On Wed, Dec 18, 2019 at 9:03 AM Ard Biesheuvel <ardb@xxxxxxxxxx> wrote: >>>>>> + status = efi_call_early(create_event, EVT_SIGNAL_EXIT_BOOT_SERVICES, >>>>>> + TPL_CALLBACK, handle_exit_boot_services_event, >>>>>> + NULL, &exit_boot_services_event); >>>>>> + if (status != EFI_SUCCESS) { >>>>>> + pr_efi_err("Failed to register for EBS() event\n"); >>>>>> + goto free_handle; >>>>>> + } >>>>> >>>>> OVMF's SEV implementation appears to tear down AMD's IOMMU mappings at >>>>> EVT_SIGNAL_EXIT_BOOT_SERVICES. How are we ensuring that this happens >>>>> first? >>>> >>>> It doesn't, and that is kind of the point. The only guarantee you have >>>> is that this runs before ExitBootServices() returns, but after any >>>> other callbacks that have been registered. I know this is not 100% >>>> what you're after, but it is the only way we can avoid poking devices >>>> behind the backs of their drivers. >>>> > > Could we add a comment to describe why we have the two-step event > notification, i.e. to ensure the ordering? > > Regarding SEV, from what Laszlo had said [1] I had understood that the > SEV driver left everything blacklisted -- is this necessary at all for > SEV or did I misunderstand his comment? > > [1] https://lore.kernel.org/lkml/9c58f2d2-5712-0972-6ea7-092500f37cf9@xxxxxxxxxx/ Leaving all guest RAM encrypted after EBS may not be a technical requirement for SEV. It is a security goal, however. The OS should inherit the most secure environment possible from the firmware, and not have to look for pages that were left unencrypted. >>> Can you clarify (in the changelog or a comment perhaps) why you’re doing this instead of turning off busmastering before calling ExitBootServices()? Maybe this was covered in this thread, but I missed it. >>> >> >> Sure. The problem is that EBS() is the place where drivers tear down >> rings etc and gracefully take down the device. So killing DMA for all >> of them by clearing the BM bit of every bridge is likely to cause >> problems, because the teardown code wasn't written with the idea in >> mind that DMA is no longer possible. On arm64 at least, it may result >> in the kernel being entered with a pending SError which will kill the >> OS as soon as they are unmasked. But the UEFI drivers themselves may >> simply hang or timeout on some DMA access. >> > > As I understand it, the order in which we want the bus-mastering > disable to happen is > - all PCI device drivers are stopped > - bus mastering is disabled > - depending on firmware, iommu might get disabled (probably out of our > control) > > Instead of using the event notifier, could we not explicitly call > DisconnectController() for all the PCI devices, then disable > bus-mastering, and only then call ExitBootServices()? This sounds worth investigating to me. >>> Also, surely this whole mess is a a design error in EFI, at least when SEV is involved, and there should be an EFI extension to keep IOMMU enabled. Or a specified way to *guarantee* that DMA is off when we exit boot services without hackery. >> >> The UEFI spec requires that all devices stop doing DMA at >> ExitBootServices() time, which is why they register for this event and >> disable DMA in their drivers. So from this pov, the state of the IOMMU >> is irrelevant since no device should be doing DMA anyway. The UEFI >> spec does not reason at all about IOMMUs. >> >> But indeed, the whole idea that the IOMMU is a 'feature' that you can >> ignore if you want since it will be in passthrough mode otherwise is >> misguided, but I'm not sure this is a firmware problem. The desire to >> be able to run yesterday's OS on today's hardware, especially in the >> Windows world, resulted in a situation where many security and >> hardening features are opt-in rather than opt-out, with the adoption >> you might expect. > > Looking at the intel iommu driver [2] there is a PCD entry to control > whether to leave the IOMMU enabled. Is it possible to check its value to > see whether the bus-master disabling is necessary -- ie IOMMU was > enabled during boot and its going to get turned off during > ExitBootServices, or even check whether it's settable and if so set it > to leave the IOMMU enabled? > > [2] https://github.com/tianocore/edk2-platforms/blob/0d4d661b5a7cf3114a7d81e1c59e5cb57ceaf139/Silicon/Intel/IntelSiliconPkg/Feature/VTd/IntelVTdDxe/DmaProtection.c#L601 > The platform config database is a PI concept, not a UEFI concept. IOW, EFI_PCD_PROTOCOL is specified in the Platform Init spec, and not the UEFI spec. A UEFI OS should only rely on the UEFI spec. Thanks Laszlo