On Thu, 2 Jan 2020 at 15:46, Laszlo Ersek <lersek@xxxxxxxxxx> wrote: > > On 12/20/19 20:41, Arvind Sankar wrote: > > On Fri, Dec 20, 2019 at 10:11:00AM +0200, Ard Biesheuvel wrote: > >> On Fri, 20 Dec 2019 at 09:17, Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote: > >>> > >>> > >>> > >>>> On Dec 20, 2019, at 3:07 PM, Ard Biesheuvel <ard.biesheuvel@xxxxxxxxxx> wrote: > >>>> > >>>> On Thu, 19 Dec 2019 at 22:05, Matthew Garrett <mjg59@xxxxxxxxxx> wrote: > >>>>> > >>>>>> On Wed, Dec 18, 2019 at 9:03 AM Ard Biesheuvel <ardb@xxxxxxxxxx> wrote: > >>>>>> + status = efi_call_early(create_event, EVT_SIGNAL_EXIT_BOOT_SERVICES, > >>>>>> + TPL_CALLBACK, handle_exit_boot_services_event, > >>>>>> + NULL, &exit_boot_services_event); > >>>>>> + if (status != EFI_SUCCESS) { > >>>>>> + pr_efi_err("Failed to register for EBS() event\n"); > >>>>>> + goto free_handle; > >>>>>> + } > >>>>> > >>>>> OVMF's SEV implementation appears to tear down AMD's IOMMU mappings at > >>>>> EVT_SIGNAL_EXIT_BOOT_SERVICES. How are we ensuring that this happens > >>>>> first? > >>>> > >>>> It doesn't, and that is kind of the point. The only guarantee you have > >>>> is that this runs before ExitBootServices() returns, but after any > >>>> other callbacks that have been registered. I know this is not 100% > >>>> what you're after, but it is the only way we can avoid poking devices > >>>> behind the backs of their drivers. > >>>> > > > > Could we add a comment to describe why we have the two-step event > > notification, i.e. to ensure the ordering? > > > > Regarding SEV, from what Laszlo had said [1] I had understood that the > > SEV driver left everything blacklisted -- is this necessary at all for > > SEV or did I misunderstand his comment? > > > > [1] https://lore.kernel.org/lkml/9c58f2d2-5712-0972-6ea7-092500f37cf9@xxxxxxxxxx/ > > Leaving all guest RAM encrypted after EBS may not be a technical > requirement for SEV. It is a security goal, however. The OS should > inherit the most secure environment possible from the firmware, and not > have to look for pages that were left unencrypted. > > >>> Can you clarify (in the changelog or a comment perhaps) why you’re doing this instead of turning off busmastering before calling ExitBootServices()? Maybe this was covered in this thread, but I missed it. > >>> > >> > >> Sure. The problem is that EBS() is the place where drivers tear down > >> rings etc and gracefully take down the device. So killing DMA for all > >> of them by clearing the BM bit of every bridge is likely to cause > >> problems, because the teardown code wasn't written with the idea in > >> mind that DMA is no longer possible. On arm64 at least, it may result > >> in the kernel being entered with a pending SError which will kill the > >> OS as soon as they are unmasked. But the UEFI drivers themselves may > >> simply hang or timeout on some DMA access. > >> > > > > As I understand it, the order in which we want the bus-mastering > > disable to happen is > > - all PCI device drivers are stopped > > - bus mastering is disabled > > - depending on firmware, iommu might get disabled (probably out of our > > control) > > > > Instead of using the event notifier, could we not explicitly call > > DisconnectController() for all the PCI devices, then disable > > bus-mastering, and only then call ExitBootServices()? > > This sounds worth investigating to me. > I implemented this here: https://lore.kernel.org/linux-efi/20191231162344.130654-1-ardb@xxxxxxxxxx/ It works on all the systems I tried, including a mixed mode iMac 24" and a mixed mode Intel Atom based tablet kindly provided by Hans. The only thing to keep in mind is that some drivers implement Gop->Blt() using DMA in some cases, and so we should ensure that this is really the last thing that happens before ExitBootServices() is called, since a simple efi_printk() may trigger a DMA access by the graphics hardware. > >>> Also, surely this whole mess is a a design error in EFI, at least when SEV is involved, and there should be an EFI extension to keep IOMMU enabled. Or a specified way to *guarantee* that DMA is off when we exit boot services without hackery. > >> > >> The UEFI spec requires that all devices stop doing DMA at > >> ExitBootServices() time, which is why they register for this event and > >> disable DMA in their drivers. So from this pov, the state of the IOMMU > >> is irrelevant since no device should be doing DMA anyway. The UEFI > >> spec does not reason at all about IOMMUs. > >> > >> But indeed, the whole idea that the IOMMU is a 'feature' that you can > >> ignore if you want since it will be in passthrough mode otherwise is > >> misguided, but I'm not sure this is a firmware problem. The desire to > >> be able to run yesterday's OS on today's hardware, especially in the > >> Windows world, resulted in a situation where many security and > >> hardening features are opt-in rather than opt-out, with the adoption > >> you might expect. > > > > Looking at the intel iommu driver [2] there is a PCD entry to control > > whether to leave the IOMMU enabled. Is it possible to check its value to > > see whether the bus-master disabling is necessary -- ie IOMMU was > > enabled during boot and its going to get turned off during > > ExitBootServices, or even check whether it's settable and if so set it > > to leave the IOMMU enabled? > > > > [2] https://github.com/tianocore/edk2-platforms/blob/0d4d661b5a7cf3114a7d81e1c59e5cb57ceaf139/Silicon/Intel/IntelSiliconPkg/Feature/VTd/IntelVTdDxe/DmaProtection.c#L601 > > > > The platform config database is a PI concept, not a UEFI concept. IOW, > EFI_PCD_PROTOCOL is specified in the Platform Init spec, and not the > UEFI spec. A UEFI OS should only rely on the UEFI spec. > Agreed. And on top of that, PCDs can be FixedAtBuild or Patchable, in which case they cannot be controlled via the EFI_PCD_PROTOCOL.