[cc += Kees Cook, Jann Horn; start of thread:
https://lore.kernel.org/all/6d4361f13a942efc4b4d33d22e56b564c4362328.1719771133.git.lukas@xxxxxxxxx/
]

On Thu, Jul 11, 2024 at 10:50:28AM -0700, Dan Williams wrote:
> Lukas Wunner wrote:
> > Resume is parallelized (see dpm_noirq_resume_devices()), so the latency
> > is bounded by the time to authenticate a single device.
>
> As far as I understand that can still be on the order of seconds, and
> pathological cases that could be longer.
[...]
> How bad is that latency problem in practice?

I'm seeing 150 msec to authenticate a PCI device if the signature
can't be verified (e.g. due to missing trusted root certificate)
and 400 msec if the signature *is* verified.

This varies depending on beefiness of CPU, algorithm selection,
key length and number of provisioned slots.  But I've never seen
this take "on the order of seconds"; I assume that's a
misunderstanding.

vmlinux size grows by 12,752 bytes with CONFIG_PCI_CMA=y on x86_64.
The feature is disabled by default.

> All of these are mitigated by pushing authentication management to
> drivers.

Device authentication can't be pushed to drivers.  It must be done
*before* driver binding:

Drivers are bound based on identity information in config space
(such as Vendor ID or Device ID).  A malicious device could spoof
identity information in config space to force binding to a specific
(CMA-unaware) driver.

The certificate contains the signed Vendor ID and Device ID of the
device.  By validating the certificate and the signature presented
by the device, its identity can be ascertained by the PCI core before
a driver (the right one) starts accessing it.

> I see no justification for the hard coded aggressive default policy

I think that just preventing driver binding if a device fails
authentication may not be good enough.  If a device is truly
malicious, perhaps we should firewall it off.
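
To illustrate the identity check described above, here is a minimal
user-space sketch.  It is not code from the patch set: the struct and
function names are hypothetical, and the two boolean fields merely stand
in for the outcome of certificate chain validation and signature
verification, which are assumed to have happened elsewhere.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: identity as read from config space (spoofable by the device). */
struct cfg_identity {
	uint16_t vendor_id;
	uint16_t device_id;
};

/* Hypothetical: identity taken from the certificate, plus verification results. */
struct cert_identity {
	uint16_t vendor_id;
	uint16_t device_id;
	bool chain_trusted;	/* certificate chain leads to a trusted root */
	bool signature_valid;	/* signature presented by the device verified */
};

/*
 * Return true only if authentication succeeded and the signed identity
 * matches what the device claims in config space.  Only then would it be
 * safe to pick a driver based on Vendor ID and Device ID.
 */
static bool identity_ascertained(const struct cfg_identity *cfg,
				 const struct cert_identity *cert)
{
	if (!cert->chain_trusted || !cert->signature_valid)
		return false;

	return cfg->vendor_id == cert->vendor_id &&
	       cfg->device_id == cert->device_id;
}
```

The point of the sketch is the ordering: the comparison has to run before
any driver is matched against config space, otherwise a spoofed Vendor ID
or Device ID has already determined which driver gets bound.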
I'm worried about a device laterally attacking other devices through
P2PDMA or sending malformed TLPs upstream to the root complex.

In patch [11/18], I'm suggesting:

    "Traffic from devices which failed authentication could also be
    filtered through ACS I/O Request Blocking Enable (PCIe r6.2 sec
    7.7.11.3) or through Link Disable (PCIe r6.2 sec 7.5.3.7)."

To firewall off malicious devices, authentication should happen early
on.  The system shouldn't be exposed to those devices any longer than
necessary.

That's one reason why this patch set performs mandatory authentication
already on enumeration:  So that we're able to catch malicious devices
as early as possible.

Patch [08/18] inserts pci_cma_init() at the end of
pci_init_capabilities() because CMA depends on DOE.  We may want to
move DOE and CMA init further up in the function to authenticate the
device even before enumerating any of its other capabilities.

It's probably too early to decide which actions to take if a device
fails authentication, whether to offer a variety of actions (only
prevent driver binding) or just stick to the harshest one (firewall
off the device), when to perform those actions and which knobs to
offer to users for controlling policy and overriding actions.

We may need more real-world experience before we can make those
decisions and we may need to ask security folks such as Kees Cook
and Jann Horn for their perspective.

This patch set merely exposes to user space whether a device passed
authentication or not.  For that alone, it would indeed be sufficient
to authenticate asynchronously -- or delay authentication until the
sysfs attribute is accessed.  But I wanted to keep the option open
to firewall off devices early on.  And placing pci_cma_init() in
pci_init_capabilities() felt natural because it's where all the other
device capabilities are enumerated and initialized.

Thanks,

Lukas