Re: [PATCH v2 08/18] PCI/CMA: Authenticate devices on enumeration

Lukas Wunner <lukas@xxxxxxxxx> · Sun, 14 Jul 2024 10:42:41 +0200

[cc += Kees Cook, Jann Horn; start of thread:
https://lore.kernel.org/all/6d4361f13a942efc4b4d33d22e56b564c4362328.1719771133.git.lukas@xxxxxxxxx/
]

On Thu, Jul 11, 2024 at 10:50:28AM -0700, Dan Williams wrote:
> Lukas Wunner wrote:
> > Resume is parallelized (see dpm_noirq_resume_devices()), so the latency
> > is bounded by the time to authenticate a single device.
> 
> As far as I understand that can still be on the order of seconds, and
> pathological cases that could be longer. [...]
> How bad is that latency problem in practice?

I'm seeing 150 msec to authenticate a PCI device if the signature can't be
verified (e.g. due to missing trusted root certificate) and 400 msec if
the signature *is* verified.  This varies depending on beefiness of CPU,
algorithm selection, key length and number of provisioned slots.

But I've never seen this take "on the order of seconds".  I assume that's
a misunderstanding.

vmlinux size grows by 12,752 bytes with CONFIG_PCI_CMA=y on x86_64.
The feature is disabled by default.

> All of these are mitigated by pushing authentication management to
> drivers.

Device authentication can't be pushed to drivers.  It must be done
*before* driver binding:

Drivers are bound based on identity information in config space
(such as Vendor ID or Device ID).  A malicious device could spoof
identity information in config space to force binding to a specific
(CMA-unaware) driver.

The certificate contains the signed Vendor ID and Device ID of the
device.  By validating the certificate and the signature presented
by the device, its identity can be ascertained by the PCI core
before a driver (the right one) starts accessing it.
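
To illustrate the ordering argument, here is a minimal user-space sketch.
It is not code from the patch set:  the struct names and the
identity_trusted() helper are hypothetical, and the certificate is
reduced to "signature verified or not" plus the two IDs it attests.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: identity as read from config space (spoofable). */
struct cfg_identity {
	uint16_t vendor;
	uint16_t device;
};

/* Hypothetical: identity attested by the device's certificate. */
struct cert_identity {
	bool signature_valid;	/* chain and signature were verified */
	uint16_t vendor;
	uint16_t device;
};

/*
 * Config space is only believed if a verified certificate attests
 * the same Vendor ID and Device ID.  Only then would driver matching
 * proceed on the basis of those IDs.
 */
static bool identity_trusted(const struct cfg_identity *cfg,
			     const struct cert_identity *cert)
{
	assert(cfg && cert);

	if (!cert->signature_valid)
		return false;

	return cfg->vendor == cert->vendor &&
	       cfg->device == cert->device;
}
```

A device which spoofs a Device ID in config space to attract a specific
driver fails this check because the signed IDs don't match.  A driver
can't perform the check itself because by the time it runs, it has
already been selected based on the spoofed IDs.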

> I see no justification for the hard coded aggressive default policy

I think that just preventing driver binding if a device fails
authentication may not be good enough.  If a device is truly
malicious, perhaps we should firewall it off.  I'm worried about
a device laterally attacking other devices through P2PDMA or
sending malformed TLPs upstream to the root complex.

In patch [11/18], I'm suggesting:

   "Traffic from devices which failed authentication could also be
    filtered through ACS I/O Request Blocking Enable (PCIe r6.2 sec
    7.7.11.3) or through Link Disable (PCIe r6.2 sec 7.5.3.7)."

To firewall off malicious devices, authentication should happen early on.
The system shouldn't be exposed to those devices any longer than necessary.
That's one reason why this patch set performs mandatory authentication
right at enumeration:  so that we're able to catch malicious devices
as early as possible.

Patch [08/18] inserts pci_cma_init() at the end of pci_init_capabilities()
because CMA depends on DOE.  We may want to move DOE and CMA init
further up in the function to authenticate the device even before
enumerating any of its other capabilities.

It's probably too early to decide which actions to take if a device fails
authentication, whether to offer a variety of actions (e.g. only prevent
driver binding) or just stick to the harshest one (firewall off the device),
when to perform those actions and which knobs to offer to users for
controlling policy and overriding actions.  We may need more real-world
experience before we can make those decisions and we may need to ask
security folks such as Kees Cook and Jann Horn for their perspective.
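
To make the range of options concrete, here is a purely illustrative
user-space sketch.  None of these names exist in the patch set:  the
enum, the state struct and cma_apply_policy() are hypothetical, and
treating the actions as cumulative levels is an assumption of mine,
not a decision.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical actions, ordered from mildest to harshest. */
enum cma_fail_action {
	CMA_ACT_REPORT_ONLY,	/* only expose the result to user space */
	CMA_ACT_NO_BIND,	/* also prevent driver binding */
	CMA_ACT_BLOCK_IO,	/* also set ACS I/O Request Blocking */
	CMA_ACT_LINK_DISABLE,	/* also set Link Disable */
};

/* Hypothetical per-device state, modeled as plain flags. */
struct cma_dev_state {
	bool authenticated;
	bool may_bind;
	bool io_blocked;
	bool link_enabled;
};

/* Apply the configured action only if authentication failed. */
static void cma_apply_policy(struct cma_dev_state *dev,
			     enum cma_fail_action policy)
{
	assert(dev);

	dev->may_bind = true;
	dev->io_blocked = false;
	dev->link_enabled = true;

	if (dev->authenticated)
		return;

	if (policy >= CMA_ACT_NO_BIND)
		dev->may_bind = false;
	if (policy >= CMA_ACT_BLOCK_IO)
		dev->io_blocked = true;
	if (policy >= CMA_ACT_LINK_DISABLE)
		dev->link_enabled = false;
}
```

With CMA_ACT_REPORT_ONLY, a device which failed authentication is left
untouched, which corresponds to what this patch set does today.  The
harsher levels only make sense if authentication has already happened
by the time they would need to take effect.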

This patch set merely exposes to user space whether a device passed
authentication or not.  For that alone, it would indeed be sufficient
to authenticate asynchronously -- or delay authentication until the
sysfs attribute is accessed.

But I wanted to keep the option open to firewall off devices early on.
And placing pci_cma_init() in pci_init_capabilities() felt natural
because it's where all the other device capabilities are enumerated
and initialized.

Thanks,

Lukas