Re: [PATCH v2 08/18] PCI/CMA: Authenticate devices on enumeration

Dan Williams <dan.j.williams@xxxxxxxxx> · Mon, 15 Jul 2024 13:08:30 -0700

Lukas Wunner wrote:
> [cc += Kees Cook, Jann Horn; start of thread:
> https://lore.kernel.org/all/6d4361f13a942efc4b4d33d22e56b564c4362328.1719771133.git.lukas@xxxxxxxxx/
> ]
> 
> On Thu, Jul 11, 2024 at 10:50:28AM -0700, Dan Williams wrote:
> > Lukas Wunner wrote:
> > > Resume is parallelized (see dpm_noirq_resume_devices()), so the latency
> > > is bounded by the time to authenticate a single device.
> > 
> > As far as I understand that can still be on the order of seconds, and
> > pathological cases that could be longer. [...]
> > How bad is that latency problem in practice?
> 
> I'm seeing 150 msec to authenticate a PCI device if the signature can't be
> verified (e.g. due to missing trusted root certificate) and 400 msec if
> the signature *is* verified.  This varies depending on beefiness of CPU,
> algorithm selection, key length and number of provisioned slots.
> 
> But I've never seen this take "on the order of seconds", I assume that's
> a misunderstanding.

That worry came from an off-list discussion around handling AEAD limits
for IDE. If IDE is going to go into an error state when the AEAD limit
is reached, then software needs to be prepared for the worst-case time
to re-establish that session, and in that worst case DOE transfers take
1 second each.

That said, a device that takes one second per DOE message is likely
broken for other reasons, so let's hope that authentication latency does
not become a problem in practice.

[..]
> > All of these are mitigated by pushing authentication management to
> > drivers.
> 
> Device authentication can't be pushed to drivers.  It must be done
> *before* driver binding:

Allowing for it to be possible before driver binding is a good idea;
mandating it is the issue. Mechanism vs policy.

> Drivers are bound based on identity information in config space
> (such as Vendor ID or Device ID).  A malicious device could spoof
> identity information in config space to force binding to a specific
> (CMA-unaware) driver.

Yes, and mitigating that depends on the threat model. For example,
unauthenticated devices talking to public memory is outside the TDISP
threat model. It is private memory that needs end-to-end protection.

> The certificate contains the signed Vendor ID and Device ID of the
> device.  By validating the certificate and the signature presented
> by the device, its identity can be ascertained by the PCI core
> before a driver (the right one) starts accessing it.
>
> > I see no justification for the hard coded aggressive default policy
> 
> I think that just preventing driver binding if a device fails
> authentication may not be good enough.  If a device is truly
> malicious, perhaps we should firewall it off.  I'm worried about
> a device laterally attacking other devices through P2PDMA or
> sending malformed TLPs upstream to the root complex. 
> 
> In patch [11/18], I'm suggesting:
> 
>    "Traffic from devices which failed authentication could also be
>     filtered through ACS I/O Request Blocking Enable (PCIe r6.2 sec
>     7.7.11.3) or through Link Disable (PCIe r6.2 sec 7.5.3.7)."

Again that is a policy option dependent on the threat model.

> To firewall off malicious devices, authentication should happen early on.
> The system shouldn't be exposed to those devices any longer than necessary.
> That's one reason why this patch set performs mandatory authentication
> already on enumeration:  So that we're able to catch malicious devices
> as early as possible.

We keep talking past each other.

I am not disagreeing with the possibility of deploying the strictest
imaginable policy around CMA. Instead, I am looking for CMA to consider
optionality in policy given the TDISP threat model, and the known
"secure CSP device inventory" use cases. Neither of those mandates
that CMA classify all non-authenticated devices as malicious.

Going further, there is a reason that CMA is only a building block of
TDISP. If the threat model is "malicious device implementation" then the
threat mitigation needs to consider spoofed MMIO. That's where IDE and
private MMIO come into play. Sure, CMA is a hurdle to make it more
difficult to carry out a malicious device implementation attack, but do
not oversell the protection it affords relative to all the other steps
needed to protect confidential memory.

[..]
> This patch set merely exposes to user space whether a device passed
> authentication or not.  For that alone, it would indeed be sufficient
> to authenticate asynchronously -- or delay authentication until the
> sysfs attribute is accessed.
> 
> But I wanted to keep the option open to firewall off devices early on.
> And placing pci_cma_init() in pci_init_capabilities() felt natural
> because it's where all the other device capabilities are enumerated
> and initialized.

Yes, let's build that as an *option*, and step back from CONFIG_PCI_CMA
implying an "unauthenticated == malicious" policy. Given the TDISP
threat model allows for unauthenticated devices to freely access public
memory, my contention is that Linux policy should start with how to
protect private (confidential) memory and then grow to cross-device
attack and bare metal device policy.
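
To make the mechanism-vs-policy split concrete, here is a rough sketch of
what I mean. Every name in it (auth_state, auth_policy, auth_action,
auth_decide) is hypothetical and made up for illustration; none of it
exists in the kernel or in this patch set. It only shows the
authentication result being recorded as a fact, with a separately
selectable policy deciding what, if anything, to do about it.

```c
/*
 * Illustrative only: all identifiers below are hypothetical and are
 * not part of the kernel or of the PCI/CMA patch set.
 */

/* Mechanism: what authentication found out about the device. */
enum auth_state {
	AUTH_NOT_ATTEMPTED,	/* e.g. device has no CMA support */
	AUTH_FAILED,		/* signature or certificate rejected */
	AUTH_PASSED,		/* identity verified */
};

/* Policy: how strictly the platform owner wants to react. */
enum auth_policy {
	POLICY_REPORT_ONLY,	/* expose the result, change nothing */
	POLICY_PROTECT_PRIVATE,	/* only gate access to private memory */
	POLICY_STRICT,		/* treat unauthenticated as malicious */
};

/* Resulting action taken against the device. */
enum auth_action {
	ACTION_NONE,			/* no restriction imposed */
	ACTION_DENY_PRIVATE_MEMORY,	/* public memory still allowed */
	ACTION_BLOCK_BINDING,		/* refuse to bind a driver */
};

/* Map (policy, state) to an action; the state alone decides nothing. */
static enum auth_action auth_decide(enum auth_policy policy,
				    enum auth_state state)
{
	if (state == AUTH_PASSED)
		return ACTION_NONE;

	switch (policy) {
	case POLICY_STRICT:
		return ACTION_BLOCK_BINDING;
	case POLICY_PROTECT_PRIVATE:
		return ACTION_DENY_PRIVATE_MEMORY;
	case POLICY_REPORT_ONLY:
	default:
		return ACTION_NONE;
	}
}
```

With something shaped like this, the strict "firewall it off" behavior
stays available as one policy value, while a system with no confidential
memory to protect could stay at report-only without changing how drivers
bind.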

In other words, "hardware enforced confidential memory" is the new
concept that makes Linux reconsider its stance towards devices. If there
is no confidential memory to protect, does the mere presence of CMA mean
that Linux upends its device-driver model?