Re: [RFC] Restrict the untrusted devices, to bind to only a set of "whitelisted" drivers

Rajat Jain <rajatxjain@xxxxxxxxx> · Wed, 3 Jun 2020 02:27:33 +0000

On Mon, Jun 1, 2020 at 10:06 PM Greg Kroah-Hartman
<gregkh@xxxxxxxxxxxxxxxxxxx> wrote:
>
> On Mon, Jun 01, 2020 at 06:25:42PM -0500, Bjorn Helgaas wrote:
> > [+cc Greg, linux-kernel for wider exposure]
>
> Thanks for the cc:, missed this...
>
> >
> > On Tue, May 26, 2020 at 09:30:08AM -0700, Rajat Jain wrote:
> > > On Thu, May 14, 2020 at 7:18 PM Rajat Jain <rajatja@xxxxxxxxxx> wrote:
> > > > On Thu, May 14, 2020 at 12:13 PM Raj, Ashok <ashok.raj@xxxxxxxxx> wrote:
> > > > > On Wed, May 13, 2020 at 02:26:18PM -0700, Rajat Jain wrote:
> > > > > > On Wed, May 13, 2020 at 8:19 AM Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:
> > > > > > > On Fri, May 01, 2020 at 04:07:10PM -0700, Rajat Jain wrote:
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > Currently, the PCI subsystem marks the PCI devices as "untrusted", if
> > > > > > > > the firmware asks it to:
> > > > > > > >
> > > > > > > > 617654aae50e ("PCI / ACPI: Identify untrusted PCI devices")
> > > > > > > > 9cb30a71acd4 ("PCI: OF: Support "external-facing" property")
> > > > > > > >
> > > > > > > > An "untrusted" device indicates a (likely external facing) device that
> > > > > > > > may be malicious, and can trigger DMA attacks on the system. It may
> > > > > > > > also try to exploit any vulnerabilities exposed by the driver, that
> > > > > > > > may allow it to read/write unintended addresses in the host (e.g. if
> > > > > > > > DMA buffers for the device, share memory pages with other driver data
> > > > > > > > structures or code etc).
> > > > > > > >
> > > > > > > > High Level proposal
> > > > > > > > ===============
> > > > > > > > Currently, the "untrusted" device property is used as a hint to enable
> > > > > > > > IOMMU restrictions (on Intel), disable ATS (on ARM) etc. We'd like to
> > > > > > > > go a step further, and allow the administrator to build a list of
> > > > > > > > whitelisted drivers for these "untrusted" devices. This whitelist of
> > > > > > > > drivers are the ones that he trusts enough to have little or no
> > > > > > > > vulnerabilities. (He may have built this list of whitelisted drivers
> > > > > > > > by a combination of code analysis of drivers, or by extensive testing
> > > > > > > > using PCIe fuzzing etc). We propose that the administrator be allowed
> > > > > > > > to specify this list of whitelisted drivers to the kernel, and the PCI
> > > > > > > > subsystem to impose this behavior:
> > > > > > > >
> > > > > > > > 1) The "untrusted" devices can bind to only "whitelisted drivers".
> > > > > > > > 2) The other devices (i.e. dev->untrusted=0) can bind to any driver.
> > > > > > > >
> > > > > > > > Of course this behavior is to be imposed only if such a whitelist is
> > > > > > > > provided by the administrator.
> > >
> > > I haven't heard much on this proposal after the initial inputs (to
> > > which I responded). Essentially, I agree that IO-MMU and ACS
> > > restrictions need to be put in plcase. But I think we need this
> > > additionally. Does this look acceptable to you? I wanted to start
> > > spinning out the patches, but wanted to see if there are any pending
> > > comments or concerns.
> >
> > I think it makes sense to code this up and see what it would look
> > like.  The bare minimum seems like a driver "bind-to-external-devices"
> > bit that's visible in sysfs plus something in the driver probe path
> > that checks it.  Neither is inherently PCI-specific, but maybe the
> > right place will become obvious when implementing it.

Agree. I'll try to code it up.

My proposal became PCI specific because

* The need for my proposal arrived out of the potentially malicious
*external* devices that can (NOW, with the advent of thunderbolt)
directly DMA into the CPU memory space. PCI (enabled by Thunderbolt 3
and USB4) is the only interface that fits the bill for laptops at
least (There are few more interfaces that allow DMA directly into host
memory such as LPC etc, but they are all internal so far).

* It hinges on the "untrusted" attribute (I see your concerns on this,
and more on this later) which is part of the "struct pci_dev". If we
can move that flag higher up to "struct device", then we can make this
proposal not PCI specific I think.

> >
> > I'm still not 100% sure the device "external/untrusted" bit is the
> > right thing to check.  If you don't trust a driver enough to expose it
> > to an external device, is it reasonable to trust it for internal
> > devices?  It seems like one could attack the driver of even an
> > internal device like a NIC by controlling the data fed to it.
> >
> > The existing use of "external/untrusted" for IOMMU protection is
> > different.  There we're acknowledging that the *device* itself is
> > unknown and we need to protect ourselves from malicious DMA.
> >
> > Here we're concerned about a *driver* that's completely under our
> > control.  If we don't trust the driver, we could (a) fix the problems
> > in the driver, (b) change the driver so it handles external/untrusted
> > devices differently, or (c) not ship the driver at all.

So here is our dilemma. In the laptop world:

1) Today (Pre-Thunderbolt 3 / Pre-USB4), there is a mix of trusted /
untrusted drivers that we (or any OEMs) are shipping on their laptops.
Yes, there is some (calculated) risk that everyone is taking - because
currently PCI bus does not extend outside the laptop *easily*. Yes I
understand systems may have external PCI slots, but that is rather
rare in the laptop world I think. The risks of the existing drivers
are limited to the devices that were built into the system, and since
the drivers, firmware updates, (and supply chain in some cases) are
controlled by us, such internal devices are conceivably more secure
than something random that the user may plug in. If the user opens the
chassis to replace a piece of hardware with something else, all bets
are off. Yes, we're still susceptible to the NIC driver attacks that
you talk about it along with other potential vulnerabilities, but this
is just convey our current baseline level of risk/security.

2) Now, we want to enable technology of tomorrow :-) (Thunderbolt 3 /
USB4) on laptops, which allows to very *easily* extend the internal
PCI bus to the outside devices. Note it doesn't require to open a
laptop, and anyone can plug a device onto a port. This throws the
system open to a lot of DMA attacks now, which it did not have to deal
with earlier. Essentially with the advent of technology to expand PCIe
outside of system chassis, the attacks have become much more easier,
we can no longer control or monitor device hardware or firmware, and
thus the level of risk has clearly increased. So what we are trying to
find here, is a good path to enable these new technologies, that keeps
keeps our baseline level of risk/security unchanged, and to also not
regress in functionality in supporting devices as much as possible.

3) Now you are certainly right that one path could be a binary
decision to ship or not ship a driver, or fix any issues with the
driver, or change the driver to differentiate between external and
internal ports. However, there are multiple factors that pose
practical problems (why regress internal devices that we tightly
control? Why regress systems that don't have such external ports? Need
to front load all effort in vetting the drivers before hand before the
first release. Work with each and individual driver etc).

4) The other path that this proposal aims to take is that by applying
a whitelist of drivers to external ports only, we're going to be able
to *slowly* build this whitelist. We can start with a NULL whitelist.
Which means that existing internal devices continue to work, and
external devices on PCI don't pose a risk. With ACS and IOMMU
restrictions in place, the security/risk baseline remains unchanged.
The existing devices are not regressed. As we vet and whitelist the
drivers, we start supporting more and more USB4 and Thunderbolt3
devices. Until then, those devices when plugged, can continue to work
in the "USB / legacy mode" (I forgot what it is called).

5) To give an example, assume we don't trust the PCI nvme driver and
don't want to whitelist it for external devices given there are so
many off the shelf devices with questionable firmware. But we
certainly need to enable it for internal NVME devices (that we may
have audited the firmware for, and control our supply chain) in order
to boot. With my proposal, until we whitelist it, the internal devices
continue to work, the external NVMEs switch to "USB storage device"
mode and thus go via a USB bridge so they cannot directly DMA into
host memory directly. Keep in mind that whitelisting a driver may be
handled by a separate security team, and may take long time depending
on the driver. The proposal allows us to release laptops with
Thunderbolt3/USB4 support and add peripheral support as we go.

6) Also small nit: consider the other scenario (I think this may not
be as important but still worth a thought). Assume the security team
finds a new vulnerability in a whitelisted driver, and want to take it
out of whitelist. Now, this really isn't possible if there was no
distinction between internal / external devices, and an internal
device uses that driver to boot.

> >
> > I'm also not sure about the administrative details of this.  Certain
> > versions of the driver may be trusted while others are untrusted.
> > That would all have to be managed in userspace, so it's not really our
> > problem, but it sounds like a hassle.  Putting the information in the
> > driver itself would reduce that.

I agree to the points you are making. *

- Who do you think should certify the driver? The driver maintainer?
Do we really think driver authors / maintainers will responsibly
update this field? Also, how often?

- Also, this being a policy decision, and thus best left outside the kernel?

- You are correct that version 1 may be trusted and version 2 may not
be. But I think that provides more reason for this to be maintained by
administrators. They (are supposed to) consciously uprev kernels (or
not), or cherry-pick security patches and can decide whether or not to
uprev a driver, or whether to pick a patch for the entire kernel on
their systems. In other words, they have more control over versions of
what they run on their system. OTOH, if this is maintained within the
upstream kernel by a driver maintainer, he may have no control over
other parts of the kernel (that may have an impact on his driver) and
also his driver needs to move on. I feel this has a lot of room for
mistakes.

>
> What about taking what we have today for USB devices/drivers where we
> have different levels of "authorized"?  There's no reason that could not
> just move to the driver core and be available for all devices/drivers as
> that model has proven to work really well over time.
>
> See the "authorized" sysfs file documentation in
> Documentation/ABI/testing/sysfs-bus-usb for some details.  Lots of
> userspace tools have been built on top of that api to control how and
> when specific USB devices are "allowed" to be used by the kernel and
> userspace.

Thanks for the pointer! I'm still looking at the details yet, but a
quick look (usb_dev_authorized()) seems to suggest that this API is
"device based". The multiple levels of "authorized" seem to take shape
from either how it is wired or from userspace choice. Once authorized,
USB device or interface is authorized to be used by *anyone* (can be
attached to any drivers). Do I understand it right that it does not
differentiate between drivers?

Thanks & Best Regards,

Rajat

>
> thanks,
>
> greg k-h