Re: [PATCH 03/11] PCI: pci_stub: Suppress kernel DMA ownership auto-claiming

Jason Gunthorpe <jgg@xxxxxxxxxx> · Mon, 15 Nov 2021 17:19:52 -0400

On Mon, Nov 15, 2021 at 08:58:19PM +0000, Robin Murphy wrote:
> > The above scenarios are already blocked by the kernel with
> > LOCKDOWN_DEV_MEM - yes there are historical ways to violate kernel
> > integrity, and these days they almost all have mitigation. I would
> > consider any kernel integrity violation to be a bug today if
> > LOCKDOWN_INTEGRITY_MAX is enabled.
> > 
> > I don't know why you bring this up?
> 
> Because as soon as anyone brings up "security" I'm not going to blindly
> accept the implicit assumption that VFIO is the only possible way to get one
> device to mess with another. That was just a silly example in the most basic
> terms, and obviously I don't expect well-worn generic sysfs interfaces to be
> a genuine threat, but how confident are you that no other subsystem- or
> driver-level interfaces past present and future can ever be tricked into p2p
> DMA?

Given the definition of LOCKDOWN_INTEGRITY_MAX I will consider any
past/present/future p2p attacks as definitive kernel bugs.

Generally, allowing a device to do arbitary DMA to a userspace
controlled address is a pretty serious bug, and directly attacking the
kernel memory is a much more interesting and serious threat vector.

> > What is the preference then? This is the only working API today,
> > right?
> 
> I believe the intent was that everyone should move to
> iommu_group_get()/iommu_attach_group() - precisely *because*
> iommu_attach_device() can't work sensibly for multi-device groups.

And iommu_attach_group() can't work sensibly for anything except VFIO
today, so hum :)

> > > Indeed I wasn't imagining it changing any ownership, just preventing a group
> > > from being attached to a non-default domain if it contains devices bound to
> > > different incompatible drivers.
> > 
> > So this could solve just the domain/DMA API problem, but it leaves the
> > MMIO peer-to-peer issue unsolved, and it gives no tools to solve it in
> > a layered way.
> > 
> > This seems like half an idea, do you have a solution for the rest?
> 
> Tell me how the p2p DMA issue can manifest if device A is prohibited from
> attaching to VFIO's unmanaged domain while device B still has a driver
> bound, and thus would fail to be assigned to userspace in the first place.
> And conversely if non-VFIO drivers are still prevented from binding to
> device B while device A remains attached to the VFIO domain.

You've assumed that a group is continuously attached to the same
domain during the entire period that userspace has MMIO.

Any domain detatch creates a race where a kernel driver can jump in
and bind, while user space continues to have MMIO control over a
device. That violates the security invariant.

Many new flows, like PASID support, are requiring dynamically changing
the domain bound to a group.

If you want to go in this direction then we also need to have some
kind of compare and swap operation for the domain currently bound to a
group.

>From a security perspective I disliek this idea a lot. Instead of
having nice clear barriers indicating the security domain we have a
very subtle 'so long as a domain is attached' idea, which is going to
get broken.

> > The concept of DMA USER is important here, and it is more than just
> > which domain is attached.
> 
> Tell me how a device would be assigned to userspace while its group is still
> attached to a kernel-managed default domain.
> 
> As soon as anyone calls iommu_attach_group() - or indeed
> iommu_attach_device() if more devices may end up hotplugged into the same
> group later - *that's* when the door opens for potential subversion of any
> kind, without ever having to leave kernel space.

The approach in this series can solve both, attach_device could switch
the device to user mode and it will block future hot plugged kernel
drivers.

> > VFIO also has logic related to the file
> 
> Yes, because unsurprisingly VFIO code is tailored for the specific case of
> VFIO usage rather than anything more general.

VFIO represents this class of users exposing the IOMMU to userspace,
I say it is general of that use class.

> > It isn't muddying the water, it is providing common security code that
> > is easy to undertand.
> > 
> > *Which* userspace FD/process owns the iommu_group is important
> > security information because we can't have process A do DMA attacks on
> > some other process B.
> 
> Tell me how a single group could be attached to two domains representing two
> different process address spaces at once.

Again you are focused on domains and ignoring MMIO.

Requiring the users of the API to continuously assert a non-default
domain is a non-trivial ask.

> In case this concept wasn't as clear as I thought, which seems to be so:
>
>                  | dev->iommu_group->domain | dev->driver
> DMA_OWNER_NONE   |          default         |   unbound
> DMA_OWNER_KERNEL |          default         |    bound
> DMA_OWNER_USER   |        non-default       |    bound

Unfortunately this can't use dev->driver. Reading dev->driver of every
device in the group requires holding the device_lock. really_probe()
already holds the device lock so this becomes a user triggerable ABBA
deadlock when scaled up to N devices.

This is why this series uses only the group mutex and tracks if
drivers are bound inside the group. I view that as unavoidable.

Jason