On Mon, Nov 15, 2021 at 08:58:19PM +0000, Robin Murphy wrote: > > The above scenarios are already blocked by the kernel with > > LOCKDOWN_DEV_MEM - yes there are historical ways to violate kernel > > integrity, and these days they almost all have mitigation. I would > > consider any kernel integrity violation to be a bug today if > > LOCKDOWN_INTEGRITY_MAX is enabled. > > > > I don't know why you bring this up? > > Because as soon as anyone brings up "security" I'm not going to blindly > accept the implicit assumption that VFIO is the only possible way to get one > device to mess with another. That was just a silly example in the most basic > terms, and obviously I don't expect well-worn generic sysfs interfaces to be > a genuine threat, but how confident are you that no other subsystem- or > driver-level interfaces past present and future can ever be tricked into p2p > DMA? Given the definition of LOCKDOWN_INTEGRITY_MAX I will consider any past/present/future p2p attacks as definitive kernel bugs. Generally, allowing a device to do arbitary DMA to a userspace controlled address is a pretty serious bug, and directly attacking the kernel memory is a much more interesting and serious threat vector. > > What is the preference then? This is the only working API today, > > right? > > I believe the intent was that everyone should move to > iommu_group_get()/iommu_attach_group() - precisely *because* > iommu_attach_device() can't work sensibly for multi-device groups. And iommu_attach_group() can't work sensibly for anything except VFIO today, so hum :) > > > Indeed I wasn't imagining it changing any ownership, just preventing a group > > > from being attached to a non-default domain if it contains devices bound to > > > different incompatible drivers. > > > > So this could solve just the domain/DMA API problem, but it leaves the > > MMIO peer-to-peer issue unsolved, and it gives no tools to solve it in > > a layered way. > > > > This seems like half an idea, do you have a solution for the rest? > > Tell me how the p2p DMA issue can manifest if device A is prohibited from > attaching to VFIO's unmanaged domain while device B still has a driver > bound, and thus would fail to be assigned to userspace in the first place. > And conversely if non-VFIO drivers are still prevented from binding to > device B while device A remains attached to the VFIO domain. You've assumed that a group is continuously attached to the same domain during the entire period that userspace has MMIO. Any domain detatch creates a race where a kernel driver can jump in and bind, while user space continues to have MMIO control over a device. That violates the security invariant. Many new flows, like PASID support, are requiring dynamically changing the domain bound to a group. If you want to go in this direction then we also need to have some kind of compare and swap operation for the domain currently bound to a group. >From a security perspective I disliek this idea a lot. Instead of having nice clear barriers indicating the security domain we have a very subtle 'so long as a domain is attached' idea, which is going to get broken. > > The concept of DMA USER is important here, and it is more than just > > which domain is attached. > > Tell me how a device would be assigned to userspace while its group is still > attached to a kernel-managed default domain. > > As soon as anyone calls iommu_attach_group() - or indeed > iommu_attach_device() if more devices may end up hotplugged into the same > group later - *that's* when the door opens for potential subversion of any > kind, without ever having to leave kernel space. The approach in this series can solve both, attach_device could switch the device to user mode and it will block future hot plugged kernel drivers. > > VFIO also has logic related to the file > > Yes, because unsurprisingly VFIO code is tailored for the specific case of > VFIO usage rather than anything more general. VFIO represents this class of users exposing the IOMMU to userspace, I say it is general of that use class. > > It isn't muddying the water, it is providing common security code that > > is easy to undertand. > > > > *Which* userspace FD/process owns the iommu_group is important > > security information because we can't have process A do DMA attacks on > > some other process B. > > Tell me how a single group could be attached to two domains representing two > different process address spaces at once. Again you are focused on domains and ignoring MMIO. Requiring the users of the API to continuously assert a non-default domain is a non-trivial ask. > In case this concept wasn't as clear as I thought, which seems to be so: > > | dev->iommu_group->domain | dev->driver > DMA_OWNER_NONE | default | unbound > DMA_OWNER_KERNEL | default | bound > DMA_OWNER_USER | non-default | bound Unfortunately this can't use dev->driver. Reading dev->driver of every device in the group requires holding the device_lock. really_probe() already holds the device lock so this becomes a user triggerable ABBA deadlock when scaled up to N devices. This is why this series uses only the group mutex and tracks if drivers are bound inside the group. I view that as unavoidable. Jason