Hi Alex,
On 2/24/22 5:53 AM, Alex Williamson wrote:
On Fri, 18 Feb 2022 08:55:20 +0800
Lu Baolu <baolu.lu@xxxxxxxxxxxxxxx> wrote:
The iommu core and driver core have been enhanced to avoid unsafe driver
binding to a live group after iommu_group_set_dma_owner(PRIVATE_USER)
has been called. There's no need to register an iommu group notifier. This
removes the iommu group notifier, which contains BUG_ON() and WARN().
Commit 5f096b14d421b ("vfio: Whitelist PCI bridges") allowed all
pcieport drivers to be bound to devices while the group is assigned to
user space. This is not always safe. For example, the shpchp_core driver
relies on PCI MMIO access for the controller functionality. With its
downstream devices assigned to userspace, the MMIO might be changed
through user-initiated P2P accesses without any notification. This might
break kernel driver integrity and lead to unpredictable
consequences. As a result, we currently allow only the portdrv driver.
For any bridge driver, in order to avoid claiming default kernel DMA
ownership, we should consider:
1) Does the bridge driver use DMA? Calling pci_set_master() or
a dma_map_* API is a sure indication that the driver is doing DMA.
2) If the bridge driver uses MMIO, is it tolerant to hostile
userspace also touching the same MMIO registers via P2P DMA
attacks?
Conservatively, if the driver maps an MMIO region at all, we can say that
it fails the test.
IIUC, there's a chance we're going to break user configurations if
they're assigning devices from a group containing a bridge that uses a
driver other than pcieport. The recommendation to such an affected user
would be that the previously allowed host bridge driver was unsafe for
this use case and to continue to enable assignment of devices within
that group, the driver should be unbound from the bridge device or
replaced with the pci-stub driver. Is that right?
Yes. You are right.
Another possible solution (for the long term) is to re-audit the bridge
driver code and set the .driver_managed_dma field, provided that the
driver doesn't violate the potential hazards above.
Unfortunately, I also think a bisect of such a breakage wouldn't land
here; I think it was actually broken in "vfio: Set DMA ownership for
VFIO", since that's where vfio starts to make use of
iommu_group_claim_dma_owner(), which should fail due to
pci_dma_configure() calling iommu_device_use_default_domain() for
any driver not identifying itself as driver_managed_dma.
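To make that failure path concrete, here is a simplified user-space model, not kernel code. `struct model_group`, `model_driver_bind()` and `model_claim_dma_owner()` are hypothetical names; the counting logic is an assumption about how the bind-time default-domain use and the vfio ownership claim interact, with locking, domain attachment and the exact error codes left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Simplified, hypothetical model of an IOMMU group's DMA ownership.
 * The function names only echo the kernel calls discussed above; locking,
 * domain attachment and the exact error codes are left out.
 */
struct model_group {
	unsigned int owner_cnt;	/* kernel-DMA users, or user-space claims */
	const void *owner;	/* set once a user-space owner claims the group */
};

/* Bind time, modelling pci_dma_configure() -> iommu_device_use_default_domain(). */
int model_driver_bind(struct model_group *g, bool driver_managed_dma)
{
	if (driver_managed_dma)
		return 0;		/* driver opted out of kernel DMA ownership */
	if (g->owner)
		return -EBUSY;		/* group already claimed by user space */
	g->owner_cnt++;			/* group now in use by a kernel driver */
	return 0;
}

/* Modelling iommu_group_claim_dma_owner() as called by vfio. */
int model_claim_dma_owner(struct model_group *g, const void *owner)
{
	assert(owner != NULL);
	if (g->owner_cnt && g->owner != owner)
		return -EBUSY;		/* a kernel driver (or another owner) holds it */
	g->owner = owner;
	g->owner_cnt++;
	return 0;
}
```

In this model, binding an unaudited bridge driver first makes the later vfio claim fail, which is the breakage described above, while a driver that sets driver_managed_dma leaves the claim free to succeed.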
Yes. Great point. Thank you!
If that's correct, can we leave a breadcrumb in the correct commit log
indicating why this potential breakage is intentional and how the
bridge driver might be reconfigured to continue to allow assignment from
within the group more safely? Thanks,
Sure. I will add the following to the commit message of "vfio: Set DMA
ownership for VFIO":
"
This change prevents some unsafe bridge drivers from binding to non-ACS
bridges while devices under them are assigned to user space. This is an
intentional enhancement and may break some existing configurations. The
recommendation to an affected user would be that the previously allowed
bridge driver was unsafe for this use case; to continue to enable
assignment of devices within that group, the driver should be unbound
from the bridge device or replaced with the pci-stub driver.
We consider a bridge driver unsafe if it satisfies any of the following
conditions:
1) The bridge driver uses DMA. Calling pci_set_master() or calling any
kernel DMA API (dma_map_*(), etc.) indicates that the driver is doing
DMA.
2) The bridge driver uses MMIO and is not tolerant to hostile
userspace also touching the same MMIO registers via P2P DMA
attacks.
If a bridge driver turns out to be safe, it can be used as before by
setting the driver's .driver_managed_dma field, just as we have done in
the pcieport driver.
"
Best regards,
baolu