From: Robin Murphy <robin.murphy@xxxxxxx> Sent: Friday, March 18, 2022 3:58 AM > > On 2022-03-18 05:12, Michael Kelley (LINUX) wrote: > > From: Robin Murphy <robin.murphy@xxxxxxx> Sent: Thursday, March 17, 2022 > 10:15 AM > >> > >> On 2022-03-17 16:25, Michael Kelley via iommu wrote: > >>> PCI pass-thru devices in a Hyper-V VM are represented as a VMBus > >>> device and as a PCI device. The coherence of the VMbus device is > >>> set based on the VMbus node in ACPI, but the PCI device has no > >>> ACPI node and defaults to not hardware coherent. This results > >>> in extra software coherence management overhead on ARM64 when > >>> devices are hardware coherent. > >>> > >>> Fix this by propagating the coherence of the VMbus device to the > >>> PCI device. There's no effect on x86/x64 where devices are > >>> always hardware coherent. > >>> > >>> Signed-off-by: Michael Kelley <mikelley@xxxxxxxxxxxxx> > >>> --- > >>> drivers/pci/controller/pci-hyperv.c | 17 +++++++++++++---- > >>> 1 file changed, 13 insertions(+), 4 deletions(-) > >>> > >>> diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c > >>> index ae0bc2f..14276f5 100644 > >>> --- a/drivers/pci/controller/pci-hyperv.c > >>> +++ b/drivers/pci/controller/pci-hyperv.c > >>> @@ -49,6 +49,7 @@ > >>> #include <linux/refcount.h> > >>> #include <linux/irqdomain.h> > >>> #include <linux/acpi.h> > >>> +#include <linux/dma-map-ops.h> > >>> #include <asm/mshyperv.h> > >>> > >>> /* > >>> @@ -2142,9 +2143,9 @@ static void hv_pci_remove_slots(struct hv_pcibus_device > >> *hbus) > >>> } > >>> > >>> /* > >>> - * Set NUMA node for the devices on the bus > >>> + * Set NUMA node and DMA coherence for the devices on the bus > >>> */ > >>> -static void hv_pci_assign_numa_node(struct hv_pcibus_device *hbus) > >>> +static void hv_pci_assign_properties(struct hv_pcibus_device *hbus) > >>> { > >>> struct pci_dev *dev; > >>> struct pci_bus *bus = hbus->bridge->bus; > >>> @@ -2167,6 +2168,14 @@ static void hv_pci_assign_numa_node(struct > >> hv_pcibus_device *hbus) > >>> numa_map_to_online_node( > >>> hv_dev->desc.virtual_numa_node)); > >>> > >>> + /* > >>> + * On ARM64, propagate the DMA coherence from the VMbus device > >>> + * to the corresponding PCI device. On x86/x64, these calls > >>> + * have no effect because DMA is always hardware coherent. > >>> + */ > >>> + dev_set_dma_coherent(&dev->dev, > >>> + dev_is_dma_coherent(&hbus->hdev->device)); > >> > >> Eww... if you really have to do this, I'd prefer to see a proper > >> hv_dma_configure() helper implemented and wired up to > >> pci_dma_configure(). Although since it's a generic property I guess at > >> worst pci_dma_configure could perhaps propagate coherency from the host > >> bridge to its children by itself in the absence of any other firmware > >> info. And it's built-in so could use arch_setup_dma_ops() like everyone > >> else. > >> > > > > I'm not seeing an existing mechanism to provide a "helper" or override > > of pci_dma_configure(). Could you elaborate? Or is this something > > that needs to be created? > > I mean something like the diff below (other #includes omitted for > clarity). Essentially if VMBus has its own way of describing parts of > the system, then for those parts it's nice if it could fit into the same > abstractions we use for firmware-based system description. OK, got it. Adding the VMbus special case into pci_dma_configure() would work. But after investigating further, let me throw out two other possible solutions as well. 1) It's easy to make the top-level VMbus node in the DSDT be the ACPI companion for the pci_host bridge. The function pcibios_root_bridge_prepare() will do this if the pci-hyperv.c driver sets sysdata->parent to that top-level VMbus node. Then pci_dma_configure()works as is. Also, commit 7d40c0f70 that special cases pcibios_root_bridge_prepare() for Hyper-V becomes unnecessary. I've coded this approach and it seems to work, but I don't know all the implications of making that VMbus node be the ACPI companion, potentially for multiple pci_host bridges. 2) Since there's no vIOMMU and we don't know how it will be described if there should be one in the future, it's a bit premature to try to set things up "correctly" now to handle it. A simple approach is to set dma_default_coherent to true if the top-level VMbus node in the DSDT indicates coherent. No other changes are needed. If there's a vIOMMU at some later time, then we add the use of arch_setup_dma_ops() for each device. Any thoughts on these two approaches vs. adding the VMbus special case into pci_dma_configure()? My preference would be to avoid adding that special case if one of the other two approaches is reasonable. Michael > > Cheers, > Robin. > > ----->8----- > diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c > index 588588cfda48..7d92ccad1569 100644 > --- a/drivers/pci/pci-driver.c > +++ b/drivers/pci/pci-driver.c > @@ -20,6 +20,7 @@ > #include <linux/of_device.h> > #include <linux/acpi.h> > #include <linux/dma-map-ops.h> > +#include <linux/hyperv.h> > #include "pci.h" > #include "pcie/portdrv.h" > > @@ -1602,6 +1603,8 @@ static int pci_dma_configure(struct device *dev) > struct acpi_device *adev = to_acpi_device_node(bridge->fwnode); > > ret = acpi_dma_configure(dev, acpi_get_dma_attr(adev)); > + } else if (is_vmbus_dev(bridge)) { > + ret = hv_dma_configure(dev, device_get_dma_attr(bridge)); > } > > pci_put_host_bridge_device(bridge); > diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h > index f565a8938836..d1d4dd3d5a3a 100644 > --- a/include/linux/hyperv.h > +++ b/include/linux/hyperv.h > @@ -1764,4 +1764,19 @@ static inline unsigned long virt_to_hvpfn(void *addr) > #define HVPFN_DOWN(x) ((x) >> HV_HYP_PAGE_SHIFT) > #define page_to_hvpfn(page) (page_to_pfn(page) * > NR_HV_HYP_PAGES_IN_PAGE) > > +static inline bool is_vmbus_dev(struct device *dev) > +{ > + /* > + * dev->bus == &hv_bus would break when the caller is built-in > + * and CONFIG_HYPERV=m, so look for it by name instead. > + */ > + return !strcmp(dev->bus->name, "vmbus"); > +} > + > +static inline int hv_dma_configure(struct device *dev, enum > dev_dma_attr attr) > +{ > + arch_setup_dma_ops(dev, 0, 0, NULL, attr == DEV_DMA_COHERENT); > + return 0; > +} > + > #endif /* _HYPERV_H */