On Thu, 29 Apr 2021 11:29:05 -0500 Shanker Donthineni <sdonthineni@xxxxxxxxxx> wrote: > For pass-through device assignment, the ARM64 KVM hypervisor retrieves > the memory region properties physical address, size, and whether a > region backed with struct page or not from VMA. The prefetchable > attribute of a BAR region isn't visible to KVM to make an optimal > decision for stage2 attributes. > > This patch updates vma->vm_page_prot and maps with write-combine > attribute if the associated BAR is prefetchable. For ARM64 > pgprot_writecombine() is mapped to memory-type MT_NORMAL_NC which > has no side effects on reads and multiple writes can be combined. > > Signed-off-by: Shanker Donthineni <sdonthineni@xxxxxxxxxx> > --- > drivers/vfio/pci/vfio_pci.c | 6 +++++- > 1 file changed, 5 insertions(+), 1 deletion(-) > > diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c > index 5023e23db3bc..1b734fe1dd51 100644 > --- a/drivers/vfio/pci/vfio_pci.c > +++ b/drivers/vfio/pci/vfio_pci.c > @@ -1703,7 +1703,11 @@ static int vfio_pci_mmap(void *device_data, struct vm_area_struct *vma) > } > > vma->vm_private_data = vdev; > - vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot); > + if (IS_ENABLED(CONFIG_ARM64) && > + (pci_resource_flags(pdev, index) & IORESOURCE_PREFETCH)) > + vma->vm_page_prot = pgprot_writecombine(vma->vm_page_prot); > + else > + vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot); If this were a valid thing to do, it should be done for all architectures, not just ARM64. However, a prefetchable range only necessarily allows merged writes, which seems like a subset of the semantics implied by a WC attribute, therefore this doesn't seem universally valid. I'm also a bit confused by your problem statement that indicates that without WC you're seeing unaligned accesses, does this suggest that your driver is actually relying on WC semantics to perform merging to achieve alignment? That seems rather like a driver bug, I'd expect UC vs WC is largely a difference in performance, not a means to enforce proper driver access patterns. Per the PCI spec, the bridge itself can merge writes to prefetchable areas, presumably regardless of this processor attribute, perhaps that's the feature your driver is relying on that might be missing here. Thanks, Alex