Re: [RFC PATCH] vfio/pci: map prefetchble bars as writecombine

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, 19 Jul 2018 21:49:48 +0530
Srinath Mannam <srinath.mannam@xxxxxxxxxxxx> wrote:

> HI Alex,
> 
> On Thu, Jul 19, 2018 at 8:42 PM, Alex Williamson
> <alex.williamson@xxxxxxxxxx> wrote:
> > On Thu, 19 Jul 2018 20:17:11 +0530
> > Srinath Mannam <srinath.mannam@xxxxxxxxxxxx> wrote:
> >  
> >> HI Alex,
> >>
> >> On Thu, Jul 19, 2018 at 2:55 AM, Alex Williamson
> >> <alex.williamson@xxxxxxxxxx> wrote:  
> >> > On Thu, 19 Jul 2018 00:05:18 +0530
> >> > Srinath Mannam <srinath.mannam@xxxxxxxxxxxx> wrote:
> >> >  
> >> >> Hi Alex,
> >> >>
> >> >> On Tue, Jul 17, 2018 at 8:52 PM, Alex Williamson
> >> >> <alex.williamson@xxxxxxxxxx> wrote:  
> >> >> > On Fri, 13 Jul 2018 10:26:17 +0530
> >> >> > Srinath Mannam <srinath.mannam@xxxxxxxxxxxx> wrote:
> >> >> >  
> >> >> >> By default all BARs map with VMA access permissions
> >> >> >> as pgprot_noncached.
> >> >> >>
> >> >> >> In ARM64 pgprot_noncached is MT_DEVICE_nGnRnE which
> >> >> >> is strongly ordered and allows aligned access.
> >> >> >> This type of mapping works for NON-PREFETCHABLE bars
> >> >> >> containing EP controller registers.
> >> >> >> But it restricts PREFETCHABLE bars from doing
> >> >> >> unaligned access.
> >> >> >>
> >> >> >> In CMB NVMe drives PREFETCHABLE bars are required to
> >> >> >> map as MT_NORMAL_NC to do unaligned access.
> >> >> >>
> >> >> >> Signed-off-by: Srinath Mannam <srinath.mannam@xxxxxxxxxxxx>
> >> >> >> Reviewed-by: Ray Jui <ray.jui@xxxxxxxxxxxx>
> >> >> >> Reviewed-by: Vikram Prakash <vikram.prakash@xxxxxxxxxxxx>
> >> >> >> ---  
> >> >> >
> >> >> > This has been discussed before:
> >> >> >
> >> >> > https://www.spinics.net/lists/kvm/msg156548.html  
> >> >> Thank you for inputs.. I have gone through the long list of mail chain
> >> >> discussion.  
> >> >> >
> >> >> > CC'ing the usual suspects from the previous thread.  I'm not convinced
> >> >> > that the patch here has considered anything other than the ARM64
> >> >> > implications and it's not clear that it considers compatibility with
> >> >> > existing users or devices at all.  Can we guarantee for all devices and
> >> >> > use cases that WC is semantically equivalent and preferable to UC?  If
> >> >> > not then we need to device an extension to the interface that allows
> >> >> > the user to specify WC.  Thanks,
> >> >> >  
> >> >> To implement with user specified WC flags, many changes need to be done.
> >> >> Suppose In DPDK, prefetcable BARs map using WC flag, then also same
> >> >> question comes
> >> >> that WC may be different for different CPUs.
> >> >> As per functionality, both WC and PREFETCHABLE are same, like merging writes and
> >> >> typically WC is uncached.
> >> >> So, based on prefetchable BARs behavior and usage we need to map bar memory.
> >> >> Is it right to map prefetchable BARs as strongly ordered, aligned
> >> >> access and uncached?  
> >> >
> >> > Is it possible to answer that question generically?  Whether to map a
> >> > BAR as UC or WC is generally a question for the driver.  Does the
> >> > device handle unaligned accesses?  Does the device need strong memory
> >> > ordering?  If this is a driver level question then the driver that
> >> > needs to make that decision is the userspace driver.  VFIO is just a
> >> > pass-through here and since we don't offer the user a choice of
> >> > mappings, we take the safer and more conservative mapping, ie. UC.
> >> >  
> >> Yes, you are right, driver should make the decision based on its requirement.
> >> In my case, user space driver is part of SPDK, so SPDK should request DPDK
> >> and DPDK should request VFIO to map BAR for its choice of mapping.
> >> So to implement this we need code changes in VFIO, DPDK and SPDK.
> >>  
> >> > You're suggesting that there are many changes to be done if we modify
> >> > the vfio interface to expose WC under the user's control rather than
> >> > simply transparently impose WC for all mappings, but is that really the
> >> > case?  Most devices on most platforms seem to work fine now.  Perhaps WC
> >> > is a performance optimization, but this is the first instance I've seen
> >> > of it as a functional issue.  Does that suggest that the imposed
> >> > alignment on your platform is perhaps unique and the relaxed alignment
> >> > should be implemented at the architecture specific memory flags for UC
> >> > mappings?  For instance, does x86 require this change for the same
> >> > device?  The chance for regressions of other devices on other platforms
> >> > seems rather high as proposed. Thanks,  
> >> This issue is not specific to platform or device. this is the requirement of
> >> CMB enabled NVMe cards.
> >> NVMe kernel driver already has support to map CMB bar as WC using
> >> ioremap_wc function.
> >> File: drivers/nvme/host/pci.c
> >> Function: nvme_map_cmb
> >> code: dev->cmb = ioremap_wc(pci_resource_start(pdev, bar) + offset, size);
> >> It means ioremap_wc is working with all platforms and WC map of
> >> perfetchable BARs does not harm.
> >> Same is required in SPDK NVMe driver also, without WC map it may work
> >> in x86 platform, but it does not work in ARM platforms.  
> >
> > Doesn't this contradict your assertion that it's not specific to
> > platform or device?  The device requires support for unaligned
> > accesses.  The platform chooses to restrict unaligned accesses for
> > non-WC mappings while other platforms do not.  The native driver can
> > still clearly have performance considerations for choosing to use WC
> > mappings, but it's still not clear to me that the functionality issue
> > isn't self inflicted by the platform definition of UC vs WC.  Thanks,
> >  
> Device allows both aligned and unaligned access.. so software have
> flexibility can do unaligned access.
> As per ARM64 platform, with UC map, memory access should be un-cached,
> aligned access and strongly order mapping.
> with WC map, memory access can be aligned/unaligned and un-cached. I
> thought this is the property of platform not issue.
> To allow software to do unaligned access of device memory, we need to
> use WC map of ARM64 platform case.
> In ARM platforms UC mapping is used to map controller registers which
> are 4 byte aligned exposed by non-prefetchable bars.
> Also prefetchable BARs allows write merging so I thought using WC map
> fulfills both write merging (add performance) and unaligned
> access.
> Can I modify the code as below to enable only for ARM platforms.
>         vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
> +#if defined(CONFIG_ARM) || defined(CONFIG_ARM64)
> +       if (pci_resource_flags(pdev, index) & IORESOURCE_PREFETCH)
> +               vma->vm_page_prot = pgprot_writecombine(vma->vm_page_prot);
> +#endif
>         vma->vm_pgoff = (pci_resource_start(pdev, index) >> PAGE_SHIFT) + pgoff;

While the risk of regression is smaller by restricting this to ARM, I
don't think it's the right solution.  What happens when a device
requires strict ordering?  ARM now behaves differently than any other
architecture, that's not acceptable.  Thanks,

Alex



[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux