Re: [RFC PATCH kernel] vfio-pci: Allow write combining

On Wed, Nov 29, 2017 at 11:47:46AM -0700, Alex Williamson wrote:
> On Fri, 24 Nov 2017 15:58:09 +1100
> Alexey Kardashevskiy <aik@xxxxxxxxx> wrote:
> 
> > On 15/11/17 03:28, Alex Williamson wrote:
> > > On Tue, 14 Nov 2017 13:29:02 +1100
> > > Benjamin Herrenschmidt <benh@xxxxxxxxxxxxxxxxxxx> wrote:
> > >   
> > >> On Tue, 2017-11-14 at 13:23 +1100, David Gibson wrote:  
> > >>>>>> 1. Allow msix mapping to the userspace (to address non-64k-aligned msix bar)    
> > >>>
> > >>> We have a new plan on this - I'll discuss it over IRC.
> > >>>     
> > >>>>>> 2. Allow write combining in vfio for the userspace (kvm guest is kinda
> > >>>>>> special and may simply ignore mapping flags in some configs but PPC radix
> > >>>>>> guests still rely on this)    
> > >>>
> > >>> AIUI this isn't for radix, but for DPDK things that we need this.  Ben
> > >>> talked about it a bit, but I don't know what the outcome was.    
> > >>
> > >> So this is not a powerpc specific issue. Other archs similarly want to
> > >> be able to do write combine mappings.
> > >>
> > >> The way sysfs does it is that for prefetchable BARs, it exposes both
> > >> a resourceN and a resourceN_wc file.
> > >>
> > >> For VFIO it's a bit more tricky, maybe we need to game the offset using
> > >> some of it as flags but that's very fishy, or maybe we do some kind of
> > >> ioctl that selects the attributes used for that fd instance for
> > >> subsequent mappings...
> > >>
> > >> I'll let Alex choose what he feels most appropriate here.  
> > > 
> > > My order of preference would be something like:
> > > 
> > >  - mmap flags provide some way for the user to specify a wc mapping
> > >    within existing regions  
> > 
> > There are plenty of flags but none really matches, checked with Paul.
> 
> Is MAP_NONBLOCK off the table?  Why?
>  
> > >  - some other mechanism of using the existing regions  
> > 
> > I can only think of madvise but it does not have appropriate flags either.
> 
> Is it worth the process to define something that is appropriate?  Would
> either of the above be the obvious architectural/implementation choice
> if we could define a flag for it?
> 
> > >  - additional regions provided for use exclusively with wc attributes
> > >    (generalizing PCI BAR wc regions within device specific regions)  
> > 
> > 
> > Adding VFIO_PCI_BAR0_WC_REGION_INDEX for VFIO_PCI_BAR0_REGION_INDEX (and so
> > on for other BARs) seems a viable option.
> > 
> > However the comment for VFIO_PCI_xxx_REGION_INDEX says:
> > 
> >   VFIO_PCI_NUM_REGIONS = 9 /* Fixed user ABI, region indexes >=9 use */
> >                            /* device specific cap to define content. */
> > 
> > 
> > which limits me in where I can add new indexes, I cannot just add new _WC
> > indexes to that enum, can I? I cannot see any existing regions above 9 yet
> > though.
> 
> The comment explains how to do this, you'd add a device specific region
> with the type identifying it as a PCI MMIO WC region and the sub-type
> probably defining the BAR index.
> 
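
For illustration only, here is a rough sketch of how userspace might
recognise such a region. It walks the capability chain returned by
VFIO_DEVICE_GET_REGION_INFO, looks for the existing
VFIO_REGION_INFO_CAP_TYPE capability, and reads the BAR index from the
sub-type. The type value EXAMPLE_REGION_TYPE_PCI_MMIO_WC and the helper
names are made up for this example; no such WC region type is defined in
the VFIO ABI as far as this thread is concerned.

```c
#include <stddef.h>
#include <stdint.h>
#include <linux/vfio.h>

/* Hypothetical type value, invented for this sketch; not part of the VFIO ABI. */
#define EXAMPLE_REGION_TYPE_PCI_MMIO_WC 0x1000u

/*
 * Walk the capability chain of a struct vfio_region_info previously filled
 * in by VFIO_DEVICE_GET_REGION_INFO. Returns the region type capability,
 * or NULL if the region has none or the chain looks malformed.
 */
static const struct vfio_region_info_cap_type *
find_region_type(const struct vfio_region_info *info)
{
    const uint8_t *base = (const uint8_t *)info;
    uint32_t off;

    if (!(info->flags & VFIO_REGION_INFO_FLAG_CAPS))
        return NULL;

    for (off = info->cap_offset; off != 0; ) {
        const struct vfio_info_cap_header *hdr;

        /* Capabilities live after the fixed struct and inside argsz. */
        if (off < sizeof(*info) || off + sizeof(*hdr) > info->argsz)
            return NULL;
        hdr = (const struct vfio_info_cap_header *)(base + off);

        if (hdr->id == VFIO_REGION_INFO_CAP_TYPE) {
            if (off + sizeof(struct vfio_region_info_cap_type) > info->argsz)
                return NULL;
            return (const struct vfio_region_info_cap_type *)hdr;
        }

        /* Only follow forward links so a bad chain cannot loop forever. */
        if (hdr->next != 0 && hdr->next <= off)
            return NULL;
        off = hdr->next;
    }
    return NULL;
}

/* Returns the BAR index taken from the sub-type, or -1 if not a WC region. */
static int wc_region_bar(const struct vfio_region_info *info)
{
    const struct vfio_region_info_cap_type *t = find_region_type(info);

    if (!t || t->type != EXAMPLE_REGION_TYPE_PCI_MMIO_WC)
        return -1;
    return (int)t->subtype;
}
```

A caller would issue VFIO_DEVICE_GET_REGION_INFO for each region index at
or above VFIO_PCI_NUM_REGIONS, with argsz large enough to hold the
capabilities, and then mmap() the region whose sub-type matches the BAR
it wants.
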
> > >  - additional file descriptors provided for wc access  
> > 
> > It could be a capability + ioctl(VFIO_DEVICE_GET_WC_RESOURCE) which would
> > take a BAR index, check if the BAR is prefetchable and if so - return an fd
> > which the userspace then could mmap(). This won't break that ABI with 9
> > regions but it is the least favourable in the list...
> 
> Do the kernel mechanics require it to be a separate file descriptor?  A
> separate fd is my last choice as well, but the interfaces you were
> attempting to use previously seemed to have fd granularity.
> 
> > > This isn't at the top of my priority list to figure out the solution,
> > > so whoever implements it will need to provide justification as they
> > > move down the list from more to less preferred solutions.  Thanks,  
> > 
> > I am trying... I was really counting on you guys having this discussed in
> > Prague :(
> 
> Should have been there to push your agenda...  Thanks,

We discussed it briefly, BenH seemed to think there wasn't a big
difficulty, IIRC, which is why we didn't spend much time on this
(compared to the other issues).  So, talk to him.
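
For comparison, the sysfs mechanism Ben describes above (a resourceN_wc
file next to resourceN for prefetchable BARs) looks roughly like this
from userspace. This is only a sketch: the helper names are invented for
the example, error handling is minimal, and it assumes the file size
reported by sysfs is the BAR size.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Build "/sys/bus/pci/devices/<bdf>/resource<bar>_wc".
 * Returns 0 on success, -1 if the path does not fit in buf.
 */
static int wc_resource_path(char *buf, size_t len, const char *bdf, int bar)
{
    int n = snprintf(buf, len, "/sys/bus/pci/devices/%s/resource%d_wc",
                     bdf, bar);

    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Map a whole BAR through its write-combining sysfs file.
 * Returns the mapping and stores its length in *size, or NULL on failure.
 */
static void *map_bar_wc(const char *bdf, int bar, size_t *size)
{
    char path[128];
    struct stat st;
    void *p;
    int fd;

    if (wc_resource_path(path, sizeof(path), bdf, bar))
        return NULL;

    /* The open fails if the device or its resourceN_wc file does not exist. */
    fd = open(path, O_RDWR);
    if (fd < 0)
        return NULL;

    if (fstat(fd, &st) < 0 || st.st_size <= 0) {
        close(fd);
        return NULL;
    }

    p = mmap(NULL, (size_t)st.st_size, PROT_READ | PROT_WRITE,
             MAP_SHARED, fd, 0);
    close(fd);          /* the mapping stays valid after close */
    if (p == MAP_FAILED)
        return NULL;

    *size = (size_t)st.st_size;
    return p;
}
```

The point of the comparison is that sysfs selects the mapping attribute
by which file is opened, which is the same fd granularity that a
separate-fd option for VFIO would have.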

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson


