On Wed, Nov 29, 2017 at 11:47:46AM -0700, Alex Williamson wrote:
> On Fri, 24 Nov 2017 15:58:09 +1100
> Alexey Kardashevskiy <aik@xxxxxxxxx> wrote:
>
> > On 15/11/17 03:28, Alex Williamson wrote:
> > > On Tue, 14 Nov 2017 13:29:02 +1100
> > > Benjamin Herrenschmidt <benh@xxxxxxxxxxxxxxxxxxx> wrote:
> > >
> > >> On Tue, 2017-11-14 at 13:23 +1100, David Gibson wrote:
> > >>>>>> 1. Allow msix mapping to the userspace (to address non-64k-aligned msix bar)
> > >>>
> > >>> We have a new plan on this - I'll discuss it over IRC.
> > >>>
> > >>>>>> 2. Allow write combining in vfio for the userspace (kvm guest is kinda
> > >>>>>> special and may simply ignore mapping flags in some configs but PPC radix
> > >>>>>> guests still rely on this)
> > >>>
> > >>> AIUI this isn't for radix, but for DPDK things that we need this.  Ben
> > >>> talked about it a bit, but I don't know what the outcome was.
> > >>
> > >> So this is not a powerpc specific issue. Other archs similarily want to
> > >> be able to do write combine mappings.
> > >>
> > >> The way sysfs does it is that for prefetchable BARs, it exposes both
> > >> a resourceN and a resourceN_wc file.
> > >>
> > >> For VFIO it's a bit more tricky, maybe we need to game the offset using
> > >> some of it as flags but that's very fishy, or maybe we do some kind of
> > >> ioctl that selects the attributes used for that fd instance for
> > >> subsequent mappings...
> > >>
> > >> I'll let Alex chose what he feels most appropriate here.
> > >
> > > My order of preference would be something like:
> > >
> > >  - mmap flags provide some way for the user to specify a wc mapping
> > >    within existing regions
> >
> > There are plenty of flags but none really matches, checked with Paul.
>
> Is MAP_NONBLOCK off the table?  Why?
>
> > >  - some other mechanism of using the existing regions
> >
> > I can only think of madvise but it does not have appropriate flags either.
>
> Is it worth the process to define something that is appropriate?  Would
> either of the above be the obvious architectural/implementation choice
> if we could define a flag for it?
>
> > >  - additional regions provided for use exclusively with wc attributes
> > >    (generalizing PCI BAR wc regions within device specific regions)
> > >
> > Adding VFIO_PCI_BAR0_WC_REGION_INDEX for VFIO_PCI_BAR0_REGION_INDEX (and so
> > on for other BARs) seems a viable option.
> >
> > However the comment for VFIO_PCI_xxx_REGION_INDEX says:
> >
> >         VFIO_PCI_NUM_REGIONS = 9 /* Fixed user ABI, region indexes >=9 use */
> >                                  /* device specific cap to define content. */
> >
> >
> > which limits me in where I can add new indexes, I cannot just add new _WC
> > indexes to that enum, can I? I cannot see any existing regions above 9 yet
> > though.
>
> The comment explains how to do this, you'd add a device specific region
> with the type identifying it as a PCI MMIO WC region and the sub-type
> probably defining the BAR index.
>
> > >  - additional file descriptors provided for wc access
> >
> > It could be a capability + ioctl(VFIO_DEVICE_GET_WC_RESOURCE) which would
> > take a BAR index, check if the BAR is prefetchable and if so - return an fd
> > which the userspace then could mmap(). This is won't break that ABI with 9
> > regions but it is the least favourable in the list...
>
> Do the kernel mechanics require it to be a separate file descriptor?  A
> separate fd is my last choice as well, but the interfaces your were
> attempting to use previously seemed to have fd granularity.
>
> > > This isn't at the top of my priority list to figure out the solution,
> > > so whoever implements it will need to provide justification as they
> > > move down the list from more to less preferred solutions.  Thanks,
> >
> > I am trying... I was really counting on you guys having this discussed in
> > Prague :(
>
> Should have been there to push your agenda...  Thanks,

We discussed it briefly, BenH seemed to think there wasn't a big
difficulty, IIRC, which is why we didn't spend much time on this
(compared to the other issues).  So, talk to him.

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson