On Fri, 24 Nov 2017 15:58:09 +1100
Alexey Kardashevskiy <aik@xxxxxxxxx> wrote:

> On 15/11/17 03:28, Alex Williamson wrote:
> > On Tue, 14 Nov 2017 13:29:02 +1100
> > Benjamin Herrenschmidt <benh@xxxxxxxxxxxxxxxxxxx> wrote:
> >
> >> On Tue, 2017-11-14 at 13:23 +1100, David Gibson wrote:
> >>>>>> 1. Allow msix mapping to the userspace (to address non-64k-aligned msix bar)
> >>>
> >>> We have a new plan on this - I'll discuss it over IRC.
> >>>
> >>>>>> 2. Allow write combining in vfio for the userspace (kvm guest is kinda
> >>>>>> special and may simply ignore mapping flags in some configs but PPC radix
> >>>>>> guests still rely on this)
> >>>
> >>> AIUI this isn't for radix, but for DPDK things that we need this. Ben
> >>> talked about it a bit, but I don't know what the outcome was.
> >>
> >> So this is not a powerpc specific issue. Other archs similarly want to
> >> be able to do write combine mappings.
> >>
> >> The way sysfs does it is that for prefetchable BARs, it exposes both
> >> a resourceN and a resourceN_wc file.
> >>
> >> For VFIO it's a bit more tricky, maybe we need to game the offset using
> >> some of it as flags but that's very fishy, or maybe we do some kind of
> >> ioctl that selects the attributes used for that fd instance for
> >> subsequent mappings...
> >>
> >> I'll let Alex choose what he feels most appropriate here.
> >
> > My order of preference would be something like:
> >
> >  - mmap flags provide some way for the user to specify a wc mapping
> >    within existing regions
>
> There are plenty of flags but none really matches, checked with Paul.

Is MAP_NONBLOCK off the table?  Why?

> >  - some other mechanism of using the existing regions
>
> I can only think of madvise but it does not have appropriate flags either.

Is it worth the process to define something that is appropriate?  Would
either of the above be the obvious architectural/implementation choice
if we could define a flag for it?

> >  - additional regions provided for use exclusively with wc attributes
> >    (generalizing PCI BAR wc regions within device specific regions)
>
>
> Adding VFIO_PCI_BAR0_WC_REGION_INDEX for VFIO_PCI_BAR0_REGION_INDEX (and so
> on for other BARs) seems a viable option.
>
> However the comment for VFIO_PCI_xxx_REGION_INDEX says:
>
> VFIO_PCI_NUM_REGIONS = 9 /* Fixed user ABI, region indexes >=9 use */
>                          /* device specific cap to define content. */
>
>
> which limits me in where I can add new indexes, I cannot just add new _WC
> indexes to that enum, can I? I cannot see any existing regions above 9 yet
> though.

The comment explains how to do this: you'd add a device specific region
with the type identifying it as a PCI MMIO WC region and the sub-type
probably defining the BAR index.

> >  - additional file descriptors provided for wc access
>
> It could be a capability + ioctl(VFIO_DEVICE_GET_WC_RESOURCE) which would
> take a BAR index, check if the BAR is prefetchable and if so - return an fd
> which the userspace then could mmap(). This won't break that ABI with 9
> regions but it is the least favourable in the list...

Do the kernel mechanics require it to be a separate file descriptor?  A
separate fd is my last choice as well, but the interfaces you were
attempting to use previously seemed to have fd granularity.

> > This isn't at the top of my priority list to figure out the solution,
> > so whoever implements it will need to provide justification as they
> > move down the list from more to less preferred solutions.  Thanks,
>
> I am trying... I was really counting on you guys having this discussed in
> Prague :(

Should have been there to push your agenda...  Thanks,

Alex
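
For illustration only, here is a rough userspace-side sketch of the device
specific region idea above: walk the capability chain attached to a region
info buffer, find the type capability, and read the BAR index out of the
sub-type.  The struct layouts are local mirrors of vfio_info_cap_header and
vfio_region_info_cap_type from <linux/vfio.h>, repeated so the sketch builds
without kernel headers.  The WC region type value and the wc_region_bar()
helper are hypothetical names made up for this example; nothing in this
thread defines them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of struct vfio_info_cap_header. */
struct cap_header {
    uint16_t id;
    uint16_t version;
    uint32_t next;      /* offset of the next capability, 0 ends the chain */
};

/* Local mirror of struct vfio_region_info_cap_type. */
struct cap_type {
    struct cap_header header;
    uint32_t type;
    uint32_t subtype;
};

#define CAP_TYPE_ID 2   /* same value as VFIO_REGION_INFO_CAP_TYPE */

/* HYPOTHETICAL: a made-up type value for a "PCI BAR, write-combining"
 * region.  It is an assumption for this sketch, not part of any ABI. */
#define HYPOTHETICAL_REGION_TYPE_PCI_BAR_WC 0x1000u

/*
 * Walk the capability chain in a region info buffer of `size` bytes.
 * `first` is the offset of the first capability from the start of the
 * buffer (cap_offset in the real struct vfio_region_info).
 * Returns the BAR index stored in the sub-type when the region carries
 * the hypothetical WC type, or -1 otherwise.
 */
int wc_region_bar(const uint8_t *info, size_t size, uint32_t first)
{
    assert(info != NULL);

    uint32_t off = first;
    size_t hops = 0;

    /* The hop limit guards against a malformed, looping chain. */
    while (off != 0 && hops++ < 64) {
        struct cap_header hdr;

        if (off > size || size - off < sizeof(hdr))
            return -1;                  /* header would run past the buffer */
        memcpy(&hdr, info + off, sizeof(hdr));

        if (hdr.id == CAP_TYPE_ID) {
            struct cap_type cap;

            if (size - off < sizeof(cap))
                return -1;              /* truncated type capability */
            memcpy(&cap, info + off, sizeof(cap));
            if (cap.type == HYPOTHETICAL_REGION_TYPE_PCI_BAR_WC)
                return (int)cap.subtype;
        }
        off = hdr.next;
    }
    return -1;
}
```

In real use the buffer would presumably come from VFIO_DEVICE_GET_REGION_INFO
on a region index at or above VFIO_PCI_NUM_REGIONS, with argsz grown until the
capability chain fits; that part is left out here since it needs an actual
device.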