On Thu, 1 Aug 2024 14:53:39 -0300
Jason Gunthorpe <jgg@xxxxxxxx> wrote:

> On Thu, Aug 01, 2024 at 11:33:44AM -0600, Alex Williamson wrote:
> > On Thu, 1 Aug 2024 14:13:55 -0300
> > Jason Gunthorpe <jgg@xxxxxxxx> wrote:
> >
> > > On Thu, Aug 01, 2024 at 10:52:18AM -0600, Alex Williamson wrote:
> > > > > > vfio_region_info.flags is not currently tested for input therefore this
> > > > > > proposal could lead to unexpected behavior for a caller that doesn't
> > > > > > currently zero this field. It's intended as an output-only field.
> > > > >
> > > > > Perhaps a REGION_INFO2 then?
> > > > >
> > > > > I still think per-request is better than a global flag
> > > >
> > > > I don't understand why we'd need a REGION_INFO2, we already have
> > > > support for defining new regions.
> > >
> > > It is not a new region, it is a modified mmap behavior for an existing
> > > region.
> >
> > If we're returning a different offset into the vfio device file from
> > which to get a WC mapping, what's the difference?
>
> I think it is a pretty big difference.. The offset is just a "mmap
> cookie", it doesn't have to be 1:1 with the idea of a region.
>
> > A vfio "region" is
> > describing a region or range of the vfio device file descriptor.
>
> I'm thinking a region is describing an area of memory that is
> available in the VFIO device. The offset output is just a "mmap
> cookie" to tell userspace how to mmap it. Having N mmap cookies for 1
> region is OK.

Is an "mmap cookie" an offset into the vfio device file where mmap'ing
that offset results in a WC mapping to a specific device resource?
Isn't that just a region that doesn't have an index or supporting
infrastructure?

> > > > We'd populate these new regions only for BARs that support prefetch and
> > > > mmap
> > >
> > > That's not the point, prefetch has nothing to do with write combining.
> >
> > I was following the original proposal in this thread that added a
> > prefetch flag to REGION_INFO and allowed enabling WC only for
> > IORESOURCE_PREFETCH.
>
> Oh, I didn't notice that, it shouldn't do that. Returning the
> VFIO_REGION_FLAG_WRITE_COMBINE makes sense, but it shouldn't affect
> what the kernel allows.
>
> > > Doubling all the region indexes just for WC does not seem like a good
> > > idea to me...
> >
> > Is the difference you see that in the REQ_WC proposal the user is
> > effectively asking vfio to pop a WC region into existence vs here
> > they're pre-populated?
>
> ?? This didn't create more regions AFAICT. It created a new global
>
> +	bool bar_write_combine[PCI_STD_NUM_BARS];
>
> Which controls what NC/WC the mmap creates when called:
>
> +	if (vdev->bar_write_combine[index])
> +		vma->vm_page_prot = pgprot_writecombine(vma->vm_page_prot);
> +	else
> +		vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
>
> You get the same output from REGION_INFO, same number of regions.

It was your proposal that introduced REQ_WC, this is Keith's original
proposal. I'm equating a REQ_WC request inventing an "mmap cookie" as
effectively the same as bringing a lightweight region into existence
because it defines a section of the vfio device file to have specific
mmap semantics.

> It was the other proposal from long ago that created more regions.
>
> This is what I like and would prefer to stick with. REGION_INFO
> doesn't really change, we don't have two regions referring to the same
> physical memory, and we find some way to request NC/WC of a region at
> mmap time.

"At mmap time" means that something in the vma needs to describe to us
to use the WC semantics, where I think you're proposing that the "mmap
cookie" provides a specific vm_pgoff which we already use to determine
the region index. So whether or not we want to call this a region,
it's effectively in the same address space as regions. Therefore "mmap
cookie" ~= "region offset".

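To make the vm_pgoff point concrete, here is a rough userspace sketch
of how a vfio-pci file offset decodes to a region index. The shift and
the two OFFSET macros mirror what I believe vfio-pci uses today (index
in the bits above bit 40). Everything prefixed EXAMPLE_/example_ is
hypothetical and not part of any proposal in this thread; it only
illustrates that a WC "cookie" would decode through the same path as a
region offset.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to match the vfio-pci convention: region index sits above bit 40. */
#define VFIO_PCI_OFFSET_SHIFT		40
#define VFIO_PCI_OFFSET_TO_INDEX(off)	((uint64_t)(off) >> VFIO_PCI_OFFSET_SHIFT)
#define VFIO_PCI_INDEX_TO_OFFSET(index)	((uint64_t)(index) << VFIO_PCI_OFFSET_SHIFT)

/* HYPOTHETICAL: first index used for WC aliases of BARs, illustration only. */
#define EXAMPLE_WC_INDEX_BASE		16

/* Decode the region index from a page offset, as an mmap handler would. */
static inline uint64_t region_index_from_pgoff(uint64_t pgoff,
					       unsigned int page_shift)
{
	assert(page_shift < VFIO_PCI_OFFSET_SHIFT);
	return pgoff >> (VFIO_PCI_OFFSET_SHIFT - page_shift);
}

/* HYPOTHETICAL: build an "mmap cookie" that requests WC for a given BAR. */
static inline uint64_t example_wc_cookie(unsigned int bar)
{
	return VFIO_PCI_INDEX_TO_OFFSET(EXAMPLE_WC_INDEX_BASE + bar);
}

/* HYPOTHETICAL: tell a WC cookie apart from an ordinary region offset. */
static inline int example_cookie_is_wc(uint64_t off)
{
	return VFIO_PCI_OFFSET_TO_INDEX(off) >= EXAMPLE_WC_INDEX_BASE;
}
```

With 4K pages, region_index_from_pgoff(example_wc_cookie(0) >> 12, 12)
lands on index 16 through exactly the same arithmetic that resolves
BAR2's offset to index 2, which is why I'm calling the two equivalent.
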
> A global is a neat trick, but it would be cleaner to request
> properties of the mmap when the "mmap cookie" is obtained.
>
> > At the limit they're the same. We could use a
> > DEVICE_FEATURE to ask vfio to selectively populate WC regions after
> > which the user could re-enumerate additional regions, or in fact to
> > switch on WC for a given region if we want to go that route. Thanks,
>
> This is still adding more regions and reporting more stuff from
> REGION_INFO, that is what I would like to avoid.

Why? This reminds me of hidden registers outside of capability chains
in PCI config space. Thanks,

Alex