On Thu, 1 Aug 2024 13:11:30 -0300 Jason Gunthorpe <jgg@xxxxxxxx> wrote: > On Thu, Aug 01, 2024 at 09:41:23AM -0600, Alex Williamson wrote: > > On Thu, 1 Aug 2024 11:19:14 -0300 > > Jason Gunthorpe <jgg@xxxxxxxx> wrote: > > > > > On Wed, Jul 31, 2024 at 08:53:52AM -0700, Keith Busch wrote: > > > > From: Keith Busch <kbusch@xxxxxxxxxx> > > > > > > > > Write combining can be provide performance improvement for places that > > > > can safely use this capability. > > > > > > > > Previous discussions on the topic suggest a vfio user needs to > > > > explicitly request such a mapping, and it sounds like a new vfio > > > > specific ioctl to request this is one way recommended way to do that. > > > > This patch implements a new ioctl to achieve that so a user can request > > > > write combining on prefetchable memory. A new ioctl seems a bit much for > > > > just this purpose, so the implementation here provides a "flags" field > > > > with only the write combine option defined. The rest of the bits are > > > > reserved for future use. > > > > > > This is a neat hack for sure > > > > > > But how about adding this flag to vfio_region_info ? > > > > > > @@ -275,6 +289,7 @@ struct vfio_region_info { > > > #define VFIO_REGION_INFO_FLAG_WRITE (1 << 1) /* Region supports write */ > > > #define VFIO_REGION_INFO_FLAG_MMAP (1 << 2) /* Region supports mmap */ > > > #define VFIO_REGION_INFO_FLAG_CAPS (1 << 3) /* Info supports caps */ > > > +#define VFIO_REGION_INFO_REQ_WC (1 << 4) /* Request a write combining mapping*/ > > > __u32 index; /* Region index */ > > > __u32 cap_offset; /* Offset within info struct of first cap */ > > > __aligned_u64 size; /* Region size (bytes) */ > > > > > > > > > It specify REQ_WC when calling VFIO_DEVICE_GET_REGION_INFO > > > > > > The kernel will then return an offset value that yields a WC > > > mapping. It doesn't displace the normal non-WC mapping? > > > > > > Arguably we should fixup the kernel to put the mmap cookies into a > > > maple tree so they can be dynamically allocated and more densely > > > packed. > > > > vfio_region_info.flags in not currently tested for input therefore this > > proposal could lead to unexpected behavior for a caller that doesn't > > currently zero this field. It's intended as an output-only field. > > Perhaps a REGION_INFO2 then? > > I still think per-request is better than a global flag I don't understand why we'd need a REGION_INFO2, we already have support for defining new regions. We do this by increasing the num_regions value from the VFIO_DEVICE_GET_INFO ioctl. The user can iterate those additional regions and for each index call VFIO_DEVICE_GET_REGION_INFO. The new regions expose a VFIO_REGION_INFO_CAP_TYPE capability where we define new types as: #define VFIO_REGION_TYPE_PCI_BAR0_WC (4) #define VFIO_REGION_TYPE_PCI_BAR1_WC (5) #define VFIO_REGION_TYPE_PCI_BAR2_WC (6) #define VFIO_REGION_TYPE_PCI_BAR3_WC (7) #define VFIO_REGION_TYPE_PCI_BAR4_WC (8) #define VFIO_REGION_TYPE_PCI_BAR5_WC (9) We'd populate these new regions only for BARs that support prefetch and mmap and we'd define that these BARs may expose only the MMAP flag and not the READ or WRITE flags since those can go through the standard region. Really the only difference from the static vfio-pci defined region indexes is the O(N) search for the user to find the vfio region index to BAR mapping. Thanks, Alex