On Wed, 7 Aug 2024 11:19:10 -0300
Jason Gunthorpe <jgg@xxxxxxxx> wrote:

> On Tue, Aug 06, 2024 at 12:43:02PM -0600, Alex Williamson wrote:
>
> > > So we don't leak this too much into the drivers? Why should all the
> > > VFIO drivers have to be changed to alter how their region indexes work
> > > just to add a single flag??
> >
> > I don't know how you're coming to this conclusion.
>
> Ideally we'd want to support the WC option basically everywhere.
>
> > > I fear we might need to do this as there may not be room in the pgoff
> > > space (at least for 32 bit) to duplicate everything....
> >
> > We'll root out userspace drivers that hard code region offsets in doing
> > this, but otherwise it shouldn't be an issue.
>
> The issue is running out of pgoff bits on 32 bit. Maybe this isn't an
> issue for VFIO, but it was for RDMA. We needed tight optimal on-demand
> packing of actual requested mmaps. Allocating gigabytes of address
> space for possible mmaps ran out of pgoff bits. :\

If we only implemented WC for 64-bit, would anyone notice?

> > How does an "mmap cookie" not duplicate that a device range is
> > accessible through multiple offsets of the vfio device file?
>
> pgoff duplication is not really an issue, from an API perspective the
> driver would call a helper to convert the pgoff into a region index
> and mmap flags. It wouldn't matter to any driver how many duplicates
> there are.

Which is exactly my point whether we call it a region or an mmap
cookie. In one case we're trying to give a bare pgoff that effectively
aliases to a region with different mapping flags, in the other the
pgoff is exposed through a new region offset that does exactly the same
thing.

> > Well first, we're not talking about a fixed number of additional
> > regions, we're talking about defining region identifiers for each BAR
> > with a WC mapping attribute, but at worst we'd only populate
> > implemented MMIO BARs. But then we've also mentioned that a device
> > feature could be used to allow a userspace driver to selectively bring
> > these regions into existence. In any case, an mmap cookie also consumes
> > address space from the vfio device file, so I'm still failing to see
> > how calling them a region vs just an mmap cookie makes a substantive
> > difference.
>
> You'd only allocate the mmap cookie when userspace requests it.

I've suggested a mechanism using DEVICE_FEATURE that could do this for
regions.

> My original suggestion was to send a flag to REGION_INFO to
> specifically ask for the different behavior, that (and only that)
> would return new mmap cookies.

Which can't work because flags is only an output field.

> The alternative version of this might be to have a single
> 'GET_REGION_MMAP' that gives a new mmap cookie for a singular
> specified region index. Userspace would call REGION_INFO to learn the
> memory regions and then it could call GET_REGION_MMAP(REQ_WC) and will
> get back a single dynamic mmap cookie that links the WC flags.
>
> No system call, no cookie allocation. Existing apps don't start seeing
> more regions from REGION_INFO. Drivers keep region indexes 1:1 with HW
> objects. The uAPI has room to add more mmap flags.

Please tell me how this is ultimately different from invoking a
DEVICE_FEATURE call to request that a new device specific region be
created with the desired mappings.

In the short term, if we run out of pgoff, the user gets an -ENOSPC.
DEVICE_INFO is updated with the new number of regions, existing region
indexes are unchanged, the user iterates new indexes with REGION_INFO
to get the offset and identifies them using the previously proposed
region types. Thanks,

Alex
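
For reference, the pgoff concern raised in the thread can be sketched as plain arithmetic. This is only an illustration, not code from any proposed patch: it assumes the vfio-pci convention of placing the region index above bit 40 of the device-file offset (VFIO_PCI_OFFSET_SHIFT in the kernel) and 4 KiB pages. The helper names index_to_offset(), offset_to_index() and max_region_indexes() are made up for this sketch.

```c
#include <stdint.h>

/* vfio-pci carries the region index in the high bits of the device-file
 * offset; the kernel's VFIO_PCI_OFFSET_SHIFT is 40. */
#define OFFSET_SHIFT 40u
/* Assumption for this sketch: 4 KiB pages, so pgoff = offset >> 12. */
#define PAGE_SHIFT_4K 12u

/* Hypothetical helper: file offset at which region `index` starts. */
static uint64_t index_to_offset(uint32_t index)
{
    return (uint64_t)index << OFFSET_SHIFT;
}

/* Hypothetical helper: region index encoded in a file offset. */
static uint32_t offset_to_index(uint64_t offset)
{
    return (uint32_t)(offset >> OFFSET_SHIFT);
}

/* Number of region indexes whose start offset is reachable when the
 * page offset passed to mmap is `pgoff_bits` wide. */
static uint64_t max_region_indexes(unsigned pgoff_bits)
{
    unsigned offset_bits = pgoff_bits + PAGE_SHIFT_4K;

    if (offset_bits > 64u)
        offset_bits = 64u;      /* the sketch treats offsets as 64-bit */
    if (offset_bits <= OFFSET_SHIFT)
        return 1;               /* only index 0 is reachable at all */
    return (uint64_t)1 << (offset_bits - OFFSET_SHIFT);
}
```

Under these assumptions, max_region_indexes(32) comes out as 16. vfio-pci already defines nine fixed region indexes (BARs 0-5, ROM, config space, VGA), so a WC alias for each of the six BARs would bring the total to 15 and leave almost no room for device-specific regions on 32-bit. With a 64-bit pgoff the same scheme has index bits to spare.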