On Tue, Feb 25, 2025 at 09:59:13AM +0100, Niklas Schnelle wrote:
> On Mon, 2025-02-24 at 14:53 -0600, Bjorn Helgaas wrote:
> > On Fri, Feb 14, 2025 at 02:10:51PM +0100, Niklas Schnelle wrote:
> > > With the introduction of memory I/O (MIO) instructions enabled in commit
> > > 71ba41c9b1d9 ("s390/pci: provide support for MIO instructions") s390
> > > gained support for direct user-space access to mapped PCI resources.
> > > Even without those however user-space can access mapped PCI resources
> > > via the s390 specific MMIO syscalls. There is thus nothing fundamentally
> > > preventing s390 from supporting VFIO_PCI_MMAP, allowing user-space
> > > drivers to access PCI resources without going through the pread()
> > > interface. To actually enable VFIO_PCI_MMAP a few issues need fixing
> > > however.
> > >
> > > Firstly the s390 MMIO syscalls do not cause a page fault when
> > > follow_pte() fails due to the page not being present. This breaks
> > > vfio-pci's mmap() handling which lazily maps on first access.
> > >
> > > Secondly on s390 there is a virtual PCI device called ISM which has
> > > a few oddities. For one it claims to have a 256 TiB PCI BAR (not a typo)
> > > which leads to any attempt to mmap() it to fail with the following message:
> > >
> > >     vmap allocation for size 281474976714752 failed: use vmalloc=<size> to increase size
> > >
> > > Even if one tried to map this BAR only partially the mapping would not
> > > be usable on systems with MIO support enabled. So just block mapping
> > > BARs which don't fit between IOREMAP_START and IOREMAP_END. Solve this
> > > by keeping the vfio-pci mmap() blocking behavior around for this
> > > specific device via a PCI quirk and new pdev->non_mappable_bars
> > > flag.
> > >
> > > As noted by Alex Williamson, with mmap() enabled in vfio-pci it makes
> > > sense to also enable HAVE_PCI_MMAP with the same restriction for
> > > pdev->non_mappable_bars. So this is added in patch 3 and I tested this
> > > with another small test program.
> > >
> > > Note:
> > > For your convenience the code is also available in the tagged
> > > b4/vfio_pci_mmap branch on my git.kernel.org site below:
> > > https://git.kernel.org/pub/scm/linux/kernel/git/niks/linux.git/
> > >
> > > Thanks,
> > > Niklas
> > >
> > > Link: https://lore.kernel.org/all/c5ba134a1d4f4465b5956027e6a4ea6f6beff969.camel@xxxxxxxxxxxxx/
> > > Signed-off-by: Niklas Schnelle <schnelle@xxxxxxxxxxxxx>
> > > ---
> > > Changes in v6:
> > > - Add a patch to also enable PCI resource mmap() via sysfs and proc
> > >   excluding pdev->non_mappable_bars devices (Alex Williamson)
> > > - Added Acks
> > > - Link to v5: https://lore.kernel.org/r/20250212-vfio_pci_mmap-v5-0-633ca5e056da@xxxxxxxxxxxxx
> >
> > I think the series would be more readable if patch 2/3 included all
> > the core changes (adding pci_dev.non_mappable_bars, the 3/3
> > pci-sysfs.c and proc.c changes to test it, and I suppose the similar
> > vfio_pci_core.c change), and we moved all the s390 content from 2/3 to
> > 3/3.
>
> Maybe we could do the following:
>
> 1/3: As is
>
> 2/3: Introduces pdev->non_mappable_bars and the checks in vfio and
> proc.c/pci-sysfs.c. To make the flag handle the vfio case with
> VFIO_PCI_MMAP gone, a one-line change in s390 will set
> pdev->non_mappable_bars = 1 for all PCI devices.

What if you moved the vfio_pci_core.c change to patch 3? Then I think
patch 2 would do nothing at all (since there's nothing that sets
non_mappable_bars), and all the s390 stuff would be in patch 3?

Not sure if that's possible, but I think it's a little confusing to
have the s390 changes split across patch 2 and 3.

> 3/3: Changes setting pdev->non_mappable_bars = 1 in s390 to only the
> ISM device using the quirk handling and adds HAVE_PCI_MMAP.
>
> Thanks,
> Niklas