On Tue, 2025-02-25 at 14:35 -0600, Bjorn Helgaas wrote:
> On Tue, Feb 25, 2025 at 09:59:13AM +0100, Niklas Schnelle wrote:
> > On Mon, 2025-02-24 at 14:53 -0600, Bjorn Helgaas wrote:
> > > On Fri, Feb 14, 2025 at 02:10:51PM +0100, Niklas Schnelle wrote:
> > > > With the introduction of memory I/O (MIO) instructions enabled in commit
> > > > 71ba41c9b1d9 ("s390/pci: provide support for MIO instructions") s390
> > > > gained support for direct user-space access to mapped PCI resources.
> > > > Even without those however user-space can access mapped PCI resources
> > > > via the s390 specific MMIO syscalls. There is thus nothing fundamentally
> > > > preventing s390 from supporting VFIO_PCI_MMAP, allowing user-space
> > > > drivers to access PCI resources without going through the pread()
> > > > interface. To actually enable VFIO_PCI_MMAP a few issues need fixing
> > > > however.
> > > >
> > > > Firstly the s390 MMIO syscalls do not cause a page fault when
> > > > follow_pte() fails due to the page not being present. This breaks
> > > > vfio-pci's mmap() handling which lazily maps on first access.
> > > >
> > > > Secondly on s390 there is a virtual PCI device called ISM which has
> > > > a few oddities. For one it claims to have a 256 TiB PCI BAR (not a typo)
> > > > which leads to any attempt to mmap() it failing with the following message:
> > > >
> > > >   vmap allocation for size 281474976714752 failed: use vmalloc=<size> to increase size
> > > >
> > > > Even if one tried to map this BAR only partially the mapping would not
> > > > be usable on systems with MIO support enabled. So just block mapping
> > > > BARs which don't fit between IOREMAP_START and IOREMAP_END. Solve this
> > > > by keeping the vfio-pci mmap() blocking behavior around for this
> > > > specific device via a PCI quirk and new pdev->non_mappable_bars
> > > > flag.
> > > >
> > > > As noted by Alex Williamson, with mmap() enabled in vfio-pci it makes
> > > > sense to also enable HAVE_PCI_MMAP with the same restriction for
> > > > pdev->non_mappable_bars. So this is added in patch 3 and I tested this with
> > > > another small test program.
> > > >
> > > > Note:
> > > > For your convenience the code is also available in the tagged
> > > > b4/vfio_pci_mmap branch on my git.kernel.org site below:
> > > > https://git.kernel.org/pub/scm/linux/kernel/git/niks/linux.git/
> > > >
> > > > Thanks,
> > > > Niklas
> > > >
> > > > Link: https://lore.kernel.org/all/c5ba134a1d4f4465b5956027e6a4ea6f6beff969.camel@xxxxxxxxxxxxx/
> > > > Signed-off-by: Niklas Schnelle <schnelle@xxxxxxxxxxxxx>
> > > > ---
> > > > Changes in v6:
> > > > - Add a patch to also enable PCI resource mmap() via sysfs and proc
> > > >   excluding pdev->non_mappable_bars devices (Alex Williamson)
> > > > - Added Acks
> > > > - Link to v5: https://lore.kernel.org/r/20250212-vfio_pci_mmap-v5-0-633ca5e056da@xxxxxxxxxxxxx
> > >
> > > I think the series would be more readable if patch 2/3 included all
> > > the core changes (adding pci_dev.non_mappable_bars, the 3/3
> > > pci-sysfs.c and proc.c changes to test it, and I suppose the similar
> > > vfio_pci_core.c change), and we moved all the s390 content from 2/3 to
> > > 3/3.
> >
> > Maybe we could do the following:
> >
> > 1/3: As is
> >
> > 2/3: Introduces pdev->non_mappable_bars and the checks in vfio and
> > proc.c/pci-sysfs.c. To make the flag handle the vfio case with
> > VFIO_PCI_MMAP gone, a one-line change in s390 will set
> > pdev->non_mappable_bars = 1 for all PCI devices.
>
> What if you moved the vfio_pci_core.c change to patch 3? Then I think
> patch 2 would do nothing at all (since there's nothing that sets
> non_mappable_bars), and all the s390 stuff would be in patch 3?
>
> Not sure if that's possible, but I think it's a little confusing to
> have the s390 changes split across patch 2 and 3.

I'm not really a fan of having a completely unused flag, even in an
intermediate commit. I edited the commits yesterday, and with this
approach the s390-specific part of 2/3 really is just the hunk below:

diff --git a/arch/s390/pci/pci.c b/arch/s390/pci/pci.c
index 88f72745fa59..d14b8605a32c 100644
--- a/arch/s390/pci/pci.c
+++ b/arch/s390/pci/pci.c
@@ -590,6 +590,7 @@ int pcibios_device_add(struct pci_dev *pdev)
 	zpci_zdev_get(zdev);
 	if (pdev->is_physfn)
 		pdev->no_vf_scan = 1;
+	pdev->non_mappable_bars = 1;
 
 	zpci_map_resources(pdev);

That added line then gets deleted again in 3/3. I think this makes the
split pretty logical: in patch 2/3 we set the flag for all PCI devices,
which preserves the existing behavior, with pdev->non_mappable_bars
replacing the "y if S390" of VFIO_PCI_MMAP; then 3/3 narrows it down to
just the ISM device.

Thanks,
Niklas
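
To illustrate the progression described above, here is a minimal
user-space sketch of how the flag would behave after 2/3 and after 3/3.
This is a model only, not the kernel code: struct pci_dev_model, the
is_ism field, device_add_patch2(), device_add_patch3() and
bar_mmap_allowed() are hypothetical names made up for this example. Only
the non_mappable_bars flag name comes from the series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the relevant bits of struct pci_dev. */
struct pci_dev_model {
	bool is_ism;                      /* assumption: marks the s390 ISM device */
	unsigned int non_mappable_bars:1; /* flag name taken from the series */
};

/* State after 2/3: s390 sets the flag for every PCI device it adds. */
static inline void device_add_patch2(struct pci_dev_model *pdev)
{
	assert(pdev != NULL);
	pdev->non_mappable_bars = 1;
}

/* State after 3/3: only the ISM device keeps the flag (set by a quirk). */
static inline void device_add_patch3(struct pci_dev_model *pdev)
{
	assert(pdev != NULL);
	pdev->non_mappable_bars = pdev->is_ism ? 1 : 0;
}

/* The single check the vfio-pci, sysfs and proc mmap paths would share. */
static inline bool bar_mmap_allowed(const struct pci_dev_model *pdev)
{
	assert(pdev != NULL);
	return !pdev->non_mappable_bars;
}
```

In this model every device is refused mmap() after 2/3, which matches
the existing s390 behavior, and after 3/3 only the ISM device is still
refused.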