On Wed, 2025-02-12 at 13:28 -0700, Alex Williamson wrote: > On Wed, 12 Feb 2025 16:28:32 +0100 > Niklas Schnelle <schnelle@xxxxxxxxxxxxx> wrote: > > > On s390 there is a virtual PCI device called ISM which has a few > > peculiarities. For one, it presents a 256 TiB PCI BAR whose size leads > > to any attempt to ioremap() the whole BAR failing. This is problematic > > since mapping the whole BAR is the default behavior of for example > > vfio-pci in combination with QEMU and VFIO_PCI_MMAP enabled. > > > > Even if one tried to map this BAR only partially, the mapping would not > > be usable without extra precautions on systems with MIO support enabled. > > This is because of another oddity, in that this virtual PCI device does > > not support the newer memory I/O (MIO) PCI instructions and legacy PCI > > instructions are not accessible through writeq()/readq() when MIO is in > > use. > > > > In short the ISM device's BAR is not accessible through memory mappings. > > Indicate this by introducing a new non_mappable_bars flag for the ISM > > device and set it using a PCI quirk. Use this flag instead of the > > VFIO_PCI_MMAP Kconfig option to block mapping with vfio-pci. This was > > the only use of the Kconfig option so remove it. Note that there are no > > PCI resource sysfs files on s390x already as HAVE_PCI_MMAP is currently > > not set. If this were to be set in the future pdev->non_mappable_bars > > can be used to prevent unusable resource files for ISM from being > > created. > > I think we should also look at it from the opposite side, not just > s390x maybe adding HAVE_PCI_MMAP in the future, but the fact that we're > currently adding a generic PCI device flag which isn't honored by the > one mechanism that PCI core provides to mmap MMIO BARs to userspace. > It seems easier to implement it in pci_mmap_resource() now rather than > someone later discovering there's no enforcement outside of the very > narrow s390x use case. Thanks, > > Alex That is a very good point! I did try enabling HAVE_PCI_MMAP for s390 a while back and I believe that ran into trouble with ISM devices too. So I just did a quick test of enabling HAVE_PCI_MMAP with ARCH_GENERIC_PCI_MMAP_RESOURCE for s390. Then added a check for pdev->non_mappable_bars to pci_create_resource_files() and proc_bus_pci_mmap(). I pondered adding it to pci_mmap_resource() too but felt like not showing the resource at all, like we do now with !HAVE_PCI_MMAP is cleaner. Using a little test program that just mmap()s BAR 0 of an NVMe and reads the NVMe version at offset 8 using our PCI MIO load instruction works. Also, as expected I don't get resourceX files for ISM devices with the added check. I still have to test the /proc/bus/pci based mmap() but would expect that to work too. So I'd be open to adding another patch which adds HAVE_PCI_MMAP for s390, if we see too much risk with that, we could alternatively add just the pdev->non_mappable_bars but then they would be untested, still better than hoping that someone remembers to add that in the future. Thanks, Niklas