On Wed, Oct 7, 2020 at 3:23 PM Dan Williams <dan.j.williams@xxxxxxxxx> wrote:
>
> On Wed, Oct 7, 2020 at 12:49 PM Daniel Vetter <daniel.vetter@xxxxxxxx> wrote:
> >
> > On Wed, Oct 7, 2020 at 9:33 PM Dan Williams <dan.j.williams@xxxxxxxxx> wrote:
> > >
> > > On Wed, Oct 7, 2020 at 11:11 AM Daniel Vetter <daniel.vetter@xxxxxxxx> wrote:
> > > >
> > > > Since 3234ac664a87 ("/dev/mem: Revoke mappings when a driver claims
> > > > the region") /dev/kmem zaps ptes when the kernel requests exclusive
> > > > access to an iomem region. And with CONFIG_IO_STRICT_DEVMEM, this is
> > > > the default for all driver uses.
> > > >
> > > > Except there's two more ways to access pci bars: sysfs and proc mmap
> > > > support. Let's plug that hole.
> > >
> > > Ooh, yes, let's.
> > >
> > > > For revoke_devmem() to work we need to link our vma into the same
> > > > address_space, with consistent vma->vm_pgoff. ->pgoff is already
> > > > adjusted, because that's how (io_)remap_pfn_range works, but for the
> > > > mapping we need to adjust vma->vm_file->f_mapping. Usually that's done
> > > > at ->open time, but that's a bit tricky here with all the entry points
> > > > and arch code. So instead create a fake file and adjust vma->vm_file.
> > >
> > > I don't think you want to share the devmem inode for this, this should
> > > be based off the sysfs inode which I believe there is already only one
> > > instance per resource. In contrast /dev/mem can have multiple inodes
> > > because anyone can just mknod a new character device file, the same
> > > problem does not exist for sysfs.
> >
> > But then I need to find the right one, plus I also need to find the
> > right one for the procfs side. That gets messy, and I already have no
> > idea how to really test this. Shared address_space is the same trick
> > we're using in drm (where we have multiple things all pointing to the
> > same underlying resources, through different files), and it gets the
> > job done. So that's why I figured the shared address_space is the
> > cleaner solution since then unmap_mapping_range takes care of
> > iterating over all vma for us. I guess I could reimplement that logic
> > with our own locking and everything in revoke_devmem, but feels a bit
> > silly. But it would also solve the problem of having multiple
> > different mknod of /dev/kmem with different address_space behind them.
> > Also because of how remap_pfn_range works, all these vma do use the
> > same pgoff already anyway.
>
> True, remap_pfn_range() makes sure that ->pgoff is an absolute
> physical address offset for all use cases. So you might be able to
> just point proc_bus_pci_open() at the shared devmem address space. For
> sysfs it's messier. I think you would need to somehow get the inode
> from kernfs_fop_open() to adjust its address space, but only if the
> bin_file will ultimately be used for PCI memory.

To me this seems like a new sysfs_create_bin_file() flavor that
registers the file with the common devmem address_space.
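Here is a small userspace model of why one shared address_space with absolute pgoffs makes revocation a single walk. It is a sketch under stated assumptions, not kernel code. `sketch_vma`, `sketch_mapping`, `sketch_mmap()` and `sketch_revoke()` are hypothetical stand-ins for a vma, the shared devmem address_space, an mmap through /dev/mem, sysfs or procfs, and unmap_mapping_range() plus the revoke. The model assumes 4 KiB pages. It also takes, as the thread does, that vm_pgoff ends up as the absolute physical page offset.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12 /* assumption: 4 KiB pages */
#define SKETCH_MAX_VMAS 16

/* Hypothetical stand-in for a vma: only the fields this model needs. */
struct sketch_vma {
    uint64_t pgoff;  /* absolute physical page offset of the first page */
    uint64_t npages; /* number of pages mapped */
    bool mapped;     /* cleared once the mapping has been "zapped" */
};

/* Hypothetical stand-in for the shared address_space: every vma is on one
 * list, whichever file (/dev/mem, sysfs, procfs) it was created through. */
struct sketch_mapping {
    struct sketch_vma vmas[SKETCH_MAX_VMAS];
    size_t nr;
};

/* Physical address -> page offset, i.e. what the thread says vm_pgoff holds. */
static uint64_t sketch_phys_to_pgoff(uint64_t phys)
{
    return phys >> SKETCH_PAGE_SHIFT;
}

/* Model of an mmap of `npages` pages at `offset` into a BAR that starts at
 * physical address `bar_start`: the vma joins the shared mapping. */
static struct sketch_vma *sketch_mmap(struct sketch_mapping *m,
                                      uint64_t bar_start, uint64_t offset,
                                      uint64_t npages)
{
    assert(m->nr < SKETCH_MAX_VMAS && npages > 0);
    struct sketch_vma *vma = &m->vmas[m->nr++];
    vma->pgoff = sketch_phys_to_pgoff(bar_start + offset);
    vma->npages = npages;
    vma->mapped = true;
    return vma;
}

/* Model of revoking [phys, phys + len): one pass over the shared list zaps
 * every still-mapped vma whose page range overlaps the claimed range.
 * Returns the number of vmas zapped. */
static size_t sketch_revoke(struct sketch_mapping *m, uint64_t phys,
                            uint64_t len)
{
    assert(len > 0);
    uint64_t first = sketch_phys_to_pgoff(phys);
    uint64_t last = sketch_phys_to_pgoff(phys + len - 1); /* inclusive */
    size_t zapped = 0;

    for (size_t i = 0; i < m->nr; i++) {
        struct sketch_vma *v = &m->vmas[i];
        uint64_t v_last = v->pgoff + v->npages - 1;

        if (v->mapped && v->pgoff <= last && first <= v_last) {
            v->mapped = false;
            zapped++;
        }
    }
    return zapped;
}
```

Suppose one BAR is mapped once through "sysfs" (whole BAR) and once through "procfs" (one page at offset 0x2000). Both vmas land on the same list with pgoffs derived from the physical address. A single `sketch_revoke()` over the BAR then finds both, and a mapping of a different BAR is left alone. With a separate address_space per file, the revoke path would have to walk each list on its own, which is the reimplementation the thread calls silly.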