On Wed, Jun 29, 2022 at 10:00:09AM -0600, Logan Gunthorpe wrote:
>
> On 2022-06-29 00:48, Christoph Hellwig wrote:
> > On Wed, Jun 15, 2022 at 10:12:32AM -0600, Logan Gunthorpe wrote:
> >> A pseudo mount is used to allocate an inode for each PCI device. The
> >> inode's address_space is used in the file doing the mmap so that all
> >> VMAs are collected and can be unmapped if the PCI device is unbound.
> >> After unmapping, the VMAs are iterated through and their pages are
> >> put so the device can continue to be unbound. An active flag is used
> >> to signal to VMAs not to allocate any further P2P memory once the
> >> removal process starts. The flag is synchronized with concurrent
> >> access with an RCU lock.
> >
> > Can't we come up with a way of doing this without all the pseudo-fs
> > garbagage? I really hate all the overhead for that in the next
> > nvme patch as well.
>
> I assume you still want to be able to unmap the VMAs on unbind and not
> just hang?
>
> I'll see if I can come up with something to do the a similar thing using
> vm_private data or some such.

I've tried that in the past; it is not a good idea. There is no way to
handle failures when a VMA is dup'd, and if you rely on private_data you
almost certainly have to alloc there.

Then there is the issue of making the locking work on invalidation,
which is crazy ugly.

> I was not a fan of the extra code for this either, but I was given to
> understand that it was the standard way to collect and cleanup VMAs.

Christoph, you tried to clean this up globally once. What happened to
that?

All that is needed here is a way to get a unique inode for the PCI
memory.

Jason