On Wed, Aug 14, 2024 at 02:24:47PM -0400, Peter Xu wrote: > On Wed, Aug 14, 2024 at 10:19:54AM -0300, Jason Gunthorpe wrote: > > On Fri, Aug 09, 2024 at 12:08:59PM -0400, Peter Xu wrote: > > > > > +/** > > > + * follow_pfnmap_start() - Look up a pfn mapping at a user virtual address > > > + * @args: Pointer to struct @follow_pfnmap_args > > > + * > > > + * The caller needs to setup args->vma and args->address to point to the > > > + * virtual address as the target of such lookup. On a successful return, > > > + * the results will be put into other output fields. > > > + * > > > + * After the caller finished using the fields, the caller must invoke > > > + * another follow_pfnmap_end() to proper releases the locks and resources > > > + * of such look up request. > > > + * > > > + * During the start() and end() calls, the results in @args will be valid > > > + * as proper locks will be held. After the end() is called, all the fields > > > + * in @follow_pfnmap_args will be invalid to be further accessed. > > > + * > > > + * If the PTE maps a refcounted page, callers are responsible to protect > > > + * against invalidation with MMU notifiers; otherwise access to the PFN at > > > + * a later point in time can trigger use-after-free. > > > + * > > > + * Only IO mappings and raw PFN mappings are allowed. > > > > What does this mean? The paragraph before said this can return a > > refcounted page? > > This came from the old follow_pte(), I kept that as I suppose we should > allow VM_IO | VM_PFNMAP just like before, even if in this case I suppose > only the pfnmap matters where huge mappings can start to appear. If that is the intention it should actively block returning anything that is vm_normal_page() not check the VM flags, see the other discussion.. It makes sense as a restriction if you call the API follow pfnmap. > > > + * The mmap semaphore > > > + * should be taken for read, and the mmap semaphore cannot be released > > > + * before the end() is invoked. > > > > This function is not safe for IO mappings and PFNs either, VFIO has a > > known security issue to call it. That should be emphasised in the > > comment. > > Any elaboration on this? I could have missed that.. Just because the memory is a PFN or IO doesn't mean it is safe to access it without a refcount. There are many driver scenarios where revoking a PFN from mmap needs to be a hard fence that nothing else has access to that PFN. Otherwise it is a security problem for that driver. > I suppose so? As the pgtable is stable, I thought it means it's safe, but > I'm not sure now when you mentioned there's a VFIO known issue, so I could > have overlooked something. There's no address returned, but pfn, pgprot, > write, etc. zap/etc will wait on the PTL, I think, so it should be safe for at least the issues I am thinking of. > The user needs to do proper mapping if they need an usable address, > e.g. generic_access_phys() does ioremap_prot() and recheck the pfn didn't > change. No, you can't take the phys_addr_t outside the start/end region that explicitly holds the lock protecting it. This is what the comment must warn against doing. Jason