On Fri, Aug 16, 2024 at 04:12:24PM -0700, Sean Christopherson wrote: > On Fri, Aug 09, 2024, Peter Xu wrote: > > Introduce a pair of APIs to follow pfn mappings to get entry information. > > It's very similar to what follow_pte() does before, but different in that > > it recognizes huge pfn mappings. > > ... > > > +int follow_pfnmap_start(struct follow_pfnmap_args *args); > > +void follow_pfnmap_end(struct follow_pfnmap_args *args); > > I find the start+end() terminology to be unintuitive. E.g. I had to look at the > implementation to understand why KVM invoke fixup_user_fault() if follow_pfnmap_start() > failed. > > What about follow_pfnmap_and_lock()? And then maybe follow_pfnmap_unlock()? > Though that second one reads a little weird. If to go with the _lock() I tend to drop "and" to follow_pfnmap_[un]lock(). However looks like David preferred me keeping the name, so we don't reach a quorum yet. I'm happy to change the name as long as we have enough votes.. > > > + * Return: zero on success, -ve otherwise. > > ve? This one came from the old follow_pte() and I kept it. I only knew this after search: a short way to write "negative" (while positive is "+ve"). Doesn't look like something productive.. I'll spell it out in the next version. > > > +int follow_pfnmap_start(struct follow_pfnmap_args *args) > > +{ > > + struct vm_area_struct *vma = args->vma; > > + unsigned long address = args->address; > > + struct mm_struct *mm = vma->vm_mm; > > + spinlock_t *lock; > > + pgd_t *pgdp; > > + p4d_t *p4dp, p4d; > > + pud_t *pudp, pud; > > + pmd_t *pmdp, pmd; > > + pte_t *ptep, pte; > > + > > + pfnmap_lockdep_assert(vma); > > + > > + if (unlikely(address < vma->vm_start || address >= vma->vm_end)) > > + goto out; > > + > > + if (!(vma->vm_flags & (VM_IO | VM_PFNMAP))) > > + goto out; > > Why use goto intead of simply? > > return -EINVAL; > > That's relevant because I think the cases where no PxE is found should return > -ENOENT, not -EINVAL. E.g. if the caller doesn't precheck, then it can bail > immediately on EINVAL, but know that it's worth trying to fault-in the pfn on > ENOENT. I tend to avoid changing the retval in this series to make the goal of this patchset simple. One issue is I _think_ there's one ioctl() that will rely on this retval: acrn_dev_ioctl -> acrn_vm_memseg_map -> acrn_vm_ram_map -> follow_pfnmap_start So we may want to try check with people to not break it.. Thanks, -- Peter Xu