On 2021-09-29 5:05 p.m., Jason Gunthorpe wrote: > On Wed, Sep 29, 2021 at 03:42:00PM -0600, Logan Gunthorpe wrote: > >> The main reason is probably this: if we don't use VM_MIXEDMAP, then we >> can't set pte_devmap(). > > I think that is an API limitation in the fault routines.. > > finish_fault() should set the pte_devmap - eg by passing the > PFN_DEV|PFN_MAP somehow through the vma->vm_page_prot to mk_pte() or > otherwise signaling do_set_pte() that it should set those PTE bits > when it creates the entry. > > (or there should be a vmf_* helper for this special case, but using > the vmf->page seems righter to me) I'm not opposed to this. Though I'm not sure what's best here. >> If we don't set pte_devmap(), then every single page that GUP >> processes needs to check if it's a ZONE_DEVICE page and also if it's >> a P2PDMA page (thus dereferencing pgmap) in order to satisfy the >> requirements of FOLL_PCI_P2PDMA. > > Definately not suggesting not to set pte_devmap(), only that > VM_MIXEDMAP should not be set on VMAs that only contain struct > pages. That is an abuse of what it is intended for. > > At the very least there should be a big comment above the usage > explaining that this is just working around a limitation in > finish_fault() where it cannot set the PFN_DEV|PFN_MAP bits today. Is it? Documentation on vmf_insert_mixed() and VM_MIXEDMAP is not good and the intention is not clear. I got the impression that mm people wanted those interfaces used for users of pte_devmap(). device-dax uses these interfaces and as far as I can see it also only contains struct pages (or at least dev_dax_huge_fault() calls pfn_to_page() on every page when VM_FAULT_NOPAGE happens). So it would be nice to get some direction here from mm developers on what they'd prefer. Logan