On Tue, Jan 14, 2025 at 05:15:54PM +0100, David Hildenbrand wrote:
> On 10.01.25 07:00, Alistair Popple wrote:
> > Currently to map a DAX page the DAX driver calls vmf_insert_pfn. This
> > creates a special devmap PTE entry for the pfn but does not take a
> > reference on the underlying struct page for the mapping. This is
> > because DAX page refcounts are treated specially, as indicated by the
> > presence of a devmap entry.
> >
> > To allow DAX page refcounts to be managed the same as normal page
> > refcounts introduce vmf_insert_page_mkwrite(). This will take a
> > reference on the underlying page much the same as vmf_insert_page,
> > except it also permits upgrading an existing mapping to be writable if
> > requested/possible.
> >
> > Signed-off-by: Alistair Popple <apopple@xxxxxxxxxx>
> >
> > ---
> >
> > Updates from v2:
> >
> > - Rename function to make not DAX specific
> >
> > - Split the insert_page_into_pte_locked() change into a separate
> >   patch.
> >
> > Updates from v1:
> >
> > - Re-arrange code in insert_page_into_pte_locked() based on comments
> >   from Jan Kara.
> >
> > - Call mkdirty/mkyoung for the mkwrite case, also suggested by Jan.
> > ---
> >  include/linux/mm.h |  2 ++
> >  mm/memory.c        | 36 ++++++++++++++++++++++++++++++++++++
> >  2 files changed, 38 insertions(+)
> >
> > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > index e790298..f267b06 100644
> > --- a/include/linux/mm.h
> > +++ b/include/linux/mm.h
> > @@ -3620,6 +3620,8 @@ int vm_map_pages(struct vm_area_struct *vma, struct page **pages,
> >                                  unsigned long num);
> >  int vm_map_pages_zero(struct vm_area_struct *vma, struct page **pages,
> >                                  unsigned long num);
> > +vm_fault_t vmf_insert_page_mkwrite(struct vm_fault *vmf, struct page *page,
> > +                        bool write);
> >  vm_fault_t vmf_insert_pfn(struct vm_area_struct *vma, unsigned long addr,
> >                          unsigned long pfn);
> >  vm_fault_t vmf_insert_pfn_prot(struct vm_area_struct *vma, unsigned long addr,
> > diff --git a/mm/memory.c b/mm/memory.c
> > index 8531acb..c60b819 100644
> > --- a/mm/memory.c
> > +++ b/mm/memory.c
> > @@ -2624,6 +2624,42 @@ static vm_fault_t __vm_insert_mixed(struct vm_area_struct *vma,
> >          return VM_FAULT_NOPAGE;
> >  }
> > +vm_fault_t vmf_insert_page_mkwrite(struct vm_fault *vmf, struct page *page,
> > +                        bool write)
> > +{
> > +        struct vm_area_struct *vma = vmf->vma;
> > +        pgprot_t pgprot = vma->vm_page_prot;
> > +        unsigned long pfn = page_to_pfn(page);
> > +        unsigned long addr = vmf->address;
> > +        int err;
> > +
> > +        if (addr < vma->vm_start || addr >= vma->vm_end)
> > +                return VM_FAULT_SIGBUS;
> > +
> > +        track_pfn_insert(vma, &pgprot, pfn_to_pfn_t(pfn));
>
> I think I raised this before: why is this track_pfn_insert() in here? It
> only ever does something to VM_PFNMAP mappings, and that cannot possibly be
> the case here (nothing in VM_PFNMAP is refcounted, ever)?

Yes, I also had deja vu reading this comment and a vague recollection of
fixing them too. Your comments[1] were for vmf_insert_folio_pud() though,
which explains why I neglected to do the same clean-up here even though I
should have, so thanks for pointing them out.

[1] - https://lore.kernel.org/linux-mm/ee19854f-fa1f-4207-9176-3c7b79bccd07@xxxxxxxxxx/

> > +
> > +        if (!pfn_modify_allowed(pfn, pgprot))
> > +                return VM_FAULT_SIGBUS;
>
> Why is that required? Why are we messing so much with PFNs? :)
>
> Note that x86 does in there
>
>         /* If it's real memory always allow */
>         if (pfn_valid(pfn))
>                 return true;
>
> See below, when would we ever have a "struct page *" but !pfn_valid() ?
>
> > +
> > +        /*
> > +         * We refcount the page normally so make sure pfn_valid is true.
> > +         */
> > +        if (!pfn_valid(pfn))
> > +                return VM_FAULT_SIGBUS;
>
> Somebody gave us a "struct page", how could the pfn ever be invalid (not
> have a struct page)?
>
> I think all of the above regarding PFNs should be dropped -- unless I am
> missing something important.
>
> > +
> > +        if (WARN_ON(is_zero_pfn(pfn) && write))
> > +                return VM_FAULT_SIGBUS;
>
> is_zero_page() if you already have the "page". But note that in
> validate_page_before_insert() we do have a check that allows for conditional
> insertion of the shared zeropage.
>
> So maybe this hunk is also not required.

Yes, also not required. I have removed the above hunks as well because we
don't need any of this pfn stuff. Again, it's just a hangover from an earlier
version of the series when I was passing pfns rather than pages here.

> > +
> > +        err = insert_page(vma, addr, page, pgprot, write);
> > +        if (err == -ENOMEM)
> > +                return VM_FAULT_OOM;
> > +        if (err < 0 && err != -EBUSY)
> > +                return VM_FAULT_SIGBUS;
> > +
> > +        return VM_FAULT_NOPAGE;
> > +}
> > +EXPORT_SYMBOL_GPL(vmf_insert_page_mkwrite);
> >
>
>
> --
> Cheers,
>
> David / dhildenb
>
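
For illustration only, here is a rough userspace model of the control flow
that is left once the pfn hunks discussed above are dropped: the VMA range
check, followed by the mapping of insert_page()'s return value to a
vm_fault_t. This is a sketch under assumptions, not the code from the next
revision. The names addr_in_vma() and model_insert_page_mkwrite() are
hypothetical, the VM_FAULT_* values are stand-ins defined locally, and the
insert_err parameter simply simulates what insert_page() would return.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Userspace stand-ins for kernel definitions; values are illustrative. */
typedef unsigned int vm_fault_t;
#define VM_FAULT_OOM    0x001u
#define VM_FAULT_SIGBUS 0x002u
#define VM_FAULT_NOPAGE 0x100u

/* True when addr lies inside [vm_start, vm_end), as in the quoted patch. */
static bool addr_in_vma(unsigned long vm_start, unsigned long vm_end,
                        unsigned long addr)
{
        return addr >= vm_start && addr < vm_end;
}

/*
 * Hypothetical model of the trimmed function. insert_err stands in for
 * the return value of insert_page(): 0 on success, -errno on failure.
 */
static vm_fault_t model_insert_page_mkwrite(unsigned long vm_start,
                                            unsigned long vm_end,
                                            unsigned long addr,
                                            int insert_err)
{
        /* Sanity check on the model's own inputs, not a kernel check. */
        assert(vm_start < vm_end);

        if (!addr_in_vma(vm_start, vm_end, addr))
                return VM_FAULT_SIGBUS;

        if (insert_err == -ENOMEM)
                return VM_FAULT_OOM;
        /* -EBUSY falls through to VM_FAULT_NOPAGE, as in the quoted patch. */
        if (insert_err < 0 && insert_err != -EBUSY)
                return VM_FAULT_SIGBUS;

        return VM_FAULT_NOPAGE;
}
```

The model leaves out the zero-page handling on purpose, since the review
suggests validate_page_before_insert() may already cover that case.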