On Thu 27-06-24 10:54:21, Alistair Popple wrote:
> Currently to map a DAX page the DAX driver calls vmf_insert_pfn. This
> creates a special devmap PTE entry for the pfn but does not take a
> reference on the underlying struct page for the mapping. This is
> because DAX page refcounts are treated specially, as indicated by the
> presence of a devmap entry.
>
> To allow DAX page refcounts to be managed the same as normal page
> refcounts, introduce dax_insert_pfn. This will take a reference on the
> underlying page much the same as vmf_insert_page, except it also
> permits upgrading an existing mapping to be writable if
> requested/possible.
>
> Signed-off-by: Alistair Popple <apopple@xxxxxxxxxx>

Overall this looks good to me. Some comments below.

> ---
>  include/linux/mm.h |  4 ++-
>  mm/memory.c        | 79 ++++++++++++++++++++++++++++++++++++++++++-----
>  2 files changed, 76 insertions(+), 7 deletions(-)
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 9a5652c..b84368b 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1080,6 +1080,8 @@ int vma_is_stack_for_current(struct vm_area_struct *vma);
>  struct mmu_gather;
>  struct inode;
>
> +extern void prep_compound_page(struct page *page, unsigned int order);
> +

You don't seem to use this function in this patch?

> diff --git a/mm/memory.c b/mm/memory.c
> index ce48a05..4f26a1f 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -1989,14 +1989,42 @@ static int validate_page_before_insert(struct page *page)
>  }
>
>  static int insert_page_into_pte_locked(struct vm_area_struct *vma, pte_t *pte,
> -			unsigned long addr, struct page *page, pgprot_t prot)
> +			unsigned long addr, struct page *page, pgprot_t prot, bool mkwrite)
>  {
>  	struct folio *folio = page_folio(page);
> +	pte_t entry = ptep_get(pte);
>
> -	if (!pte_none(ptep_get(pte)))
> +	if (!pte_none(entry)) {
> +		if (mkwrite) {
> +			/*
> +			 * For read faults on private mappings the PFN passed
> +			 * in may not match the PFN we have mapped if the
> +			 * mapped PFN is a writeable COW page. In the mkwrite
> +			 * case we are creating a writable PTE for a shared
> +			 * mapping and we expect the PFNs to match. If they
> +			 * don't match, we are likely racing with block
> +			 * allocation and mapping invalidation so just skip the
> +			 * update.
> +			 */
> +			if (pte_pfn(entry) != page_to_pfn(page)) {
> +				WARN_ON_ONCE(!is_zero_pfn(pte_pfn(entry)));
> +				return -EFAULT;
> +			}
> +			entry = maybe_mkwrite(entry, vma);
> +			entry = pte_mkyoung(entry);
> +			if (ptep_set_access_flags(vma, addr, pte, entry, 1))
> +				update_mmu_cache(vma, addr, pte);
> +			return 0;
> +		}
>  		return -EBUSY;

If you do this like:

		if (!mkwrite)
			return -EBUSY;

you can reduce the indentation of the big block and also make the flow
more obvious...

> +	}
> +
>  	/* Ok, finally just insert the thing.. */
>  	folio_get(folio);
> +	if (mkwrite)
> +		entry = maybe_mkwrite(mk_pte(page, prot), vma);
> +	else
> +		entry = mk_pte(page, prot);

I'd prefer:

	entry = mk_pte(page, prot);
	if (mkwrite)
		entry = maybe_mkwrite(entry, vma);

but I don't insist. Also, insert_pfn() additionally has pte_mkyoung() and
pte_mkdirty(). Why were they left out here?
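
FWIW, to illustrate what I mean, a completely untested sketch combining
both suggestions (the pte_mkyoung() / pte_mkdirty() placement in the
insert path is just my assumption, mirroring insert_pfn()):

	static int insert_page_into_pte_locked(struct vm_area_struct *vma,
			pte_t *pte, unsigned long addr, struct page *page,
			pgprot_t prot, bool mkwrite)
	{
		struct folio *folio = page_folio(page);
		pte_t entry = ptep_get(pte);

		if (!pte_none(entry)) {
			if (!mkwrite)
				return -EBUSY;
			/* For mkwrite the PFNs must match, see the comment above. */
			if (pte_pfn(entry) != page_to_pfn(page)) {
				WARN_ON_ONCE(!is_zero_pfn(pte_pfn(entry)));
				return -EFAULT;
			}
			/* Upgrade the existing PTE to writable in place. */
			entry = maybe_mkwrite(entry, vma);
			entry = pte_mkyoung(entry);
			if (ptep_set_access_flags(vma, addr, pte, entry, 1))
				update_mmu_cache(vma, addr, pte);
			return 0;
		}

		/* Ok, finally just insert the thing.. */
		folio_get(folio);
		entry = mk_pte(page, prot);
		if (mkwrite) {
			/* Assumption: mark young and dirty as insert_pfn() does. */
			entry = pte_mkyoung(pte_mkdirty(entry));
			entry = maybe_mkwrite(entry, vma);
		}
		/* ... remainder (rmap accounting, set_pte_at()) unchanged ... */
	}

This is just to show the flow I have in mind, nothing more.

								Honza
-- 
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR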