On Tue, Jan 14, 2025 at 05:22:15PM +0100, David Hildenbrand wrote: > On 10.01.25 07:00, Alistair Popple wrote: > > Currently DAX folio/page reference counts are managed differently to > > normal pages. To allow these to be managed the same as normal pages > > introduce vmf_insert_folio_pud. This will map the entire PUD-sized folio > > and take references as it would for a normally mapped page. > > > > This is distinct from the current mechanism, vmf_insert_pfn_pud, which > > simply inserts a special devmap PUD entry into the page table without > > holding a reference to the page for the mapping. > > > > Signed-off-by: Alistair Popple <apopple@xxxxxxxxxx> > > [...] > > > +/** > > + * vmf_insert_folio_pud - insert a pud size folio mapped by a pud entry > > + * @vmf: Structure describing the fault > > + * @folio: folio to insert > > + * @write: whether it's a write fault > > + * > > + * Return: vm_fault_t value. > > + */ > > +vm_fault_t vmf_insert_folio_pud(struct vm_fault *vmf, struct folio *folio, bool write) > > +{ > > + struct vm_area_struct *vma = vmf->vma; > > + unsigned long addr = vmf->address & PUD_MASK; > > + pud_t *pud = vmf->pud; > > + struct mm_struct *mm = vma->vm_mm; > > + spinlock_t *ptl; > > + > > + if (addr < vma->vm_start || addr >= vma->vm_end) > > + return VM_FAULT_SIGBUS; > > + > > + if (WARN_ON_ONCE(folio_order(folio) != PUD_ORDER)) > > + return VM_FAULT_SIGBUS; > > + > > + ptl = pud_lock(mm, pud); > > + if (pud_none(*vmf->pud)) { > > + folio_get(folio); > > + folio_add_file_rmap_pud(folio, &folio->page, vma); > > + add_mm_counter(mm, mm_counter_file(folio), HPAGE_PUD_NR); > > + } > > + insert_pfn_pud(vma, addr, vmf->pud, pfn_to_pfn_t(folio_pfn(folio)), write); > > This looks scary at first (inserting something when not taking a reference), > but insert_pfn_pud() seems to handle that. A comment here would have been > nice. Indeed, I will add one. > It's weird, though, that if there is already something else, that we only > WARN but don't actually return an error. So ... Note we only WARN when there is already a mapping there and we're trying to upgrade it to writeable. This just mimics the logic which currently exists in insert_pfn() and insert_pfn_pmd(). The comment in insert_pfn() sheds more light: /* * For read faults on private mappings the PFN passed * in may not match the PFN we have mapped if the * mapped PFN is a writeable COW page. In the mkwrite * case we are creating a writable PTE for a shared * mapping and we expect the PFNs to match. If they * don't match, we are likely racing with block * allocation and mapping invalidation so just skip the * update. */ > > + spin_unlock(ptl); > > + > > + return VM_FAULT_NOPAGE; > > I assume always returning VM_FAULT_NOPAGE, even when something went wrong, > is the right thing to do? Yes, I think so. I guess in the WARN case we could return something like VM_FAULT_SIGBUS to kill the application, but the existing vmf_insert_*() functions don't currently do that so I think that would be a separate clean-up. > Apart from that LGTM. > > > -- > Cheers, > > David / dhildenb >