On Mon, 17 Jul 2023, Ryan Roberts wrote: > >>>> +static int alloc_anon_folio(struct vm_fault *vmf, struct folio **folio) > >>>> +{ > >>>> + int i; > >>>> + gfp_t gfp; > >>>> + pte_t *pte; > >>>> + unsigned long addr; > >>>> + struct vm_area_struct *vma = vmf->vma; > >>>> + int prefer = anon_folio_order(vma); > >>>> + int orders[] = { > >>>> + prefer, > >>>> + prefer > PAGE_ALLOC_COSTLY_ORDER ? PAGE_ALLOC_COSTLY_ORDER : 0, > >>>> + 0, > >>>> + }; > >>>> + > >>>> + *folio = NULL; > >>>> + > >>>> + if (vmf_orig_pte_uffd_wp(vmf)) > >>>> + goto fallback; > >>>> + > >>>> + for (i = 0; orders[i]; i++) { > >>>> + addr = ALIGN_DOWN(vmf->address, PAGE_SIZE << orders[i]); > >>>> + if (addr >= vma->vm_start && > >>>> + addr + (PAGE_SIZE << orders[i]) <= vma->vm_end) > >>>> + break; > >>>> + } > >>>> + > >>>> + if (!orders[i]) > >>>> + goto fallback; > >>>> + > >>>> + pte = pte_offset_map(vmf->pmd, vmf->address & PMD_MASK); > >>>> + if (!pte) > >>>> + return -EAGAIN; > >>> > >>> It would be a bug if this happens. So probably -EINVAL? > >> > >> Not sure what you mean? Hugh Dickins' series that went into v6.5-rc1 makes it > >> possible for pte_offset_map() to fail (if I understood correctly) and we have to > >> handle this. The intent is that we will return from the fault without making any > >> change, then we will refault and try again. > > > > Thanks for checking that -- it's very relevant. One detail is that > > that series doesn't affect anon. IOW, collapsing PTEs into a PMD can't > > happen while we are holding mmap_lock for read here, and therefore, > > the race that could cause pte_offset_map() on shmem/file PTEs to fail > > doesn't apply here. > > But Hugh's patches have changed do_anonymous_page() to handle failure from > pte_offset_map_lock(). So I was just following that pattern. If this really > can't happen, then I'd rather WARN/BUG on it, and simplify alloc_anon_folio()'s > prototype to just return a `struct folio *` (and if it's null that means ENOMEM). > > Hugh, perhaps you can comment? I agree with your use of -EAGAIN there: I find it better to allow for the possibility, than to go to great effort persuading that it's impossible; especially because what's possible tomorrow may differ from today. And notice that, before my changes, there used to be a pmd_trans_unstable() check above, implying that it is possible for it to fail (for more reasons than corruption causing pmd_bad()) - one scenario would be that the pte_alloc() above succeeded *because* someone else had managed to insert a huge pmd there already (maybe we have MMF_DISABLE_THP but they did not). But I see from later mail that Yu Zhao now agrees with your -EAGAIN too, so we are all on the same folio. Hugh p.s. while giving opinions, I'm one of those against using "THP" for large but not pmd-mappable folios; and was glad to see Matthew arguing the same way when considering THP_SWPOUT in another thread today.