On Wed, Jun 15, 2022 at 8:14 AM Zach O'Keefe <zokeefe@xxxxxxxxxx> wrote:
>
> On 11 Jun 16:47, Miaohe Lin wrote:
> > When do_swap_page returns VM_FAULT_RETRY, we do not retry here and thus
> > swap entry will remain in pagetable. This will result in later failure.
> > So stop swapping in pages in this case to save cpu cycles.
> >
> > Signed-off-by: Miaohe Lin <linmiaohe@xxxxxxxxxx>
> > ---
> >  mm/khugepaged.c | 19 ++++++++-----------
> >  1 file changed, 8 insertions(+), 11 deletions(-)
> >
> > diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> > index 73570dfffcec..a8adb2d1e9c6 100644
> > --- a/mm/khugepaged.c
> > +++ b/mm/khugepaged.c
> > @@ -1003,19 +1003,16 @@ static bool __collapse_huge_page_swapin(struct mm_struct *mm,
> >  		swapped_in++;
> >  		ret = do_swap_page(&vmf);
> >
> > -		/* do_swap_page returns VM_FAULT_RETRY with released mmap_lock */
> > +		/*
> > +		 * do_swap_page returns VM_FAULT_RETRY with released mmap_lock.
> > +		 * Note we treat VM_FAULT_RETRY as VM_FAULT_ERROR here because
> > +		 * we do not retry here and swap entry will remain in pagetable
> > +		 * resulting in later failure.
> > +		 */
> >  		if (ret & VM_FAULT_RETRY) {
> >  			mmap_read_lock(mm);
> > -			if (hugepage_vma_revalidate(mm, haddr, &vma)) {
> > -				/* vma is no longer available, don't continue to swapin */
> > -				trace_mm_collapse_huge_page_swapin(mm, swapped_in, referenced, 0);
> > -				return false;
> > -			}
> > -			/* check if the pmd is still valid */
> > -			if (mm_find_pmd(mm, haddr) != pmd) {
> > -				trace_mm_collapse_huge_page_swapin(mm, swapped_in, referenced, 0);
> > -				return false;
> > -			}
> > +			trace_mm_collapse_huge_page_swapin(mm, swapped_in, referenced, 0);
> > +			return false;
> >  		}
> >  		if (ret & VM_FAULT_ERROR) {
> >  			trace_mm_collapse_huge_page_swapin(mm, swapped_in, referenced, 0);
> > --
> > 2.23.0
> >
> >
>
> I've convinced myself this is correct, but don't understand how we got here.
> AFAICT, we've always continued to fault in pages, and, as you mention, don't
> retry ones that have failed with VM_FAULT_RETRY - so
> __collapse_huge_page_isolate() should fail. I don't think (?) there is any
> benefit to continuing to swap if we don't handle VM_FAULT_RETRY appropriately.
>
> So, I think this change looks good from that perspective. I suppose the only
> other question would be: should we handle the VM_FAULT_RETRY case? Maybe 1
> additional attempt then fail? AFAIK, this mostly (?) happens when the page is
> locked. Maybe it's not worth the extra complexity though..

It should be unnecessary for khugepaged IMHO since it will scan all the
valid mm periodically, so it will come back eventually.

>
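
To make the control flow concrete, here is a rough userspace sketch of how the swap-in loop behaves after the patch. It is only an illustration, not kernel code: FAKE_FAULT_RETRY, FAKE_FAULT_ERROR, fake_swap_in() and swapin_pass() are hypothetical stand-ins for VM_FAULT_RETRY, VM_FAULT_ERROR, do_swap_page() and the loop in __collapse_huge_page_swapin(), and the flag values are arbitrary.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel fault flags; values are arbitrary. */
#define FAKE_FAULT_RETRY 0x1u
#define FAKE_FAULT_ERROR 0x2u

/* Hypothetical stub for do_swap_page(): returns a scripted result for slot i. */
static unsigned int fake_swap_in(const unsigned int *results, int i)
{
	return results[i];
}

/*
 * Models the patched loop: a RETRY or ERROR result ends the pass at once,
 * because the swap entry would stay in place and the later isolate step
 * would fail anyway. *swapped_in counts attempts, as in the patch.
 */
static bool swapin_pass(const unsigned int *results, int nr, int *swapped_in)
{
	*swapped_in = 0;
	for (int i = 0; i < nr; i++) {
		(*swapped_in)++;
		unsigned int ret = fake_swap_in(results, i);

		if (ret & (FAKE_FAULT_RETRY | FAKE_FAULT_ERROR))
			return false;
	}
	return true;
}
```

No retry is modelled here on purpose: since khugepaged rescans periodically, a pass that bails out is expected to be picked up again on a later round.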