On 2022/6/16 1:51, Yang Shi wrote:
> On Wed, Jun 15, 2022 at 8:14 AM Zach O'Keefe <zokeefe@xxxxxxxxxx> wrote:
>>
>> On 11 Jun 16:47, Miaohe Lin wrote:
>>> When do_swap_page returns VM_FAULT_RETRY, we do not retry here and thus
>>> swap entry will remain in pagetable. This will result in later failure.
>>> So stop swapping in pages in this case to save cpu cycles.
>>>
>>> Signed-off-by: Miaohe Lin <linmiaohe@xxxxxxxxxx>
>>> ---
>>>  mm/khugepaged.c | 19 ++++++++-----------
>>>  1 file changed, 8 insertions(+), 11 deletions(-)
>>>
>>> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>>> index 73570dfffcec..a8adb2d1e9c6 100644
>>> --- a/mm/khugepaged.c
>>> +++ b/mm/khugepaged.c
>>> @@ -1003,19 +1003,16 @@ static bool __collapse_huge_page_swapin(struct mm_struct *mm,
>>>  		swapped_in++;
>>>  		ret = do_swap_page(&vmf);
>>>
>>> -		/* do_swap_page returns VM_FAULT_RETRY with released mmap_lock */
>>> +		/*
>>> +		 * do_swap_page returns VM_FAULT_RETRY with released mmap_lock.
>>> +		 * Note we treat VM_FAULT_RETRY as VM_FAULT_ERROR here because
>>> +		 * we do not retry here and swap entry will remain in pagetable
>>> +		 * resulting in later failure.
>>> +		 */
>>>  		if (ret & VM_FAULT_RETRY) {
>>>  			mmap_read_lock(mm);
>>> -			if (hugepage_vma_revalidate(mm, haddr, &vma)) {
>>> -				/* vma is no longer available, don't continue to swapin */
>>> -				trace_mm_collapse_huge_page_swapin(mm, swapped_in, referenced, 0);
>>> -				return false;
>>> -			}
>>> -			/* check if the pmd is still valid */
>>> -			if (mm_find_pmd(mm, haddr) != pmd) {
>>> -				trace_mm_collapse_huge_page_swapin(mm, swapped_in, referenced, 0);
>>> -				return false;
>>> -			}
>>> +			trace_mm_collapse_huge_page_swapin(mm, swapped_in, referenced, 0);
>>> +			return false;
>>>  		}
>>>  		if (ret & VM_FAULT_ERROR) {
>>>  			trace_mm_collapse_huge_page_swapin(mm, swapped_in, referenced, 0);
>>> --
>>> 2.23.0
>>>
>>>
>>
>> I've convinced myself this is correct, but don't understand how we got here.
>> AFAICT, we've always continued to fault in pages, and, as you mention, don't
>> retry ones that have failed with VM_FAULT_RETRY - so
>> __collapse_huge_page_isolate() should fail. I don't think (?) there is any
>> benefit to continuing to swap if we don't handle VM_FAULT_RETRY appropriately.
>>
>> So, I think this change looks good from that perspective. I suppose the only
>> other question would be: should we handle the VM_FAULT_RETRY case? Maybe 1
>> additional attempt then fail? AFAIK, this mostly (?) happens when the page is
>> locked. Maybe it's not worth the extra complexity though..
>
> It should be unnecessary for khugepaged IMHO since it will scan all
> the valid mm periodically, so it will come back eventually.

I tend to agree with Yang. Khugepaged will come back eventually so it's not
worth the extra complexity.

Thanks both!

>
>>
> .
>
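
For anyone following the thread, the control flow after the patch can be modeled in userspace roughly as below. This is only a sketch, not kernel code: model_swapin() and the MODEL_FAULT_* values are hypothetical stand-ins for __collapse_huge_page_swapin() and the vm_fault_t bits, the outcomes array replaces the real do_swap_page() calls, and every entry is assumed to be a swap entry. Locking and tracing are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for vm_fault_t bits; these are NOT the kernel's values. */
enum {
	MODEL_FAULT_OK    = 0,
	MODEL_FAULT_ERROR = 1 << 0,
	MODEL_FAULT_RETRY = 1 << 1,
};

/*
 * Userspace model of the patched loop. outcomes[i] is the assumed result
 * of swapping in the i-th swap entry. *attempted counts the swap-in calls
 * made, mirroring swapped_in++ ahead of do_swap_page() in the patch.
 * Returns false at the first RETRY or ERROR, true if every swap-in succeeded.
 */
static bool model_swapin(const int *outcomes, size_t n, size_t *attempted)
{
	assert(outcomes != NULL && attempted != NULL);

	*attempted = 0;
	for (size_t i = 0; i < n; i++) {
		int ret = outcomes[i];

		(*attempted)++;

		/* RETRY is treated like an error: stop swapping in right away. */
		if (ret & MODEL_FAULT_RETRY)
			return false;
		if (ret & MODEL_FAULT_ERROR)
			return false;
	}
	return true;
}
```

With outcomes of OK, RETRY, OK the model stops after the second attempt and never touches the third entry, which is the cpu-cycle saving the changelog describes. The collapse is simply abandoned for this pass and left for a later khugepaged scan, as discussed above.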