On 2022/6/16 1:49, Yang Shi wrote: > On Sat, Jun 11, 2022 at 1:47 AM Miaohe Lin <linmiaohe@xxxxxxxxxx> wrote: >> >> When do_swap_page returns VM_FAULT_RETRY, we do not retry here and thus >> swap entry will remain in pagetable. This will result in later failure. >> So stop swapping in pages in this case to save cpu cycles. >> >> Signed-off-by: Miaohe Lin <linmiaohe@xxxxxxxxxx> >> --- >> mm/khugepaged.c | 19 ++++++++----------- >> 1 file changed, 8 insertions(+), 11 deletions(-) >> >> diff --git a/mm/khugepaged.c b/mm/khugepaged.c >> index 73570dfffcec..a8adb2d1e9c6 100644 >> --- a/mm/khugepaged.c >> +++ b/mm/khugepaged.c >> @@ -1003,19 +1003,16 @@ static bool __collapse_huge_page_swapin(struct mm_struct *mm, >> swapped_in++; >> ret = do_swap_page(&vmf); >> >> - /* do_swap_page returns VM_FAULT_RETRY with released mmap_lock */ >> + /* >> + * do_swap_page returns VM_FAULT_RETRY with released mmap_lock. >> + * Note we treat VM_FAULT_RETRY as VM_FAULT_ERROR here because >> + * we do not retry here and swap entry will remain in pagetable >> + * resulting in later failure. > > Yeah, it makes sense. > >> + */ >> if (ret & VM_FAULT_RETRY) { >> mmap_read_lock(mm); > > A further optimization, you should not need to relock mmap_lock. You > may consider returning a different value or passing in *locked and > setting it to false, then check this value in the caller to skip > unlock. Could we just keep the mmap_sem unlocked when __collapse_huge_page_swapin() fails due to the caller always doing mmap_read_unlock when __collapse_huge_page_swapin() returns false and add some comments about this behavior? This looks like a simple way for me. > >> - if (hugepage_vma_revalidate(mm, haddr, &vma)) { >> - /* vma is no longer available, don't continue to swapin */ >> - trace_mm_collapse_huge_page_swapin(mm, swapped_in, referenced, 0); >> - return false; >> - } >> - /* check if the pmd is still valid */ >> - if (mm_find_pmd(mm, haddr) != pmd) { >> - trace_mm_collapse_huge_page_swapin(mm, swapped_in, referenced, 0); >> - return false; >> - } >> + trace_mm_collapse_huge_page_swapin(mm, swapped_in, referenced, 0); >> + return false; >> } >> if (ret & VM_FAULT_ERROR) { >> trace_mm_collapse_huge_page_swapin(mm, swapped_in, referenced, 0); > > And I think "swapped_in++" needs to be moved after error handling. Do you mean do "swapped_in++" only after pages are swapped in successfully? Thanks! > >> -- >> 2.23.0 >> >> > . >