On Thu, Jun 6, 2024 at 4:06 PM David Hildenbrand <david@xxxxxxxxxx> wrote: > > On 06.06.24 10:01, David Hildenbrand wrote: > > On 06.06.24 05:55, Lance Yang wrote: > >> On Wed, Jun 5, 2024 at 10:28 PM David Hildenbrand <david@xxxxxxxxxx> wrote: > >>> > >>> On 05.06.24 16:20, Lance Yang wrote: > >>>> Hi David, > >>>> > >>>> On Wed, Jun 5, 2024 at 8:46 PM David Hildenbrand <david@xxxxxxxxxx> wrote: > >>>>> > >>>>> On 21.05.24 06:02, Lance Yang wrote: > >>>>>> In preparation for supporting try_to_unmap_one() to unmap PMD-mapped > >>>>>> folios, start the pagewalk first, then call split_huge_pmd_address() to > >>>>>> split the folio. > >>>>>> > >>>>>> Since TTU_SPLIT_HUGE_PMD will no longer perform immediately, we might > >>>>>> encounter a PMD-mapped THP missing the mlock in the VM_LOCKED range during > >>>>>> the page walk. It’s probably necessary to mlock this THP to prevent it from > >>>>>> being picked up during page reclaim. > >>>>>> > >>>>>> Suggested-by: David Hildenbrand <david@xxxxxxxxxx> > >>>>>> Suggested-by: Baolin Wang <baolin.wang@xxxxxxxxxxxxxxxxx> > >>>>>> Signed-off-by: Lance Yang <ioworker0@xxxxxxxxx> > >>>>>> --- > >>>>> > >>>>> [...] again, sorry for the late review. > >>>> > >>>> No worries at all, thanks for taking time to review! > >>>> > >>>>> > >>>>>> diff --git a/mm/rmap.c b/mm/rmap.c > >>>>>> index ddffa30c79fb..08a93347f283 100644 > >>>>>> --- a/mm/rmap.c > >>>>>> +++ b/mm/rmap.c > >>>>>> @@ -1640,9 +1640,6 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma, > >>>>>> if (flags & TTU_SYNC) > >>>>>> pvmw.flags = PVMW_SYNC; > >>>>>> > >>>>>> - if (flags & TTU_SPLIT_HUGE_PMD) > >>>>>> - split_huge_pmd_address(vma, address, false, folio); > >>>>>> - > >>>>>> /* > >>>>>> * For THP, we have to assume the worse case ie pmd for invalidation. > >>>>>> * For hugetlb, it could be much worse if we need to do pud > >>>>>> @@ -1668,20 +1665,35 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma, > >>>>>> mmu_notifier_invalidate_range_start(&range); > >>>>>> > >>>>>> while (page_vma_mapped_walk(&pvmw)) { > >>>>>> - /* Unexpected PMD-mapped THP? */ > >>>>>> - VM_BUG_ON_FOLIO(!pvmw.pte, folio); > >>>>>> - > >>>>>> /* > >>>>>> * If the folio is in an mlock()d vma, we must not swap it out. > >>>>>> */ > >>>>>> if (!(flags & TTU_IGNORE_MLOCK) && > >>>>>> (vma->vm_flags & VM_LOCKED)) { > >>>>>> /* Restore the mlock which got missed */ > >>>>>> - if (!folio_test_large(folio)) > >>>>>> + if (!folio_test_large(folio) || > >>>>>> + (!pvmw.pte && (flags & TTU_SPLIT_HUGE_PMD))) > >>>>>> mlock_vma_folio(folio, vma); Should we still keep the '!pvmw.pte' here? Something like: if (!folio_test_large(folio) || !pvmw.pte) mlock_vma_folio(folio, vma); We can mlock the THP to prevent it from being picked up during page reclaim. David, I’d like to hear your thoughts on this ;) Thanks, Lance > >>>>> > >>>>> Can you elaborate why you think this would be required? If we would have > >>>>> performed the split_huge_pmd_address() beforehand, we would still be > >>>>> left with a large folio, no? > >>>> > >>>> Yep, there would still be a large folio, but it wouldn't be PMD-mapped. > >>>> > >>>> After Weifeng's series[1], the kernel supports mlock for PTE-mapped large > >>>> folio, but there are a few scenarios where we don't mlock a large folio, such > >>>> as when it crosses a VM_LOCKed VMA boundary. > >>>> > >>>> - if (!folio_test_large(folio)) > >>>> + if (!folio_test_large(folio) || > >>>> + (!pvmw.pte && (flags & TTU_SPLIT_HUGE_PMD))) > >>>> > >>>> And this check is just future-proofing and likely unnecessary. If encountering a > >>>> PMD-mapped THP missing the mlock for some reason, we can mlock this > >>>> THP to prevent it from being picked up during page reclaim, since it is fully > >>>> mapped and doesn't cross the VMA boundary, IIUC. > >>>> > >>>> What do you think? > >>>> I would appreciate any suggestions regarding this check ;) > >>> > >>> Reading this patch only, I wonder if this change makes sense in the > >>> context here. > >> > >> Allow me to try explaining it again ;) > >> > >>> > >>> Before this patch, we would have PTE-mapped the PMD-mapped THP before > >>> reaching this call and skipped it due to "!folio_test_large(folio)". > >> > >> Yes, there is only a PTE-mapped THP when doing the "!folio_test_large(folio)" > >> check, as we will first conditionally split the PMD via > >> split_huge_pmd_address(). > >> > >>> > >>> After this patch, we either > >> > >> Things will change. We'll first do the "!folio_test_large(folio)" check, then > >> conditionally split the PMD via split_huge_pmd_address(). > >> > >>> > >>> a) PTE-remap the THP after this check, but retry and end-up here again, > >>> whereby we would skip it due to "!folio_test_large(folio)". > >> > >> Hmm... > >> > >> IIUC, we will skip it after this check, stop the page walk, and not > >> PTE-remap the THP. > >> > >>> > >>> b) Discard the PMD-mapped THP due to lazyfree directly. Can that > >>> co-exist with mlock and what would be the problem here with mlock? > >> > >> Before discarding a PMD-mapped THP as a whole, as patch #3 did, > >> we also perform the "!folio_test_large(folio)" check. If the THP coexists > >> with mlock, we will skip it, stop the page walk, and not discard it. IIUC. > > > > But "!folio_test_large(folio)" would *skip* the THP and not consider it > > regarding mlock. > > > > I'm probably missing something > > I'm stupid, I missed that we still do the "goto walk_done_err;", only > that we don't do the mlock_vma_folio(folio, vma); > > Yes, let's drop it for now! :) > > -- > Cheers, > > David / dhildenb >