Re: All MADV_FREE mTHPs are fully subjected to deferred_split_folio()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Barry,

On Mon, Dec 30, 2024 at 5:13 AM Barry Song <21cnbao@xxxxxxxxx> wrote:
>
> Hi Lance,
>
> Along with Ryan, David, Baolin, and anyone else who might be interested,
>
> We’ve noticed an unexpectedly high number of deferred splits. The root
> cause appears to be the changes introduced in commit dce7d10be4bbd3
> ("mm/madvise: optimize lazyfreeing with mTHP in madvise_free"). Since
> that commit, split_folio is no longer called in mm/madvise.c.
>
> However, we are still performing deferred_split_folio for all
> MADV_FREE mTHPs, even for those that are fully aligned with mTHP.
> This happens because we execute a goto discard in
> try_to_unmap_one(), which eventually leads to
> folio_remove_rmap_pte() adding all folios to deferred_split when we
> scan the 1st pte in try_to_unmap_one().
>
> discard:
>                 if (unlikely(folio_test_hugetlb(folio)))
>                         hugetlb_remove_rmap(folio);
>                 else
>                         folio_remove_rmap_pte(folio, subpage, vma);
>
> This could lead to a race condition with shrinker - deferred_split_scan().
> The shrinker might call folio_try_get(folio), and while we are scanning
> the second PTE of this folio in try_to_unmap_one(), the entire mTHP
> could be transitioned back to swap-backed because the reference count
> is incremented.
>
>                                 /*
>                                  * The only page refs must be one from isolation
>                                  * plus the rmap(s) (dropped by discard:).
>                                  */
>                                 if (ref_count == 1 + map_count &&
>                                     (!folio_test_dirty(folio) ||
>                                      ...
>                                      (vma->vm_flags & VM_DROPPABLE))) {
>                                         dec_mm_counter(mm, MM_ANONPAGES);
>                                         goto discard;
>                                 }
>
> It also significantly increases contention on ds_queue->split_queue_lock during
> memory reclamation and could potentially introduce other race conditions with
> shrinker as well.

Good catch!

>
> I’m curious if anyone has suggestions for resolving this issue. My
> idea is to use
> folio_remove_rmap_ptes to drop all PTEs at once, rather than
> folio_remove_rmap_pte,
> which processes PTEs one by one for an mTHP. This approach would require some
> changes, such as checking the dirty state of PTEs and performing a TLB
> flush for the
> entire mTHP as a whole in try_to_unmap_one().

Yeah, IHMO, it would also be beneficial to reclaim entire mTHPs as a whole
in real-world scenarios where MADV_FREE mTHPs are typically no longer
written ;)

>
> Please let me know if you have any objections or alternative suggestions.

Let's hear suggestions from other folks as well ~

Thanks,
Lance

>
> Thanks
> Barry





[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux