Re: All MADV_FREE mTHPs are fully subjected to deferred_split_folio()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Dec 31, 2024 at 8:32 AM David Hildenbrand <david@xxxxxxxxxx> wrote:
>
> >> goto discard;
> >>
> >
> > I agree that this is necessary, but I'm not sure it addresses my
> > concerns. MADV_FREE'ed mTHPs are still being added to `deferred_split`,
> > and this does not resolve the issue of them being partially unmapped
> > though it is definitely better than the existing code, at least folios are
> > not moved back to swap-backed.
>  > > On the other hand, users might rely on the `deferred_split` counter to
> > assess how aggressively userspace is performing address/size unaligned
> > operations
> > like MADV_DONTNEED or unmapped behavior. However, our debugging shows
> > that the majority of `deferred_split` counter increments result from
> > aligned MADV_FREE operations. This diminishes the counter's usefulness
> > in reflecting unaligned userspace behavior.
>
> Optimizing that is certainly something to look into, but the bigger
> issue you describe arises from bad handling of speculative references.
>
> Just imagine you indeed have a partially-mapped anon folio and the
> remaining pages are MADV_FREE'ed. The problem with the speculative
> reference would still apply.
>
> >
> > If possible, I am still looking for some approach to entirely avoid
> > adding the folio to deferred_split and partially being unmapped.
> >
> > Could the concept be something like this?
>
> Very likely it's wrong, because you really have to assure that that
> folio range is mapped here.
>
> Proper folio PTE batching should be applied here -- folio_pte_batch() etc.
>

I agree that using `folio_pte_batch()` to check if all PTEs are mapped and
determining `any_dirty` for setting swap-backed is the right direction. I'm
just curious if `(!list_empty(&folio->_deferred_list))` or
`folio_test_partially_mapped(folio)` could replace it if we're aiming for
a smaller change :-)

> That can please the counters in many, but not all cases. Again, maybe
> the deferred-split handling should be handled differently, and not
> synchronously from rmap code.
>
> I see 3 different work items
>
> 1) Fix mis-handling of speculative references
>

Agreed, the patch you're sending is absolutely necessary. I'd prefer
it lands sooner in some way. Would you like to post it?

> 2) Perform proper PTE batching during unmap/migration. Will improve
>     performance in any case.

Agreed. I remember discussing this with Ryan in an email thread about
a year ago, even for normal (non-MADV_FREE'ed) folios, but it seems
everyone has been busy with other priorities.

This seems like a good time to start exploring the idea. We could begin
with MADV_FREE'ed folios and later extend it to normal folios—for
instance, by implementing batched setting of swap entries.

>
> 3) Try moving deferred-split handling out of rmap code into reclaim/
>     access-bit handling.

I'm not quite sure we still need this after having 1 and 2. With those,
we've been able to operate on the mTHP as a whole. Do we still
need to move deferred_split out of rmap?

>
> --
> Cheers,
>
> David / dhildenb
>

Thanks
Barry





[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux