not per-folio? I'm also not sure what it buys us - instead of reading a
per-page flag we now have to read 128 bytes of tags for each page and
check whether they are zero.
My point is, if that is a corner case, we might not care about it.
Hi David,
Hi!
My understanding is that this is NOT a corner case; on the contrary, it
is really a common case.
If it happens for < 1% of all large folios on swapout/swapin, it's not
the common case, even if some of the scenarios you point out below can
and will happen.
1. A large folio can be partially unmapped while it is in the swapcache
and after it has been swapped out; in both cases, its tags can be
partially invalidated. I don't think this is a corner case: as long as
userspace is still working at the granularity of base pages, this is
always going to happen. For example, a userspace libc such as jemalloc
can detect PAGESIZE and use madvise(MADV_DONTNEED) to return memory to
the kernel, so heap management is still working at the granularity of
the base page.
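
Something like this trivial userspace snippet (untested; the 2M/4K
sizes are just assumptions for a typical configuration) is enough to
trigger that partial unmap:

/*
 * Illustrative only: an allocator returns a single base page from
 * the middle of a PMD-sized, possibly large-folio-backed anonymous
 * region.
 */
#include <string.h>
#include <sys/mman.h>

int main(void)
{
	size_t len = 2 * 1024 * 1024;	/* one PMD-sized region */
	char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (p == MAP_FAILED)
		return 1;

	/* fault it all in; THP may back it with one large folio */
	memset(p, 0x5a, len);

	/*
	 * Return one base page to the kernel, as jemalloc-style heap
	 * management does: the large folio is now partially unmapped
	 * and its per-page state (tags included) diverges.
	 */
	madvise(p + 64 * 4096, 4096, MADV_DONTNEED);
	return 0;
}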
2. mprotect on part of a large folio, as Steven pointed out.
3. Long term, we are working on swapping in large folios as a whole [1],
just like swapping out large folios as a whole. For those PTEs which are
still contiguous swap entries (i.e. which were not unmapped by userspace
after the large folio was swapped out to the swap device), we have a
chance to swap in a whole large folio and to restore the tags for the
whole folio without the early exit. But we still have a good chance of
falling back to base pages if we fail to allocate a large folio; in that
case, do_swap_page() still works at the granularity of the base page,
and do_swap_page() will call swap_free(entry), so the tags of this
particular page can be invalidated as a result.
I don't immediately see how that relates. You get a fresh small folio
and simply load the tags for it from the internal data structure. No
messing with large folios is required, because you don't have a large
folio, so no considerations about large-folio batch MTE tag restore
apply.
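
(For context, the per-page restore on arm64 is roughly the following
xarray lookup keyed by the swap entry; quoted from memory, so details
may differ:)

void mte_restore_tags(swp_entry_t entry, struct page *page)
{
	/* tags saved at swapout time, keyed by the swap entry */
	void *tags = xa_load(&mte_pages, entry.val);

	if (!tags)
		return;

	if (try_page_mte_tagging(page)) {
		mte_restore_page_tags(page_address(page), tags);
		set_page_mte_tagged(page);
	}
}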
4. Too many early exits might be negative for performance.
So I am thinking that, in the future, we need two helpers:

1. void __arch_swap_restore(swp_entry_t entry, struct page *page);
This is always needed to support page-level tag restore.

2. void arch_swap_restore(swp_entry_t entry, struct folio *folio);
This can be a helper for when we are able to swap in a whole folio. Two
conditions must be met:
(a) the PTEs are still contiguous swap entries, just as when the large
folio was swapped out;
(b) we succeed in allocating a large folio in do_swap_page().

For the moment, we only need 1; we will add 2 in the swap-in large folio
series. A rough, untested sketch of the pair is below.
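
/*
 * Untested sketch: __arch_swap_restore() mirrors what arm64 already
 * does per page via mte_restore_tags(); the folio variant is the
 * hypothetical batch helper, assuming the caller has verified that
 * the folio's pages were swapped out under contiguous swap entries.
 */
static inline void __arch_swap_restore(swp_entry_t entry,
				       struct page *page)
{
	if (system_supports_mte())
		mte_restore_tags(entry, page);
}

static inline void arch_swap_restore(swp_entry_t entry,
				     struct folio *folio)
{
	long i;

	for (i = 0; i < folio_nr_pages(folio); i++) {
		/* page i of the folio was swapped out at offset + i */
		swp_entry_t e = swp_entry(swp_type(entry),
					  swp_offset(entry) + i);

		__arch_swap_restore(e, folio_page(folio, i));
	}
}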
What do you think?
I agree that it's better to keep it simple for now.
--
Cheers,
David / dhildenb