not per-folio? I'm also not sure what it buys us - instead of reading a
per-page flag we now have to read 128 bytes of tags for each page and
check whether they are zero.
My point is, if that is a corner case, we might not care about it.
Hi David,
Hi!
My understanding is that this is NOT a corner case; on the contrary, it
is really a common case.
If it happens for < 1% of all large folios on swapout/swapin, it's not
the common case, even if some of the scenarios you point out below can
and will happen.
1. A large folio can be partially unmapped while it is in the swapcache
and after it has been swapped out; in both cases, its tags can be
partially invalidated. I don't think this is a corner case: as long as
userspace is still working at the granularity of base pages, this is
always going to happen. For example, a userspace libc such as jemalloc
can detect PAGESIZE and use madvise(MADV_DONTNEED) to return memory to
the kernel, so heap management is still working at the granularity of
the base page.
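
Something like this trivial userspace snippet (untested; the 2M/4K
sizes are just assumptions for a typical configuration) is enough to
trigger that partial unmap:

/*
 * Illustrative only: an allocator returns a single base page from
 * the middle of a PMD-sized, possibly large-folio-backed anonymous
 * region.
 */
#include <string.h>
#include <sys/mman.h>

int main(void)
{
	size_t len = 2 * 1024 * 1024;	/* one PMD-sized region */
	char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (p == MAP_FAILED)
		return 1;

	/* fault it all in; THP may back it with one large folio */
	memset(p, 0x5a, len);

	/*
	 * Return one base page to the kernel, as jemalloc-style heap
	 * management does: the large folio is now partially unmapped
	 * and its per-page state (tags included) diverges.
	 */
	madvise(p + 64 * 4096, 4096, MADV_DONTNEED);
	return 0;
}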
2. mprotect on part of a large folio, as Steven pointed out.
3. Long term, we are working on swapping in large folios as a whole [1],
just like swapping out large folios as a whole. For those PTEs which are
still contiguous swap entries (i.e. which were not unmapped by userspace
after the large folio was swapped out to the swap device), we have a
chance to swap in a whole large folio and to restore the tags for the
whole folio without the early exit. But we still have a good chance of
falling back to base pages if we fail to allocate a large folio; in that
case, do_swap_page() still works at the granularity of the base page,
and do_swap_page() will call swap_free(entry), so the tags of this
particular page can be invalidated as a result.
I don't immediately see how that relates. You get a fresh small folio
and simply load the tags for it from the internal data structure. No
messing with large folios is required, because you don't have a large
folio, so no considerations about large-folio batch MTE tag restore
apply.
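
(For context, the per-page restore on arm64 is roughly the following
xarray lookup keyed by the swap entry; quoted from memory, so details
may differ:)

void mte_restore_tags(swp_entry_t entry, struct page *page)
{
	/* tags saved at swapout time, keyed by the swap entry */
	void *tags = xa_load(&mte_pages, entry.val);

	if (!tags)
		return;

	if (try_page_mte_tagging(page)) {
		mte_restore_page_tags(page_address(page), tags);
		set_page_mte_tagged(page);
	}
}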
4. Too many early exits might be negative for performance.
So I am thinking that, in the future, we need two helpers:

1. void __arch_swap_restore(swp_entry_t entry, struct page *page);
This is always needed to support page-level tag restore.

2. void arch_swap_restore(swp_entry_t entry, struct folio *folio);
This can be a helper for when we are able to swap in a whole folio. Two
conditions must be met:
(a) the PTEs are still contiguous swap entries, just as when the large
folio was swapped out;
(b) we succeed in allocating a large folio in do_swap_page().

For the moment, we only need 1; we will add 2 in the swap-in large folio
series. A rough, untested sketch of the pair is below.
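
/*
 * Untested sketch: __arch_swap_restore() mirrors what arm64 already
 * does per page via mte_restore_tags(); the folio variant is the
 * hypothetical batch helper, assuming the caller has verified that
 * the folio's pages were swapped out under contiguous swap entries.
 */
static inline void __arch_swap_restore(swp_entry_t entry,
				       struct page *page)
{
	if (system_supports_mte())
		mte_restore_tags(entry, page);
}

static inline void arch_swap_restore(swp_entry_t entry,
				     struct folio *folio)
{
	long i;

	for (i = 0; i < folio_nr_pages(folio); i++) {
		/* page i of the folio was swapped out at offset + i */
		swp_entry_t e = swp_entry(swp_type(entry),
					  swp_offset(entry) + i);

		__arch_swap_restore(e, folio_page(folio, i));
	}
}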
What do you think?
I agree that it's better to keep it simple for now.
--
Cheers,
David / dhildenb