On Thu, Dec 7, 2023 at 11:04 PM David Hildenbrand <david@xxxxxxxxxx> wrote:
>
> >>
> >>> not per-folio? I'm also not sure what it buys us - instead of reading a per-page
> >>> flag we now have to read 128 bytes of tag for each page and check it's zero.
> >>
> >> My point is, if that is the corner case, we might not care about that.
> >
> > Hi David,
>
> Hi!
>
> > my understanding is that this is NOT a corner case. Alternatively, it
> > is really a common case.
>
> If it happens with < 1% of all large folios on swapout/swapin, it's not
> the common case. Even if some scenarios you point out below can and will
> happen.
>

Fair enough. If we define "corner case" by the percentage of folios that
can get partial MTE tags set or partially invalidated, I agree this is a
corner case. I had thought a corner case meant a case that can only
rarely happen.

> >
> > 1. A large folio can be partially unmapped while it is in the swapcache
> > and after it is swapped out; in all such cases, its tags can be
> > partially invalidated. I don't think this is a corner case: as long as
> > userspace still works at the granularity of base pages, this is always
> > going to happen. For example, a userspace libc such as jemalloc can
> > identify PAGESIZE and use madvise(DONTNEED) to return memory to the
> > kernel. Heap management still works at the granularity of the base
> > page.
> >
> > 2. mprotect() on part of a large folio, as Steven pointed out.
> >
> > 3. Long term, we are working on swapping in large folios as a whole[1],
> > just like swapping them out as a whole. For those PTEs which are still
> > contiguous swap entries, I mean those which are not unmapped by
> > userspace after the large folio is swapped out to the swap device, we
> > have a chance to swap in a whole large folio and restore its tags
> > without the early-exit. But we still have a good chance of falling back
> > to a base page if we fail to allocate a large folio; in that case,
> > do_swap_page() still works at the granularity of the base page and will
> > call swap_free(entry), so the tags of this particular page can be
> > invalidated as a result.
>
> I don't immediately see how that relates. You get a fresh small folio
> and simply load that tag from the internal datastructure. No messing
> with large folios required, because you don't have a large folio. So no
> considerations about large folio batch MTE tag restore apply.

Right. I was thinking of the original large folio being partially
swapped in and forgot that the newly allocated page is actually a folio
with only one page :-) Indeed, in that case we are still restoring the
MTE tags for the whole folio, which has one page.

>
> >
> > 4. Too many early-exits might be negative for performance.
> >
> >
> > So I am thinking that in the future we need two helpers:
> >
> > 1. void __arch_swap_restore(swp_entry_t entry, struct page *page);
> > This is always needed to support page-level tag restore.
> >
> > 2. void arch_swap_restore(swp_entry_t entry, struct folio *folio);
> > This can be a helper once we are able to swap in a whole folio. Two
> > conditions must be met:
> > (a) the PTEs are still contiguous swap entries, just as when the large
> > folio was swapped out;
> > (b) we succeed in allocating a large folio in do_swap_page().
> >
> > For the moment we only need 1; we will add 2 in the swap-in large
> > folio series.
> >
> > What do you think?
>
> I agree that it's better to keep it simple for now.
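
Great. Just to make (2) concrete, below is a rough, untested sketch of
what I have in mind. The per-page helper reuses the existing arm64
mte_restore_tags(), and the folio-level helper simply walks the
contiguous swap entries from condition (a). Names and placement are
illustrative only, not a final implementation:

	/*
	 * Sketch only: restore the MTE tags of a single page. On arm64
	 * this would boil down to the existing mte_restore_tags().
	 */
	static inline void __arch_swap_restore(swp_entry_t entry,
					       struct page *page)
	{
		if (system_supports_mte())
			mte_restore_tags(entry, page);
	}

	/*
	 * Sketch only: restore tags for a whole folio, assuming its
	 * swap entries are contiguous (condition (a) above), so the
	 * i-th page corresponds to swp_offset(entry) + i.
	 */
	static inline void arch_swap_restore(swp_entry_t entry,
					     struct folio *folio)
	{
		long i, nr = folio_nr_pages(folio);

		for (i = 0; i < nr; i++) {
			swp_entry_t e = swp_entry(swp_type(entry),
						  swp_offset(entry) + i);

			__arch_swap_restore(e, folio_page(folio, i));
		}
	}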
>
> --
> Cheers,
>
> David / dhildenb

Thanks
Barry