On 08/12/2023 13:22, Matthew Wilcox wrote:
> On Fri, Dec 08, 2023 at 08:34:01PM +1300, Barry Song wrote:
>> arch_prepare_to_swap() should take folio rather than page as parameter
>> because we support THP swap-out as a whole. It saves tags for all
>> pages in a large folio.
>>
>> Meanwhile, arch_swap_restore() now moves to use page parameter rather
>> than folio because swap-in, refault and MTE tags are working at the
>> granularity of base pages rather than folio:
>> 1. a large folio in swapcache can be partially unmapped, thus, MTE
>> tags for the unmapped pages will be invalidated;
>> 2. users might use mprotect() to set MTEs on a part of a large folio.
>
> I would argue that using mprotect() to set MTEs on part of a large folio
> should cause that folio to be split. Could the user give us any
> stronger signal that this memory is being used for different purposes,
> and therefore should not be managed as a single entity?

I agree this probably makes sense here. But splitting is best effort as
I understand it? It can fail due to long-term GUP, right? In which case
we still have to handle the MTE on partial large folio case safely, even
if not performantly.

As an aside, I don't think it's clear cut that we would always prefer to
split based on user space mprotect/madvise/etc calls. IIUC, there are
garbage collectors that temporarily mark pages RO then switch back to
RW. I wouldn't want to split here and lose the benefits of contpte
forever. I'm handwaving because I haven't looked into the exact
mechanisms yet. But I think we need to understand these users better
before deciding on an "always split based on user hints" policy.

>
>> Thus, it won't be easy to manage MTE tags at the granularity of folios
>> since we do have some cases in which a part of pages in a large folios
>> have valid tags, while the other part of pages haven't. Furthermore,
>> trying to restore MTE tags for a whole folio can lead to many loops and
>> early exiting even if the large folio in swapcache are still entirely
>> mapped since do_swap_page() only sets PTE and frees swap for the base
>> page where PF is happening.
>>
>> But we still have a chance to restore tags for a whole large folio
>> once we support swap-in large folio. So this job is deferred till we
>> can do refault and swap-in as a large folio.
>
> I strongly disagree with changing the interface to arch_swap_restore()
> from folio to page.