On Wed, Mar 27, 2024 at 4:27 PM Ryan Roberts <ryan.roberts@xxxxxxx> wrote:
>
> [...]
>
> >>> Test 1, sequential swapin/out of 30G zero page on ZRAM:
> >>>
> >>>                Before (us)   After (us)
> >>> Swapout:       33619409      33886008
> >>> Swapin:        32393771      32465441 (- 0.2%)
> >>> Swapout (THP):  7817909       6899938 (+11.8%)
> >>> Swapin (THP) : 32452387      33193479 (- 2.2%)
> >>
> >> If my understanding were correct, we don't have swapin (THP) support,
> >> yet. Right?
> >
> > Yes, this series doesn't change how swapin/swapout works with THP in
> > general, but now THP swapout will leave shadows with large order, so
> > it needs to be splitted upon swapin, that will slow down later swapin
> > by a little bit but I think that's worth it.
> >
> > If we can do THP swapin in the future, this split on swapin can be
> > saved to make the performance even better.
>
> I'm confused by this (clearly my understanding of how this works is incorrect).
> Perhaps you can help me understand:
>
> When you talk about "shadows" I assume you are referring to the swap cache? It
> was my understanding that swapping out a THP would always leave the large folio
> in the swap cache, so this is nothing new?
>
> And on swap-in, if the target page is in the swap cache, even if part of a large
> folio, why does it need to be split? I assumed the single page would just be
> mapped? (and if all the other pages subsequently fault, then you end up with a
> fully mapped large folio back in the process)?
>
> Perhaps I'm misunderstanding what "shadows" are?

Hi Ryan,

My bad, I haven't made this clear. Ying has posted the link to the commit that
added "shadow" support for anon pages; it has become a very important part of
LRU activation / workingset tracking. Basically, when a folio is removed from
the swap cache xarray (e.g. after swap writeback is done), instead of releasing
the xarray slot, an unsigned long / void * is stored in it, recording some info
that will be used when a refault happens to decide how to handle the folio on
the LRU / workingset side.

And about large folios in the swap cache: if you look at the current version of
add_to_swap_cache in mainline (it adds a folio of any order into the swap
cache), it calls xas_create_range(&xas), which fills all xarray slots in the
entire range covered by the folio. But the xarray supports multi-index storing,
which makes use of the nature of the radix tree to save a lot of slots. E.g.
for a 2M THP, previously 8 + 512 slots (8 extra xa nodes) were needed to store
it; after this series it only needs 8 slots by using a multi-index store (not
sure if I did the math right).

Same for shadows: when a folio is being deleted, __delete_from_swap_cache
currently walks the xarray with xas_next, updating all 8 + 512 slots one by
one; after this series only 8 stores are needed (ignoring fragmentation).

And upon swapin, I was talking about swapping in one subpage of a THP folio
after the folio itself is gone, leaving a few multi-index shadow slots behind.
Those multi-index slots need to be split (a multi-index slot has to be updated
as a whole, or split first; __filemap_add_folio handles such a split). I
reused that routine from __filemap_add_folio in this series, so without too
much work it works perfectly for the swapcache.