On 18/10/2024 08:26, David Hildenbrand wrote:
> On 18.10.24 08:48, Kanchana P Sridhar wrote:
>> This patch invokes the swapin_readahead() based batching interface to
>> prefetch a batch of 4K folios for zswap load, with batch decompressions
>> in parallel using IAA hardware. swapin_readahead() prefetches folios
>> based on vm.page-cluster and the usefulness of prior prefetches to the
>> workload. As folios are created in the swapcache and the readahead code
>> calls swap_read_folio() with a "zswap_batch" and a "non_zswap_batch",
>> the respective folio_batches get populated with the folios to be read.
>>
>> Finally, the swapin_readahead() procedures call the newly added
>> process_ra_batch_of_same_type(), which:
>>
>> 1) Reads all the non_zswap_batch folios sequentially by calling
>>    swap_read_folio().
>> 2) Calls swap_read_zswap_batch_unplug() with the zswap_batch, which
>>    calls zswap_finish_load_batch() to finally decompress each
>>    SWAP_CRYPTO_SUB_BATCH_SIZE sub-batch (i.e. up to 8 pages in a
>>    prefetch batch of, say, 32 folios) in parallel with IAA.
>>
>> Within do_swap_page(), we try to benefit from batch decompressions in
>> both these scenarios:
>>
>> 1) Single-mapped, SWP_SYNCHRONOUS_IO:
>>    We call swapin_readahead() with "single_mapped_path = true". This is
>>    done only in the !zswap_never_enabled() case.
>> 2) Shared and/or non-SWP_SYNCHRONOUS_IO folios:
>>    We call swapin_readahead() with "single_mapped_path = false".
>>
>> This will place folios in the swapcache: a design choice that handles
>> cases where a folio that is "single-mapped" in process 1 could be
>> prefetched in process 2, and that handles highly contended server
>> scenarios with stability. Checks are added at the end of
>> do_swap_page(), after the folio has been successfully loaded, to
>> detect whether the single-mapped swapcache folio is still
>> single-mapped, and if so, folio_free_swap() is called on the folio.
>>
>> Within the swapin_readahead() functions, if single_mapped_path is
>> true, and either the platform does not have IAA, or the platform has
>> IAA and the user selects a software compressor for zswap (details of
>> the sysfs knob follow), readahead/batching are skipped and the folio
>> is loaded using zswap_load().
>>
>> A new swap parameter "singlemapped_ra_enabled" (false by default) is
>> added for platforms that have IAA, where zswap_load_batching_enabled()
>> is true, and we want to give the user the option to run experiments
>> with IAA and with software compressors for zswap (the swap device is
>> SWP_SYNCHRONOUS_IO):
>>
>> For IAA:
>>   echo true > /sys/kernel/mm/swap/singlemapped_ra_enabled
>>
>> For software compressors:
>>   echo false > /sys/kernel/mm/swap/singlemapped_ra_enabled
>>
>> If "singlemapped_ra_enabled" is set to false, swapin_readahead() will
>> skip prefetching folios in the "single-mapped SWP_SYNCHRONOUS_IO"
>> do_swap_page() path.
>>
>> Thanks Ying Huang for the really helpful brainstorming discussions on
>> the swap_read_folio() plug design.
>>
>> Suggested-by: Ying Huang <ying.huang@xxxxxxxxx>
>> Signed-off-by: Kanchana P Sridhar <kanchana.p.sridhar@xxxxxxxxx>
>> ---
>>   mm/memory.c     | 187 +++++++++++++++++++++++++++++++++++++-----------
>>   mm/shmem.c      |   2 +-
>>   mm/swap.h       |  12 ++--
>>   mm/swap_state.c | 157 ++++++++++++++++++++++++++++++++++++----
>>   mm/swapfile.c   |   2 +-
>>   5 files changed, 299 insertions(+), 61 deletions(-)
>>
>> diff --git a/mm/memory.c b/mm/memory.c
>> index b5745b9ffdf7..9655b85fc243 100644
>> --- a/mm/memory.c
>> +++ b/mm/memory.c
>> @@ -3924,6 +3924,42 @@ static vm_fault_t remove_device_exclusive_entry(struct vm_fault *vmf)
>>       return 0;
>>   }
>> +/*
>> + * swapin readahead based batching interface for zswap batched loads using IAA:
>> + *
>> + * Should only be called for and if the faulting swap entry in do_swap_page
>> + * is single-mapped and SWP_SYNCHRONOUS_IO.
>> + *
>> + * Detect if the folio is in the swapcache, is still mapped to only this
>> + * process, and further, there are no additional references to this folio
>> + * (e.g. if another process simultaneously readahead this swap entry
>> + * while this process was handling the page-fault, and got a pointer to the
>> + * folio allocated by this process in the swapcache), besides the references
>> + * that were obtained within __read_swap_cache_async() by this process that is
>> + * faulting in this single-mapped swap entry.
>> + */
>
> How is this supposed to work for large folios?
>

Hi,

I was looking at zswapin large folio support and have posted an RFC in
[1]. I got bogged down with some prod stuff, so wasn't able to send it
earlier.

It looks quite different from, and I think simpler than, this series,
so it might be a good comparison.

[1] https://lore.kernel.org/all/20241018105026.2521366-1-usamaarif642@xxxxxxxxx/

Thanks,
Usama