> > > > > > > > Unless some page flag/readahead expert can confirm that the first > > > > option is safe, my vote is on this option. I mean, it's fairly minimal > > > > codewise, no? Just a bunch of plumbing. We can also keep the other > > > > call sites intact if we just rename the old versions - something along > > > > the line of: > > > > > > > > __read_swap_cache_async_head(..., bool add_to_lru_head) > > > > { > > > > ... > > > > if (add_to_lru_head) > > > > folio_add_lru(folio) > > > > else > > > > folio_add_lru_tail(folio); > > > > } > > > > > > > > __read_swap_cache_async(...) > > > > { > > > > return __read_swap_cache_async_tail(..., true); > > > > } > > > > > > > > A bit boilerplate? Sure. But this seems safer, and I doubt it's *that* > > > > much more work. > > > > > > > > > > Yes, agree. I will try it again. > > > > Look forward to seeing it! Thanks for your patience and for working on this. Please forgive me for adding additional information about this patch. I have finished the opt for introducing a folio_add_lru_tail(), but there are many questions: 1) A new page can be move to LRU only by lru_add_fn, so folio_add_lru_tail could not add pages to LRU for the following code in folio_batch_move_lru(),which is added by Alex Shi for serializing memcg changes in pagevec_lru_move_fn[1]. /* block memcg migration while the folio moves between lru */ if (move_fn != lru_add_fn && !folio_test_clear_lru(folio)) continue; To achieve the goal, we need to add a new function like lru_add_fn which does not have the lru flag and folio_add_lru_tail() + if (move_fn != lru_add_fn && move_fn != lru_move_tail_fn_new && + !folio_test_clear_lru(folio)) 2) __read_swap_cache_async has six parameters, so there is no space to add a new one, add_to_lru_head. So it seems a bit hacky just for a special case for the reasons above. Back to the beginning, lru_add_drain() is the simplest option,which is common below the __read_swap_cache_async(). Please see the function swap_cluster_readahead() and swap_vma_readahead(), of course it has been batched. Or we should leave this problem alone,before we can write back zswap in batches. Thanks again.