On Tue, Jan 16, 2024 at 5:40 AM Zhongkun He <hezhongkun.hzk@xxxxxxxxxxxxx> wrote: > > > > > > > > > > > Unless some page flag/readahead expert can confirm that the first > > > > > option is safe, my vote is on this option. I mean, it's fairly minimal > > > > > codewise, no? Just a bunch of plumbing. We can also keep the other > > > > > call sites intact if we just rename the old versions - something along > > > > > the line of: > > > > > > > > > > __read_swap_cache_async_head(..., bool add_to_lru_head) > > > > > { > > > > > ... > > > > > if (add_to_lru_head) > > > > > folio_add_lru(folio) > > > > > else > > > > > folio_add_lru_tail(folio); > > > > > } > > > > > > > > > > __read_swap_cache_async(...) > > > > > { > > > > > return __read_swap_cache_async_tail(..., true); > > > > > } > > > > > > > > > > A bit boilerplate? Sure. But this seems safer, and I doubt it's *that* > > > > > much more work. > > > > > > > > > > > > > Yes, agree. I will try it again. > > > > > > Look forward to seeing it! Thanks for your patience and for working on this. > > Please forgive me for adding additional information about this patch. > > I have finished the opt for introducing a folio_add_lru_tail(), but > there are many > questions: > 1) A new page can be move to LRU only by lru_add_fn, so > folio_add_lru_tail could not add pages to LRU for the following code > in folio_batch_move_lru(),which is added by Alex Shi for > serializing memcg changes in pagevec_lru_move_fn[1]. > > /* block memcg migration while the folio moves between lru */ > if (move_fn != lru_add_fn && !folio_test_clear_lru(folio)) > continue; > To achieve the goal, we need to add a new function like lru_add_fn > which does not have the lru flag and folio_add_lru_tail() > + if (move_fn != lru_add_fn && move_fn != lru_move_tail_fn_new && > + !folio_test_clear_lru(folio)) > > 2) __read_swap_cache_async has six parameters, so there is no space to > add a new one, add_to_lru_head. > > So it seems a bit hacky just for a special case for the reasons above. It's a lot of plumbing for sure. Adding a flag to current task_struct is a less-noisy yet-still-hacky solution. I am not saying we should do it, but it's an option. I am not sure how much task flags we have to spare. > > Back to the beginning, lru_add_drain() is the simplest option,which is common > below the __read_swap_cache_async(). Please see the function > swap_cluster_readahead() > and swap_vma_readahead(), of course it has been batched. > > Or we should leave this problem alone,before we can write back zswap > in batches. Calling lru_add_drain() for every written back page is an overkill imo. If we have writeback batching at some point, it may make more sense then. Adding Michal Hocko was recently complaining [1] about lru_add_drain() being called unnecessarily elsewhere. [1]https://lore.kernel.org/linux-mm/ZaD9BNtXZfY2UtVI@tiehlicka/