On Tue, Jan 9, 2024 at 8:30 AM Yosry Ahmed <yosryahmed@xxxxxxxxxx> wrote:
>
> On Mon, Jan 8, 2024 at 7:13 PM Zhongkun He <hezhongkun.hzk@xxxxxxxxxxxxx> wrote:
> >
> > Hi Yosry, glad to hear from you and happy new year!
> >
> > > Sorry for being late to the party. It seems to me that all of this
> > > hassle can be avoided if lru_add_fn() did the right thing in this case
> > > and added the folio to the tail of the lru directly. I am no expert in
> > > how the page flags work here, but it seems like we can do something
> > > like this in lru_add_fn():
> > >
> > >     if (folio_test_reclaim(folio))
> > >         lruvec_add_folio_tail(lruvec, folio);
> > >     else
> > >         lruvec_add_folio(lruvec, folio);
> > >
> > > I think the main problem with this is that PG_reclaim is an alias to
> > > PG_readahead, so readahead pages will also go to the tail of the lru,
> > > which is probably not good.

This sounds dangerous. This is going to introduce a rather large
unexpected side effect - we're changing the readahead behavior in a
seemingly small zswap optimization. In fact, I'd argue that if we do
this, the readahead behavior change will be the "main effect", and the
zswap-side change would be a "happy consequence". We should run a lot
of benchmarking and document the change extensively if we pursue this
route.

> >
> > Agree with you, I will try it.
>
> +Matthew Wilcox
>
> I think we need to figure out if it's okay to do this first, because
> it will affect pages with PG_readahead as well.
>
> >
> > >
> > > A more intrusive alternative is to introduce a folio_lru_add_tail()
> > > variant that always adds pages to the tail, and optionally call that
> > > from __read_swap_cache_async() instead of folio_lru_add() based on a
> > > new boolean argument. The zswap code can set that boolean argument
> > > during writeback to make sure newly allocated folios are always added
> > > to the tail of the lru.

Unless some page flag/readahead expert can confirm that the first
option is safe, my vote is on this option. I mean, it's fairly minimal
codewise, no? Just a bunch of plumbing. We can also keep the other
call sites intact if we just rename the old versions - something along
the lines of:

    __read_swap_cache_async_head(..., bool add_to_lru_head)
    {
        ...
        if (add_to_lru_head)
            folio_add_lru(folio);
        else
            folio_add_lru_tail(folio);
    }

    __read_swap_cache_async(...)
    {
        return __read_swap_cache_async_head(..., true);
    }

A bit of boilerplate? Sure. But this seems safer, and I doubt it's
*that* much more work.

> >
> > I have the same idea and also find it intrusive. I think the first solution
> > is very good and I will try it. If it works, I will send the next version.
>
> One way to avoid introducing folio_lru_add_tail() and plumbing a
> boolean from zswap is to have a per-task context (similar to
> memalloc_nofs_save()), that makes folio_add_lru() automatically add
> folios to the tail of the LRU. I am not sure if this is an acceptable
> approach though in terms of per-task flags and such.

This seems a bit hacky and obscure, but maybe it could work.
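
To make the two options above easier to compare, here is a userspace
toy model, not kernel code. Every name in it (toy_folio, toy_lru,
toy_swapin_lru(), toy_lru_tail_save() and so on) is hypothetical and
only stands in for the real structures. The LRU is a plain doubly
linked list, and the per-task flag is modeled as a thread-local. It is
a sketch of the control flow only, assuming nothing about locking,
batching, or how the real folio_add_lru() path works.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for struct folio and an LRU list; not kernel types. */
struct toy_folio {
    int id;
    struct toy_folio *prev, *next;
};

struct toy_lru {
    struct toy_folio *head; /* most recently added, reclaimed last */
    struct toy_folio *tail; /* reclaimed first */
};

/* Hypothetical per-task flag, modeled here as a thread-local. */
static _Thread_local bool toy_lru_add_to_tail;

static void toy_lru_add_head(struct toy_lru *lru, struct toy_folio *f)
{
    f->prev = NULL;
    f->next = lru->head;
    if (lru->head)
        lru->head->prev = f;
    else
        lru->tail = f; /* list was empty */
    lru->head = f;
}

static void toy_lru_add_tail(struct toy_lru *lru, struct toy_folio *f)
{
    f->next = NULL;
    f->prev = lru->tail;
    if (lru->tail)
        lru->tail->next = f;
    else
        lru->head = f; /* list was empty */
    lru->tail = f;
}

/* Option A: the caller plumbs an explicit boolean down. */
static void toy_swapin_lru(struct toy_lru *lru, struct toy_folio *f,
                           bool add_to_lru_head)
{
    assert(lru && f);
    if (add_to_lru_head)
        toy_lru_add_head(lru, f);
    else
        toy_lru_add_tail(lru, f);
}

/* Existing call sites keep the old behavior through this wrapper. */
static void toy_swapin(struct toy_lru *lru, struct toy_folio *f)
{
    toy_swapin_lru(lru, f, true);
}

/* Option B: scoped context with save/restore semantics. */
static bool toy_lru_tail_save(void)
{
    bool old = toy_lru_add_to_tail;

    toy_lru_add_to_tail = true;
    return old;
}

static void toy_lru_tail_restore(bool old)
{
    toy_lru_add_to_tail = old;
}

/* The add path consults the scoped flag instead of an argument. */
static void toy_add_lru(struct toy_lru *lru, struct toy_folio *f)
{
    toy_swapin_lru(lru, f, !toy_lru_add_to_tail);
}
```

In this model a writeback-like caller would either call
toy_swapin_lru(lru, folio, false) directly (option A), or bracket an
unmodified toy_add_lru() call with toy_lru_tail_save() and
toy_lru_tail_restore() (option B). Option A makes the behavior visible
at the call site; option B avoids the extra argument but hides the
behavior change in ambient state, which is the "obscure" part.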