Hi Alex,

On Thu, Apr 16, 2020 at 04:01:20PM +0800, Alex Shi wrote:
> 
> 
> On 2020/4/15 9:42 PM, Alex Shi wrote:
> > Hi Johannes,
> > 
> > Thanks a lot for pointing this out!
> > 
> > Charging in __read_swap_cache_async() would require passing arguments
> > down through three layers of functions, which would be a bit ugly.
> > Instead, could we move the lru_cache add to after commit_charge, as is
> > done for KSM-copied pages?
> > 
> > That leaves the pages off the LRU lists a bit longer, but a page is
> > only used after add_anon_rmap has set it up. Could that cause trouble?
> 
> Hi Johannes & Andrew,
> 
> Doing lru_cache_add_anon() during swapin_readahead() opens only a very
> short window in which these few pages could be reclaimed.
> 
> If we delay adding these few pages to the LRU until after the vm_fault
> target page has been charged to the memcg (mem_cgroup_commit_charge)
> and activated, we could skip the
> mem_cgroup_try_charge/commit_charge/cancel_charge sequence in
> __read_swap_cache_async(). The cost is that, per CPU, up to
> SWAP_RA_ORDER_CEILING pages would briefly be invisible to page reclaim.
> On the other hand, it protects the target vm_fault page from being
> reclaimed before it is activated during that window.

The readahead pages surrounding the faulting page might never get
accessed and can pile up to large amounts. Users can also trigger
non-faulting readahead with MADV_WILLNEED.

So unfortunately, I don't see a way to keep these pages off the LRU.
They do need to be reclaimable, or they become a DoS vector.

I'm currently preparing a small patch series to make swap ownership
tracking an integral part of memcg and to change the swapin charging
sequence, so you won't have to worry about it.

This will also unblock Joonsoo's "workingset protection/detection on
the anonymous LRU list" patch series, since he is blocked on the same
problem - he needs the correct LRU available at swapin time to process
refaults correctly.

Both of your patch series are already pretty large; they shouldn't need
to also deal with that.