I am still reading through the series. It is a lot of code and quite hard
to wrap one's head around, so these are mostly random things I have run
into. More will likely follow.

On Tue 04-01-22 13:22:25, Yu Zhao wrote:
[...]
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index aba18cd101db..028afdb81c10 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -1393,18 +1393,24 @@ mem_cgroup_print_oom_meminfo(struct mem_cgroup *memcg)
>  
>  static inline void lock_page_memcg(struct page *page)
>  {
> +	/* to match folio_memcg_rcu() */
> +	rcu_read_lock();
>  }
>  
>  static inline void unlock_page_memcg(struct page *page)
>  {
> +	rcu_read_unlock();
>  }
>  
>  static inline void folio_memcg_lock(struct folio *folio)
>  {
> +	/* to match folio_memcg_rcu() */
> +	rcu_read_lock();
>  }
>  
>  static inline void folio_memcg_unlock(struct folio *folio)
>  {
> +	rcu_read_unlock();
>  }

This should go into a separate patch and be merged independently. I
hadn't realized that the !MEMCG configuration has different locking
scopes.

[...]
> diff --git a/include/linux/oom.h b/include/linux/oom.h
> index 2db9a1432511..9c7a4fae0661 100644
> --- a/include/linux/oom.h
> +++ b/include/linux/oom.h
> @@ -57,6 +57,22 @@ struct oom_control {
>  extern struct mutex oom_lock;
>  extern struct mutex oom_adj_mutex;
>  
> +#ifdef CONFIG_MMU
> +extern struct task_struct *oom_reaper_list;
> +extern struct wait_queue_head oom_reaper_wait;
> +
> +static inline bool oom_reaping_in_progress(void)
> +{
> +	/* a racy check can be used to reduce the chance of overkilling */
> +	return READ_ONCE(oom_reaper_list) || !waitqueue_active(&oom_reaper_wait);
> +}
> +#else
> +static inline bool oom_reaping_in_progress(void)
> +{
> +	return false;
> +}
> +#endif

I do not like this. These are oom reaper internals and no code should
really make any decisions based on them. oom_reaping_in_progress does
not tell much anyway.
This is a global queue for the oom reaper that can contain oom victims
from different oom scopes (e.g. global OOM, memcg OOM or memory policy
OOM). Your lru_gen_age_node uses this to decide whether to trigger
out_of_memory, and that is clearly wrong for the above reasons.
out_of_memory is designed to skip any action if there is an oom victim
pending from the same oom domain (have a look at oom_evaluate_task).

[...]
> +static bool age_lruvec(struct lruvec *lruvec, struct scan_control *sc,
> +		       unsigned long min_ttl)
> +{
> +	bool need_aging;
> +	long nr_to_scan;
> +	struct mem_cgroup *memcg = lruvec_memcg(lruvec);
> +	int swappiness = get_swappiness(memcg);
> +	DEFINE_MAX_SEQ(lruvec);
> +	DEFINE_MIN_SEQ(lruvec);
> +
> +	if (mem_cgroup_below_min(memcg))
> +		return false;

mem_cgroup_below_min requires the effective protection values to be
calculated for the reclaimed hierarchy first. Have a look at
mem_cgroup_calculate_protection.

-- 
Michal Hocko
SUSE Labs