On Fri 20-05-16 22:41:27, Tetsuo Handa wrote: > Michal Hocko wrote: > > On Fri 20-05-16 20:51:56, Tetsuo Handa wrote: > > [...] > > > +static bool has_pending_victim(struct task_struct *p) > > > +{ > > > + struct task_struct *t; > > > + bool ret = false; > > > + > > > + rcu_read_lock(); > > > + for_each_thread(p, t) { > > > + if (test_tsk_thread_flag(t, TIF_MEMDIE)) { > > > + ret = true; > > > + break; > > > + } > > > + } > > > + rcu_read_unlock(); > > > + return ret; > > > +} > > > > And so you do not speed up anything in the end because you have to > > iterate all threads anyway yet you add quite some code on top. No I do > > not like it. This is no longer a cleanup... > > I changed for_each_process_thread() to for_each_process(). This means > O(num_threads^2) task_in_mem_cgroup() oom_unkillable_task is called with NULL memcg so we do not call task_in_mem_cgroup. > and O(num_threads^2) > has_intersects_mems_allowed() are replaced with O(num_threads) I am really confused why has_intersects_mems_allowed has to iterate all threads. Do we really allow different mempolicies for threads in the same thread group? > task_in_mem_cgroup() and O(num_threads) has_intersects_mems_allowed() > at the cost of adding O(num_threads) has_pending_victim(). > > I expect that O(num_threads) (task_in_mem_cgroup() + has_intersects_mems_allowed() + > has_pending_victim()) is faster than O(num_threads^2) (task_in_mem_cgroup() + > has_intersects_mems_allowed()) + O(num_threads) test_tsk_thread_flag(). I thought the whole point of the cleanup was to get rid of O(num_thread) because num_threads >> num_processes in most workloads. It seems that we are still not there because of has_intersects_mems_allowed but we should rather address that than add another O(num_threads) sources. > > [...] > > > Note that "[PATCH v3] mm,oom: speed up select_bad_process() loop." temporarily > > > broke oom_task_origin(task) case, for oom_select_bad_process() might select > > > a task without mm because oom_badness() which checks for mm != NULL will not be > > > called. > > > > How can we have oom_task_origin without mm? The flag is set explicitly > > while doing swapoff resp. writing to ksm. We clear the flag before > > exiting. > > What if oom_task_origin(task) received SIGKILL, but task was unable to run for > very long period (e.g. 30 seconds) due to scheduling priority, and the OOM-reaper > reaped task's mm within a second. Next round of OOM-killer selects the same task > due to oom_task_origin(task) without doing MMF_OOM_REAPED test. Which is actuall the intended behavior. The whole point of oom_task_origin is to prevent from killing somebody because of potentially memory hungry operation (e.g. swapoff) and rather kill the initiator. -- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>