On Fri, Jan 18, 2019 at 5:58 PM Roman Gushchin <guro@xxxxxx> wrote:
>
> Hi Shakeel!
>
> >
> > On looking further it seems like the process selected to be oom-killed
> > has exited even before reaching read_lock(&tasklist_lock) in
> > oom_kill_process(). More specifically the tsk->usage is 1 which is due
> > to get_task_struct() in oom_evaluate_task() and the put_task_struct()
> > within for_each_thread() frees the tsk and for_each_thread() tries to
> > access the tsk. The easiest fix is to do get/put across the
> > for_each_thread() on the selected task.
>
> Please, feel free to add
> Reviewed-by: Roman Gushchin <guro@xxxxxx>
> for this part.
>
> Thanks.
>
> >
> > Now the next question is should we continue with the oom-kill as the
> > previously selected task has exited? However before adding more
> > complexity and heuristics, let's answer why we even look at the
> > children of oom-kill selected task? The select_bad_process() has already
> > selected the worst process in the system/memcg. Due to race, the
> > selected process might not be the worst at the kill time but does that
> > matter? The userspace can play with oom_score_adj to prefer
> > children to be killed before the parent. I looked at the history but it
> > seems like this is there before git history.
>
> I'd totally support you in an attempt to remove this logic,
> unless someone has a good example of its usefulness.
>
> I believe it's a very old hack to select children over parents
> in case they have the same oom badness (e.g. share most of the memory).
>
> Maybe we can prefer older processes in case of equal oom badness,
> and it will be enough.
>
> Thanks!

I am thinking of removing the whole logic of selecting children.

Shakeel
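
For reference, the get/put fix discussed above can be illustrated with a small userspace model. This is only a sketch, not kernel code and not the actual patch: struct task, get_task(), put_task() and scan_children() are hypothetical stand-ins for task_struct, get_task_struct(), put_task_struct() and the child scan in oom_kill_process(), and "freeing" is modelled by setting a flag so the stale access can be counted instead of crashing.

```c
/* Userspace model of the refcount race; NOT kernel code. */
#include <assert.h>

struct task {
	int usage;	/* models tsk->usage */
	int freed;	/* set where the kernel would free the task_struct */
};

static void get_task(struct task *t)
{
	t->usage++;
}

static void put_task(struct task *t)
{
	assert(t->usage > 0);
	if (--t->usage == 0)
		t->freed = 1;
}

/*
 * Models the scan over the threads of the selected task p, during which
 * a child replaces p as the victim. The iterator keeps touching p on
 * every step. When pin is non-zero, an extra reference on p is held
 * across the whole loop. Returns how many steps touched p after it had
 * been "freed".
 */
static int scan_children(struct task *p, struct task *child,
			 int nthreads, int pin)
{
	struct task *victim = p;
	int stale_accesses = 0;
	int i;

	if (pin)
		get_task(p);

	for (i = 0; i < nthreads; i++) {
		if (p->freed)		/* iterator dereferences p here */
			stale_accesses++;

		if (victim == p) {	/* child is chosen over the parent */
			put_task(victim);	/* may drop p's last reference */
			victim = child;
			get_task(victim);
		}
	}

	if (pin)
		put_task(p);

	return stale_accesses;
}
```

With p starting at usage 1 (the reference taken at selection time) and three loop steps, scan_children(&p, &child, 3, 0) reports stale accesses because the put inside the loop drops the last reference, while scan_children(&p, &child, 3, 1) reports none and p is released only after the loop ends.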