Hi Shakeel! > > On looking further it seems like the process selected to be oom-killed > has exited even before reaching read_lock(&tasklist_lock) in > oom_kill_process(). More specifically the tsk->usage is 1 which is due > to get_task_struct() in oom_evaluate_task() and the put_task_struct > within for_each_thread() frees the tsk and for_each_thread() tries to > access the tsk. The easiest fix is to do get/put across the > for_each_thread() on the selected task. Please, feel free to add Reviewed-by: Roman Gushchin <guro@xxxxxx> for this part. > > Now the next question is should we continue with the oom-kill as the > previously selected task has exited? However before adding more > complexity and heuristics, let's answer why we even look at the > children of oom-kill selected task? The select_bad_process() has already > selected the worst process in the system/memcg. Due to race, the > selected process might not be the worst at the kill time but does that > matter matter? The userspace can play with oom_score_adj to prefer > children to be killed before the parent. I looked at the history but it > seems like this is there before git history. I'd totally support you in an attempt to remove this logic, unless someone has a good example of its usefulness. I believe it's a very old hack to select children over parents in case they have the same oom badness (e.g. share most of the memory). Maybe we can prefer older processes in case of equal oom badness, and it will be enough. Thanks!