The patch titled
     oom: make oom_score to per-process value
has been added to the -mm tree.  Its filename is
     oom-make-oom_score-to-per-process-value.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/SubmitChecklist when testing your code ***

See http://userweb.kernel.org/~akpm/stuff/added-to-mm.txt to find
out what to do about this

The current -mm tree may be found at http://userweb.kernel.org/~akpm/mmotm/

------------------------------------------------------
Subject: oom: make oom_score to per-process value
From: KOSAKI Motohiro <kosaki.motohiro@xxxxxxxxxxxxxx>

The oom-killer kills a process, not a task, so oom_score should be
calculated per-process too.  This improves consistency and speeds up
select_bad_process().

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@xxxxxxxxxxxxxx>
Cc: Paul Menage <menage@xxxxxxxxxx>
Cc: David Rientjes <rientjes@xxxxxxxxxx>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@xxxxxxxxxxxxxx>
Cc: Oleg Nesterov <oleg@xxxxxxxxxx>
Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
---

 Documentation/filesystems/proc.txt |    2 -
 fs/proc/base.c                     |    2 -
 mm/oom_kill.c                      |   35 ++++++++++++++++++++++-----
 3 files changed, 31 insertions(+), 8 deletions(-)

diff -puN Documentation/filesystems/proc.txt~oom-make-oom_score-to-per-process-value Documentation/filesystems/proc.txt
--- a/Documentation/filesystems/proc.txt~oom-make-oom_score-to-per-process-value
+++ a/Documentation/filesystems/proc.txt
@@ -1204,7 +1204,7 @@ The following heuristics are then applie
  * if the task was reniced, its score doubles
  * superuser or direct hardware access tasks (CAP_SYS_ADMIN, CAP_SYS_RESOURCE
  	or CAP_SYS_RAWIO) have their score divided by 4
- * if oom condition happened in one cpuset and checked task does not belong
+ * if oom condition happened in one cpuset and checked process does not belong
  	to it, its score is divided by 8
  * the resulting score is multiplied by two to the power of oom_adj, i.e.
 	points <<= oom_adj when it is positive and
diff -puN fs/proc/base.c~oom-make-oom_score-to-per-process-value fs/proc/base.c
--- a/fs/proc/base.c~oom-make-oom_score-to-per-process-value
+++ a/fs/proc/base.c
@@ -447,7 +447,7 @@ static int proc_oom_score(struct task_st
 
 	do_posix_clock_monotonic_gettime(&uptime);
 	read_lock(&tasklist_lock);
-	points = badness(task, uptime.tv_sec);
+	points = badness(task->group_leader, uptime.tv_sec);
 	read_unlock(&tasklist_lock);
 	return sprintf(buffer, "%lu\n", points);
 }
diff -puN mm/oom_kill.c~oom-make-oom_score-to-per-process-value mm/oom_kill.c
--- a/mm/oom_kill.c~oom-make-oom_score-to-per-process-value
+++ a/mm/oom_kill.c
@@ -34,6 +34,23 @@ int sysctl_oom_dump_tasks;
 static DEFINE_SPINLOCK(zone_scan_lock);
 /* #define DEBUG */
 
+/*
+ * Is all threads of the target process nodes overlap ours?
+ */
+static int has_intersects_mems_allowed(struct task_struct *tsk)
+{
+	struct task_struct *t;
+
+	t = tsk;
+	do {
+		if (cpuset_mems_allowed_intersects(current, t))
+			return 1;
+		t = next_thread(t);
+	} while (t != tsk);
+
+	return 0;
+}
+
 /**
  * badness - calculate a numeric value for how bad this task has been
  * @p: task struct of which task we should calculate
@@ -59,6 +76,9 @@ unsigned long badness(struct task_struct
 	struct mm_struct *mm;
 	struct task_struct *child;
 	int oom_adj = p->signal->oom_adj;
+	struct task_cputime task_time;
+	unsigned long utime;
+	unsigned long stime;
 
 	if (oom_adj == OOM_DISABLE)
 		return 0;
@@ -106,8 +126,11 @@ unsigned long badness(struct task_struct
 	 * of seconds. There is no particular reason for this other than
 	 * that it turned out to work very well in practice.
 	 */
-	cpu_time = (cputime_to_jiffies(p->utime) + cputime_to_jiffies(p->stime))
-		>> (SHIFT_HZ + 3);
+	thread_group_cputime(p, &task_time);
+	utime = cputime_to_jiffies(task_time.utime);
+	stime = cputime_to_jiffies(task_time.stime);
+	cpu_time = (utime + stime) >> (SHIFT_HZ + 3);
+
 
 	if (uptime >= p->start_time.tv_sec)
 		run_time = (uptime - p->start_time.tv_sec) >> 10;
@@ -148,7 +171,7 @@ unsigned long badness(struct task_struct
 	 * because p may have allocated or otherwise mapped memory on
 	 * this node before. However it will be less likely.
 	 */
-	if (!cpuset_mems_allowed_intersects(current, p))
+	if (!has_intersects_mems_allowed(p))
 		points /= 8;
 
 	/*
@@ -204,13 +227,13 @@ static inline enum oom_constraint constr
 static struct task_struct *select_bad_process(unsigned long *ppoints,
 						struct mem_cgroup *mem)
 {
-	struct task_struct *g, *p;
+	struct task_struct *p;
 	struct task_struct *chosen = NULL;
 	struct timespec uptime;
 	*ppoints = 0;
 
 	do_posix_clock_monotonic_gettime(&uptime);
-	do_each_thread(g, p) {
+	for_each_process(p) {
 		unsigned long points;
 
 		/*
@@ -263,7 +286,7 @@ static struct task_struct *select_bad_pr
 			chosen = p;
 			*ppoints = points;
 		}
-	} while_each_thread(g, p);
+	}
 
 	return chosen;
 }
_

Patches currently in -mm which might be from kosaki.motohiro@xxxxxxxxxxxxxx are

origin.patch
linux-next.patch
readahead-add-blk_run_backing_dev.patch
readahead-add-blk_run_backing_dev-fix.patch
readahead-add-blk_run_backing_dev-fix-fix-2.patch
mm-clean-up-page_remove_rmap.patch
mm-show_free_areas-display-slab-pages-in-two-separate-fields.patch
mm-oom-analysis-add-per-zone-statistics-to-show_free_areas.patch
mm-oom-analysis-add-buffer-cache-information-to-show_free_areas.patch
mm-oom-analysis-show-kernel-stack-usage-in-proc-meminfo-and-oom-log-output.patch
mm-oom-analysis-add-shmem-vmstat.patch
mm-rename-pgmoved-variable-in-shrink_active_list.patch
mm-shrink_inactive_list-nr_scan-accounting-fix-fix.patch
mm-vmstat-add-isolate-pages.patch
mm-vmstat-add-isolate-pages-fix.patch
vmscan-throttle-direct-reclaim-when-too-many-pages-are-isolated-already.patch
mm-remove-__addsub_zone_page_state.patch
mm-count-only-reclaimable-lru-pages-v2.patch
vmscan-dont-attempt-to-reclaim-anon-page-in-lumpy-reclaim-when-no-swap-space-is-avilable.patch
vmscan-move-clearpageactive-from-move_active_pages-to-shrink_active_list.patch
vmscan-kill-unnecessary-page-flag-test.patch
vmscan-kill-unnecessary-prefetch.patch
mm-perform-non-atomic-test-clear-of-pg_mlocked-on-free.patch
tracing-page-allocator-add-trace-events-for-page-allocation-and-page-freeing.patch
tracing-page-allocator-add-trace-event-for-page-traffic-related-to-the-buddy-lists.patch
mm-drop-unneeded-double-negations.patch
mm-introduce-page_lru_base_type.patch
mm-introduce-page_lru_base_type-fix.patch
mm-return-boolean-from-page_is_file_cache.patch
mm-return-boolean-from-page_has_private.patch
mm-document-is_page_cache_freeable.patch
mm-vmscan-rename-zone_nr_pages-to-zone_lru_nr_pages.patch
oom-move-oom_killer_enable-oom_killer_disable-to-where-they-belong.patch
mm-do-batched-scans-for-mem_cgroup.patch
mm-vmscan-remove-page_queue_congested-comment.patch
oom-move-oom_adj-value-from-task_struct-to-signal_struct.patch
oom-make-oom_score-to-per-process-value.patch
oom-oom_kill-doesnt-kill-vfork-parentor-child.patch
oom-fix-oom_adjust_write-input-sanity-check.patch
getrusage-fill-ru_maxrss-value.patch
getrusage-fill-ru_maxrss-value-update.patch
memory-controller-soft-limit-documentation-v9.patch
memory-controller-soft-limit-interface-v9.patch
memory-controller-soft-limit-organize-cgroups-v9.patch
memory-controller-soft-limit-organize-cgroups-v9-fix.patch
memory-controller-soft-limit-refactor-reclaim-flags-v9.patch
memory-controller-soft-limit-reclaim-on-contention-v9.patch
memory-controller-soft-limit-reclaim-on-contention-v9-fix.patch
memcg-improve-resource-counter-scalability.patch
memcg-improve-resource-counter-scalability-v5.patch
fs-symlink-write_begin-allocation-context-fix-reiser4-fix.patch

--
To unsubscribe from this list: send the line "unsubscribe mm-commits" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html
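
Editorial note: the per-process aggregation in the patch above can be modelled in user-space C, roughly as follows. This is only an illustrative sketch, not kernel code. struct toy_thread, toy_group_cpu_ticks() and toy_group_intersects() are hypothetical names invented for this example; the circular `next` pointer stands in for next_thread(), and a plain bitmask stands in for the cpuset node mask. It assumes every thread group is a non-empty circular list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for task_struct: one node per thread. */
struct toy_thread {
	unsigned long utime;        /* user CPU time, in ticks */
	unsigned long stime;        /* system CPU time, in ticks */
	unsigned long mems_allowed; /* bitmask of allowed memory nodes */
	struct toy_thread *next;    /* circular list linking the thread group */
};

/*
 * Sum user and system time over every thread in the group, in the spirit
 * of what the patch obtains from thread_group_cputime().
 */
static unsigned long toy_group_cpu_ticks(const struct toy_thread *leader)
{
	const struct toy_thread *t = leader;
	unsigned long total = 0;

	assert(leader != NULL);
	do {
		total += t->utime + t->stime;
		t = t->next;
	} while (t != leader);

	return total;
}

/*
 * Return true if any thread in the group may use a memory node that is
 * also set in our_mems. The walk mirrors the do/while loop in
 * has_intersects_mems_allowed() from the patch.
 */
static bool toy_group_intersects(const struct toy_thread *leader,
				 unsigned long our_mems)
{
	const struct toy_thread *t = leader;

	assert(leader != NULL);
	do {
		if (t->mems_allowed & our_mems)
			return true;
		t = t->next;
	} while (t != leader);

	return false;
}
```

In this model, a group of two threads with 15 and 21 ticks reports 36 ticks for the process as a whole, whichever thread the walk starts from. The node check succeeds as soon as one thread overlaps, so the divide-by-8 penalty would apply only when no thread in the group overlaps.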