On Mon 30-05-16 21:28:57, Oleg Nesterov wrote: > On 05/30, Michal Hocko wrote: > > > > Make sure to not select vforked task as an oom victim by checking > > vfork_done in oom_badness. > > I agree, this look like a good change to me... But. > > > --- a/mm/oom_kill.c > > +++ b/mm/oom_kill.c > > @@ -176,11 +176,13 @@ unsigned long oom_badness(struct task_struct *p, struct mem_cgroup *memcg, > > > > /* > > * Do not even consider tasks which are explicitly marked oom > > - * unkillable or have been already oom reaped. > > + * unkillable or have been already oom reaped or the are in > > + * the middle of vfork > > */ > > adj = (long)p->signal->oom_score_adj; > > if (adj == OOM_SCORE_ADJ_MIN || > > - test_bit(MMF_OOM_REAPED, &p->mm->flags)) { > > + test_bit(MMF_OOM_REAPED, &p->mm->flags) || > > + p->vfork_done) { > > I don't think we can trust vfork_done != NULL. > > copy_process() doesn't disallow CLONE_VFORK without CLONE_VM, so with this patch > it would be trivial to make the exploit which hides a memory hog from oom-killer. OK, I wasn't aware of this possibility. It sounds really weird because I thought that the whole point of vfork is to prevent from MM copy overhead for quick exec. > So perhaps we need something like > > bool in_vfork(p) > { > return p->vfork_done && > p->real_parent->mm == mm; > > > } > > task_lock() is not enough if CLONE_VM was used along with CLONE_PARENT... so this > also needs rcu_read_lock() to access ->real_parent. > > Or I am totally confused? I cannot judge I am afraid. You are definitely much more familiar with all these subtle details than me. So what do you think about this follow up: --- >From 9300b4f59b0b108f0013ff44a5da93e8c6b12b46 Mon Sep 17 00:00:00 2001 From: Michal Hocko <mhocko@xxxxxxxx> Date: Tue, 31 May 2016 07:45:45 +0200 Subject: [PATCH 3/3] fold me "mm, oom: skip vforked tasks from being selected" per Oleg: - copy_process() doesn't disallow CLONE_VFORK without CLONE_VM, so with this patch it would be trivial to make the exploit which hides a memory hog from oom-killer. Signed-off-by: Michal Hocko <mhocko@xxxxxxxx> --- include/linux/sched.h | 16 ++++++++++++++++ mm/oom_kill.c | 2 +- 2 files changed, 17 insertions(+), 1 deletion(-) diff --git a/include/linux/sched.h b/include/linux/sched.h index ec636400669f..e1877f4da4cc 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1883,6 +1883,22 @@ extern int arch_task_struct_size __read_mostly; #define TNF_FAULT_LOCAL 0x08 #define TNF_MIGRATE_FAIL 0x10 +/* expects to be called with task_lock held */ +static inline bool in_vfork(struct task_struct *tsk) +{ + bool ret; + + /* + * need RCU to access ->real_parent if CLONE_VM was used along with + * CLONE_PARENT + */ + rcu_read_lock(); + ret = tsk->vfork_done && tsk->real_parent->mm == tsk->mm; + rcu_read_unlock(); + + return ret; +} + #ifdef CONFIG_NUMA_BALANCING extern void task_numa_fault(int last_node, int node, int pages, int flags); extern pid_t task_numa_group_id(struct task_struct *p); diff --git a/mm/oom_kill.c b/mm/oom_kill.c index aa28315ac310..d1e24deb79b9 100644 --- a/mm/oom_kill.c +++ b/mm/oom_kill.c @@ -182,7 +182,7 @@ unsigned long oom_badness(struct task_struct *p, struct mem_cgroup *memcg, adj = (long)p->signal->oom_score_adj; if (adj == OOM_SCORE_ADJ_MIN || test_bit(MMF_OOM_REAPED, &p->mm->flags) || - p->vfork_done) { + in_vfork(p)) { task_unlock(p); return 0; } -- 2.8.1 -- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>