On Thu 02-06-16 19:45:00, Tetsuo Handa wrote:
> Michal Hocko wrote:
> > On Wed 01-06-16 23:12:20, Tetsuo Handa wrote:
> > > Michal Hocko wrote:
> > > > vforked tasks are not really sitting on any memory. They are sharing
> > > > the mm with parent until they exec into a new code. Until then it is
> > > > just pinning the address space. OOM killer will kill the vforked task
> > > > along with its parent but we still can end up selecting vforked task
> > > > when the parent wouldn't be selected. E.g. init doing vfork to launch
> > > > a task or vforked being a child of oom unkillable task with an updated
> > > > oom_score_adj to be killable.
> > > >
> > > > Make sure to not select vforked task as an oom victim by checking
> > > > vfork_done in oom_badness.
> > >
> > > While vfork()ed task cannot modify userspace memory, can't such task
> > > allocate significant amount of kernel memory inside execve() operation
> > > (as demonstrated by CVE-2010-4243 64bit_dos.c )?
> > >
> > > It is possible that killing vfork()ed task releases a lot of memory,
> > > isn't it?
> >
> > I am not familiar with the above CVE but doesn't that allocated memory
> > come after flush_old_exec (and so mm_release)?
>
> That memory is allocated as of copy_strings() in do_execveat_common().
>
> An example shown below (based on https://grsecurity.net/~spender/exploits/64bit_dos.c )
> can consume nearly 50% of 2GB RAM while execve() from vfork(). That is, selecting
> vfork()ed task as an OOM victim might release nearly 50% of 2GB RAM.
>
> ----------
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
> #include <unistd.h>
>
> #define NUM_ARGS 8000 /* Nearly 50% of 2GB RAM */
>
> int main(void)
> {
> 	/* Be sure to do "ulimit -s unlimited" before run. */
> 	char **args;
> 	char *str;
> 	int i;
> 	str = malloc(128 * 1024);
> 	memset(str, ' ', 128 * 1024 - 1);
> 	str[128 * 1024 - 1] = '\0';
> 	args = malloc(NUM_ARGS * sizeof(char *));
> 	for (i = 0; i < (NUM_ARGS - 1); i++)
> 		args[i] = str;
> 	args[i] = NULL;
> 	if (vfork() == 0) {
> 		execve("/bin/true", args, NULL);
> 		_exit(1);
> 	}
> 	return 0;
> }

OK, but the memory is allocated on behalf of the parent already, right?
And the patch doesn't prevent the parent from being selected and the
vforked child being killed along the way, as it shares the mm with the
parent. So what exactly does this patch change for this test case? What
am I missing?

-- 
Michal Hocko
SUSE Labs