Re: [resend][PATCH 4/4] oom: don't ignore rss in nascent mm

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 25 Oct 2010 at 12:29, KOSAKI Motohiro wrote:

hi,

i've got a few comments/questions about the whole approach, see them inline.

> index 0644a15..a85b196 100644
> --- a/fs/compat.c
> +++ b/fs/compat.c
> @@ -1567,8 +1567,10 @@ int compat_do_execve(char * filename,
>  	return retval;
>  
>  out:
> -	if (bprm->mm)
> +	if (bprm->mm) {
> +		set_exec_mm(NULL);
>  		mmput(bprm->mm);
> +	}
>  
>  out_file:
>  	if (bprm->file) {
> diff --git a/fs/exec.c b/fs/exec.c
> index 94dabd2..2395d10 100644
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -347,6 +347,8 @@ int bprm_mm_init(struct linux_binprm *bprm)
>  	if (err)
>  		goto err;
>  
> +	set_exec_mm(mm);
> +
>  	return 0;
>  
>  err:
> @@ -1416,8 +1428,10 @@ int do_execve(const char * filename,
>  	return retval;
>  
>  out:
> -	if (bprm->mm)
> +	if (bprm->mm) {
> +		set_exec_mm(NULL);
>  		mmput (bprm->mm);
> +	}
>  
>  out_file:
>  	if (bprm->file) {

what happens when two (or more) threads in the same process call execve? the
above set_exec_mm calls will race (de_thread doesn't happen until much later
in execve) and overwrite each other's ->in_exec_mm which will still lead to
problems since there will be at most one temporary mm accounted for in the
oom killer.

[update: since i don't seem to have been cc'd on the other patch that
serializes execve, the above point is moot ;)]

worse, even if each temporary mm was tracked separately there'd still be a
race where the oom killer can get triggered with the culprit thread long
gone (and reset ->in_exec_mm) and never to be found, so the oom killer would
find someone else as guilty.

now all this leads me to suggest a simpler solution, at least for the first
problem mentioned above (i don't know what to do with the second one yet as
it seems to be a generic issue with the oom killer, probably it should verify
the oom situation once again after it took the task_list lock).

[update: while the serialized execve solves the first problem, i still think
that my idea is simpler and worth considering, so i leave it here even if for
just documentation purposes ;)]

given that all the oom killer needs from the mm struct is either ->total_pages
(in .35 and before, so be careful with the stable backport) or some ->rss_stat
counters, wouldn't it be much easier to simply transfer the bprm->mm counters
into current->mm for the duration of the execve (say, add them in get_arg_page
and remove them when bprm->mm is mmput in the do_execve failure path, etc)? the
transfer can be either to the existing counters or to new ones (obviously in
the latter case the oom code needs a small change to take the new counters into
account as well).

cheers,

 PaX Team

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxxx  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>



[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux]     [Linux OMAP]     [Linux MIPS]     [ECOS]     [Asterisk Internet PBX]     [Linux API]