Re: [PATCH] c/r: do not hold mmap_sem while checkpointing vma's

"Serge E. Hallyn" <serue@xxxxxxxxxx> · Mon, 26 Oct 2009 12:24:23 -0500

Quoting Oren Laadan (orenl@xxxxxxxxxxx):
> This patch modifies the memory checkpoint code to _not_ hold the
> mmap_sem while dumping out the vma's.
> 
> The problem with holding the mmap_sem is that it first takes the
> mmap_sem and then takes the file's inode semaphore. This violates the
> normal locking order, e,g, when taking a page fault during a copyout,
> which is inode sem and then the mmap_sem.
> 
> Normally this reverse locking order won't cause a lockup because a the
> output file for the checkpoint image isn't used by the checkpointee.
> However, there a couple of cases where it may be a problem, e.g. when
> some async-IO happens to complete and triggers a page fault at the
> wrong time.
> 
> This fixes complaints from the lockdep about this reverse ordering.
> 
> Signed-off-by: Oren Laadan <orenl@xxxxxxxxxxxxxxx>
> ---
>  checkpoint/memory.c |  133 ++++++++++++++++++++++++++++++++++++---------------
>  1 files changed, 94 insertions(+), 39 deletions(-)
...
> @@ -1288,9 +1343,9 @@ static struct mm_struct *do_restore_mm(struct ckpt_ctx *ctx)
>  		}
>  		set_mm_exe_file(mm, file);
>  	}
> +	up_write(&mm->mmap_sem);
> 
>  	ret = _ckpt_read_buffer(ctx, mm->saved_auxv, sizeof(mm->saved_auxv));
> -	up_write(&mm->mmap_sem);
>  	if (ret < 0)
>  		goto out;

Could there be a race here?  (If only with someone reading /proc/PID/auxv
while this is happening, though maybe with another task sharing the mm at
restart)  I wonder whether you should read into a tmpbuf without mm->mmap_sem,
then re-acquire and write into mm->saved_auxv?

-serge
_______________________________________________
Containers mailing list
Containers@xxxxxxxxxxxxxxxxxxxxxxxxxx
https://lists.linux-foundation.org/mailman/listinfo/containers