On Mon, Mar 26, 2018 at 11:37:25AM -0700, Matthew Wilcox wrote: > On Tue, Mar 27, 2018 at 02:20:39AM +0800, Yang Shi wrote: > > +++ b/kernel/sys.c > > @@ -1959,7 +1959,7 @@ static int prctl_set_mm_map(int opt, const void __user *addr, unsigned long data > > return error; > > } > > > > - down_write(&mm->mmap_sem); > > + down_read(&mm->mmap_sem); > > > > /* > > * We don't validate if these members are pointing to > > @@ -1980,10 +1980,13 @@ static int prctl_set_mm_map(int opt, const void __user *addr, unsigned long data > > mm->start_brk = prctl_map.start_brk; > > mm->brk = prctl_map.brk; > > mm->start_stack = prctl_map.start_stack; > > + > > + spin_lock(&mm->arg_lock); > > mm->arg_start = prctl_map.arg_start; > > mm->arg_end = prctl_map.arg_end; > > mm->env_start = prctl_map.env_start; > > mm->env_end = prctl_map.env_end; > > + spin_unlock(&mm->arg_lock); > > > > /* > > * Note this update of @saved_auxv is lockless thus > > I see the argument for the change to a write lock was because of a BUG > validating arg_start and arg_end, but more generally, we are updating these > values, so a write-lock is probably a good idea, and this is a very rare > operation to do, so we don't care about making this more parallel. I would > not make this change (but if other more knowledgable people in this area > disagree with me, I will withdraw my objection to this part). Say we've two syscalls running prctl_set_mm_map in parallel, and imagine one have @start_brk = 20 @brk = 10 and second caller has @start_brk = 30 and @brk = 20. Since now the call is guarded by _read_ the both calls unlocked and due to OO engine it may happen then when both finish we have @start_brk = 30 and @brk = 10. In turn "write" semaphore has been take to have consistent data on exit, either you have [20;10] or [30;20] assigned not something mixed. That said I think using read-lock here would be a bug. Cyrill