On Tue, 29 Sep 2015, Oleg Nesterov wrote:

> "mm->locked_vm += grow" and vm_stat_account() in acct_stack_growth()
> are not safe; multiple threads using the same ->mm can do this at the
> same time trying to expans different vma's under down_read(mmap_sem).
                      expand
> This means that one of the "locked_vm += grow" changes can be lost
> and we can miss munlock_vma_pages_all() later.

From the Cc list, I guess you are thinking this might be the fix to
the "Bad page state (mlocked)" issues Andrey and Sasha have reported.

I've not been able to explain those from the direction in which I was
thinking (despite giving it more hours of thought meanwhile), so I am
glad you're looking at it from a very different direction, and hope
you're right with this.

> 
> Move this code into the caller(s) under mm->page_table_lock. All other
> updates to ->locked_vm hold mmap_sem for writing.

So it looks like Andrea and I broke this back in v2.6.7: page_table_lock
was used here before then, and we thought the anon_vma lock was better.

Confession: from that time until today, I thought MAP_GROWSDOWN was one
of those flags (say, like MAP_DENYWRITE) which the kernel accepts from
userspace but ignores; I thought ia64 was the only architecture on which
an mm might contain more than one VM_GROWS* vma (excepting the case where
the original gets split; but surely stack would have its anon_vma
allocated by then, and shared across the split).

It's only this patch of yours that leads me to calc_vm_flag_bits(),
and to how Michel brought page_table_lock back here to guard vma_gap.

> 
> Signed-off-by: Oleg Nesterov <oleg@xxxxxxxxxx>

Acked-by: Hugh Dickins <hughd@xxxxxxxxxx>
with some hesitation.
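
For readers who want to see the lost update from the quoted changelog
spelled out, here is a minimal userspace sketch. It is not kernel code:
struct toy_mm, struct toy_thread and every function name below are
hypothetical, and the "threads" are interleaved by hand so the outcome
is deterministic. It only assumes that "locked_vm += grow" compiles to a
separate load and store.

```c
#include <assert.h>

/* Hypothetical stand-in for the single mm_struct field of interest. */
struct toy_mm {
	unsigned long locked_vm;
};

/* One thread's "locked_vm += grow", split into its load and its store. */
struct toy_thread {
	unsigned long tmp;	/* value loaded from mm->locked_vm */
	unsigned long grow;	/* pages this thread wants to add */
};

static void step_load(const struct toy_mm *mm, struct toy_thread *t)
{
	t->tmp = mm->locked_vm;
}

static void step_store(struct toy_mm *mm, const struct toy_thread *t)
{
	mm->locked_vm = t->tmp + t->grow;
}

/* Both threads load before either stores: the first update is lost. */
unsigned long unlocked_interleaving(unsigned long start,
				    unsigned long grow_a,
				    unsigned long grow_b)
{
	struct toy_mm mm = { .locked_vm = start };
	struct toy_thread a = { .grow = grow_a }, b = { .grow = grow_b };

	step_load(&mm, &a);
	step_load(&mm, &b);
	step_store(&mm, &a);
	step_store(&mm, &b);	/* overwrites a's update */
	return mm.locked_vm;
}

/* Each load/store pair runs to completion, as it would under a spinlock. */
unsigned long serialized_interleaving(unsigned long start,
				      unsigned long grow_a,
				      unsigned long grow_b)
{
	struct toy_mm mm = { .locked_vm = start };
	struct toy_thread a = { .grow = grow_a }, b = { .grow = grow_b };

	step_load(&mm, &a);
	step_store(&mm, &a);
	step_load(&mm, &b);
	step_store(&mm, &b);
	return mm.locked_vm;
}

/* Usage: starting from 10 pages, two threads add 3 and 5 pages. */
void lost_update_demo(void)
{
	assert(unlocked_interleaving(10, 3, 5) == 15);	/* 3 pages lost */
	assert(serialized_interleaving(10, 3, 5) == 18);
}
```

In the unlocked ordering the counter ends up 3 pages short, which is
the kind of undercount the changelog says can make us miss
munlock_vma_pages_all() later.
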

I don't like very much that the preliminary mm->locked_vm + grow check
is still done without complete locking, so racing threads could get
more locked_vm than they're permitted; but I'm not sure that we care
enough to put page_table_lock back over all of that (and
security_vm_enough_memory wants to have final say on whether to go
ahead); even if it was that way years ago.  (And if we did care,
shouldn't __vm_enough_memory() be using percpu_counter_compare instead
of percpu_counter_read_positive? but that's a digression.)

It would be even nicer if we could kill these expand_stack() anomalies
once and for all, with down_write of mmap_sem here too.  But can't be
done without revisiting every architecture's mm/fault.c, which I have
no stomach for at this time, and probably you neither.

Let's accept that your patch is a significant improvement,
and hope that it fixes the "Bad page state (mlocked)".

> ---
>  mm/mmap.c | 12 ++++++++----
>  1 file changed, 8 insertions(+), 4 deletions(-)
> 
> diff --git a/mm/mmap.c b/mm/mmap.c
> index 8393580..4efdc37 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -2138,10 +2138,6 @@ static int acct_stack_growth(struct vm_area_struct *vma, unsigned long size, uns
>  	if (security_vm_enough_memory_mm(mm, grow))
>  		return -ENOMEM;
>  
> -	/* Ok, everything looks good - let it rip */
> -	if (vma->vm_flags & VM_LOCKED)
> -		mm->locked_vm += grow;
> -	vm_stat_account(mm, vma->vm_flags, vma->vm_file, grow);
>  	return 0;
>  }
>  
> @@ -2202,6 +2198,10 @@ int expand_upwards(struct vm_area_struct *vma, unsigned long address)
>  				 * against concurrent vma expansions.
>  				 */
>  				spin_lock(&vma->vm_mm->page_table_lock);
> +				if (vma->vm_flags & VM_LOCKED)
> +					vma->vm_mm->locked_vm += grow;
> +				vm_stat_account(vma->vm_mm, vma->vm_flags,
> +						vma->vm_file, grow);
>  				anon_vma_interval_tree_pre_update_vma(vma);
>  				vma->vm_end = address;
>  				anon_vma_interval_tree_post_update_vma(vma);
> @@ -2273,6 +2273,10 @@ int expand_downwards(struct vm_area_struct *vma,
>  				 * against concurrent vma expansions.
>  				 */
>  				spin_lock(&vma->vm_mm->page_table_lock);
> +				if (vma->vm_flags & VM_LOCKED)
> +					vma->vm_mm->locked_vm += grow;
> +				vm_stat_account(vma->vm_mm, vma->vm_flags,
> +						vma->vm_file, grow);
>  				anon_vma_interval_tree_pre_update_vma(vma);
>  				vma->vm_start = address;
>  				vma->vm_pgoff -= grow;
> -- 
> 2.4.3
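
The remaining worry about the preliminary locked_vm + grow check can be
sketched the same way. This is a hand-interleaved userspace model, not
kernel code: struct toy_limit_mm, its limit field (a stand-in for the
locked-memory limit in pages) and all function names are hypothetical.
It assumes the update itself is never lost, as with the patch applied,
and only the check runs outside the lock.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: a locked-page counter and its permitted maximum. */
struct toy_limit_mm {
	unsigned long locked_vm;	/* pages accounted as locked */
	unsigned long limit;		/* assumed page limit for this mm */
};

/* Preliminary check; in this model it runs without the lock. */
static bool may_grow(const struct toy_limit_mm *mm, unsigned long grow)
{
	return mm->locked_vm + grow <= mm->limit;
}

/* The accounting step, assumed to run under the lock: never lost. */
static void account_growth(struct toy_limit_mm *mm, unsigned long grow)
{
	mm->locked_vm += grow;
}

/* Both threads check against the old value, then both account. */
unsigned long check_then_account_race(unsigned long start,
				      unsigned long limit,
				      unsigned long grow)
{
	struct toy_limit_mm mm = { .locked_vm = start, .limit = limit };
	bool a_ok = may_grow(&mm, grow);
	bool b_ok = may_grow(&mm, grow);	/* still sees the old value */

	if (a_ok)
		account_growth(&mm, grow);
	if (b_ok)
		account_growth(&mm, grow);
	return mm.locked_vm;
}

/* Check and accounting done as one step per thread, as under one lock. */
unsigned long check_and_account_together(unsigned long start,
					 unsigned long limit,
					 unsigned long grow)
{
	struct toy_limit_mm mm = { .locked_vm = start, .limit = limit };

	if (may_grow(&mm, grow))
		account_growth(&mm, grow);
	if (may_grow(&mm, grow))	/* second thread is refused */
		account_growth(&mm, grow);
	return mm.locked_vm;
}

/* Usage: 90 pages locked, limit 100, two threads each growing by 8. */
void limit_race_demo(void)
{
	assert(check_then_account_race(90, 100, 8) == 106);	/* over limit */
	assert(check_and_account_together(90, 100, 8) == 98);
}
```

No update is lost here, so the counter stays accurate; the racing
threads simply end up with more locked_vm than the limit allows, which
is the lesser problem accepted above.
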