On Thu, Jun 02, 2011 at 06:44:58PM +0200, Andrea Arcangeli wrote:
> On Thu, Jun 02, 2011 at 08:36:41AM -0700, Chris Wright wrote:
> > * Andrea Arcangeli (aarcange@xxxxxxxxxx) wrote:
> > > On Thu, Jun 02, 2011 at 07:31:43AM -0700, Chris Wright wrote:
> > > > * CAI Qian (caiqian@xxxxxxxxxx) wrote:
> > > > > madvise(0x2210000, 4096, 0xc /* MADV_??? */) = 0
> > > > > --- SIGSEGV (Segmentation fault) @ 0 (0) ---
> > > >
> > > > Right, that's just what the program is trying to do, segfault.
> > > >
> > > > > +++ killed by SIGSEGV (core dumped) +++
> > > > > Segmentation fault (core dumped)
> > > > >
> > > > > Did I miss anything?
> > > >
> > > > I found it works, but not 100% of the time.
> > > >
> > > > So I just ran the bug in a loop.
> > >
> > > echo 0 > scan_millisecs helps.
> >
> > BTW, here's my stack trace (I dropped back to 2.6.39 just to see if it
> > happened to be a recent regression). It looks like mm_slot is off the
> > list:
> >
> > R10: dead000000200200 R11: dead000000100100
>
> Yes, it had to be a use-after-free.
>
> I cooked this patch, still untested but it builds. Will test it soon.

Hi Andrea,

I just tested this patch, but it doesn't seem to fix the problem, at
least not the one I reported. The same bug happens again.

Thanks,
-Andrea

> ===
> Subject: ksm: fix __ksm_exit vs ksm scan SMP race
>
> From: Andrea Arcangeli <aarcange@xxxxxxxxxx>
>
> If the KSM scan releases the ksm_mmlist_lock after mm_users has already
> dropped to zero, but before __ksm_exit has had a chance to run, both
> the KSM scan and __ksm_exit will free the slot. Fix this SMP race by
> using test_and_clear_bit in __ksm_exit to detect whether the KSM scan
> has already freed the slot.
>
> Signed-off-by: Andrea Arcangeli <aarcange@xxxxxxxxxx>
> ---
>
> diff --git a/mm/ksm.c b/mm/ksm.c
> index d708b3e..47ef4c1 100644
> --- a/mm/ksm.c
> +++ b/mm/ksm.c
> @@ -645,10 +645,16 @@ static int unmerge_and_remove_all_rmap_items(void)
>  		if (ksm_test_exit(mm)) {
>  			hlist_del(&mm_slot->link);
>  			list_del(&mm_slot->mm_list);
> +			/*
> +			 * After releasing ksm_mmlist_lock __ksm_exit
> +			 * can run and we already changed mm_slot, so
> +			 * notify it with MMF_VM_MERGEABLE not to free
> +			 * this again.
> +			 */
> +			clear_bit(MMF_VM_MERGEABLE, &mm->flags);
>  			spin_unlock(&ksm_mmlist_lock);
>
>  			free_mm_slot(mm_slot);
> -			clear_bit(MMF_VM_MERGEABLE, &mm->flags);
>  			up_read(&mm->mmap_sem);
>  			mmdrop(mm);
>  		} else {
> @@ -1377,10 +1383,15 @@ next_mm:
>  		 */
>  		hlist_del(&slot->link);
>  		list_del(&slot->mm_list);
> +		/*
> +		 * After releasing ksm_mmlist_lock __ksm_exit can run
> +		 * and we already changed mm_slot, so notify it with
> +		 * MMF_VM_MERGEABLE not to free this again.
> +		 */
> +		clear_bit(MMF_VM_MERGEABLE, &mm->flags);
>  		spin_unlock(&ksm_mmlist_lock);
>
>  		free_mm_slot(slot);
> -		clear_bit(MMF_VM_MERGEABLE, &mm->flags);
>  		up_read(&mm->mmap_sem);
>  		mmdrop(mm);
>  	} else {
> @@ -1463,6 +1474,11 @@ int ksm_madvise(struct vm_area_struct *vma, unsigned long start,
>  				 VM_NONLINEAR | VM_MIXEDMAP | VM_SAO))
>  			return 0;		/* just ignore the advice */
>
> +		/*
> +		 * It should be safe to use test_bit instead of
> +		 * test_and_set_bit because the generic madvise caller
> +		 * holds the mmap_sem in write mode.
> +		 */
>  		if (!test_bit(MMF_VM_MERGEABLE, &mm->flags)) {
>  			err = __ksm_enter(mm);
>  			if (err)
> @@ -1511,6 +1527,10 @@ int __ksm_enter(struct mm_struct *mm)
>  	list_add_tail(&mm_slot->mm_list, &ksm_scan.mm_slot->mm_list);
>  	spin_unlock(&ksm_mmlist_lock);
>
> +	/*
> +	 * It should be safe to set it outside ksm_mmlist_lock because
> +	 * we hold an mm_users pin on the mm, so __ksm_exit can't run.
> +	 */
>  	set_bit(MMF_VM_MERGEABLE, &mm->flags);
>  	atomic_inc(&mm->mm_count);
>
> @@ -1538,9 +1558,28 @@ void __ksm_exit(struct mm_struct *mm)
>  	mm_slot = get_mm_slot(mm);
>  	if (mm_slot && ksm_scan.mm_slot != mm_slot) {
>  		if (!mm_slot->rmap_list) {
> -			hlist_del(&mm_slot->link);
> -			list_del(&mm_slot->mm_list);
> -			easy_to_free = 1;
> +			/*
> +			 * If MMF_VM_MERGEABLE isn't set, the slot was
> +			 * freed by the scan immediately after mm_users
> +			 * reached zero (visible to the scan) but before
> +			 * __ksm_exit() ran, so we don't need to do
> +			 * anything here. We don't even need to wait for
> +			 * the KSM scan to release the mmap_sem, as it's
> +			 * not working on the mm anymore but just
> +			 * releasing it, and it probably already did so
> +			 * and dropped its mm_count too (it would however
> +			 * be safe to take the mmap_sem here even if
> +			 * MMF_VM_MERGEABLE is already clear, as the
> +			 * actual mm can't be freed until we return and
> +			 * run mmdrop too, but it's unnecessary).
> +			 */
> +			if (test_and_clear_bit(MMF_VM_MERGEABLE, &mm->flags)) {
> +				hlist_del(&mm_slot->link);
> +				list_del(&mm_slot->mm_list);
> +				easy_to_free = 1;
> +			} else
> +				mm_slot = NULL;
>  		} else {
>  			list_move(&mm_slot->mm_list,
>  				  &ksm_scan.mm_slot->mm_list);
> @@ -1550,7 +1589,6 @@ void __ksm_exit(struct mm_struct *mm)
>
>  	if (easy_to_free) {
>  		free_mm_slot(mm_slot);
> -		clear_bit(MMF_VM_MERGEABLE, &mm->flags);
>  		mmdrop(mm);
>  	} else if (mm_slot) {
>  		down_write(&mm->mmap_sem);

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxxx  For more info on Linux MM,
see: http://www.linux-mm.org/ .