A concurrency issue about KSM in the function scan_get_next_rmap_item. task A (ksmd): |task B (the mm's task): | mm = slot->mm; | down_read(&mm->mmap_sem); | | ... | | spin_lock(&ksm_mmlist_lock); | | ksm_scan.mm_slot go to the next slot; | | spin_unlock(&ksm_mmlist_lock); | |mmput() -> | ksm_exit(): | |spin_lock(&ksm_mmlist_lock); |if (mm_slot && ksm_scan.mm_slot != mm_slot) { | if (!mm_slot->rmap_list) { | easy_to_free = 1; | ... | |if (easy_to_free) { | mmdrop(mm); | ... | |So this mm_struct will be freed successfully. | up_read(&mm->mmap_sem); | As we can see above, the ksmd thread may access a mm_struct that already been freed to the kmem_cache. Suppose a fork will get this mm_struct from the kmem_cache, the ksmd thread then call up_read(&mm->mmap_sem), will cause mmap_sem.count to become -1. >From the suggestion of Andrea Arcangeli, unmerge_and_remove_all_rmap_items has the same SMP race condition, so fix it too. My prev fix in function scan_get_next_rmap_item will introduce a different SMP race condition, so just invert the up_read/spin_unlock order as Andrea Arcangeli said. Signed-off-by: Zhou Chengming <zhouchengming1@xxxxxxxxxx> Suggested-by: Andrea Arcangeli <aarcange@xxxxxxxxxx> --- mm/ksm.c | 17 ++++++++++------- 1 files changed, 10 insertions(+), 7 deletions(-) diff --git a/mm/ksm.c b/mm/ksm.c index ca6d2a0..d87bafc 100644 --- a/mm/ksm.c +++ b/mm/ksm.c @@ -777,6 +777,7 @@ static int unmerge_and_remove_all_rmap_items(void) } remove_trailing_rmap_items(mm_slot, &mm_slot->rmap_list); + up_read(&mm->mmap_sem); spin_lock(&ksm_mmlist_lock); ksm_scan.mm_slot = list_entry(mm_slot->mm_list.next, @@ -784,16 +785,12 @@ static int unmerge_and_remove_all_rmap_items(void) if (ksm_test_exit(mm)) { hash_del(&mm_slot->link); list_del(&mm_slot->mm_list); - spin_unlock(&ksm_mmlist_lock); free_mm_slot(mm_slot); clear_bit(MMF_VM_MERGEABLE, &mm->flags); - up_read(&mm->mmap_sem); mmdrop(mm); - } else { - spin_unlock(&ksm_mmlist_lock); - up_read(&mm->mmap_sem); } + spin_unlock(&ksm_mmlist_lock); } /* Clean up stable nodes, but don't worry if some are still busy */ @@ -1650,16 +1647,22 @@ next_mm: */ hash_del(&slot->link); list_del(&slot->mm_list); - spin_unlock(&ksm_mmlist_lock); free_mm_slot(slot); clear_bit(MMF_VM_MERGEABLE, &mm->flags); up_read(&mm->mmap_sem); mmdrop(mm); } else { - spin_unlock(&ksm_mmlist_lock); up_read(&mm->mmap_sem); } + /* + * up_read(&mm->mmap_sem) first because after + * spin_unlock(&ksm_mmlist_lock) run, the "mm" may + * already have been freed under us by __ksm_exit() + * because the "mm_slot" is still hashed and + * ksm_scan.mm_slot doesn't point to it anymore. + */ + spin_unlock(&ksm_mmlist_lock); /* Repeat until we've completed scanning the whole list */ slot = ksm_scan.mm_slot; -- 1.7.7 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>