On Thu, 2 Jun 2011, Chris Wright wrote: > * Andrea Righi (andrea@xxxxxxxxxxxxxxx) wrote: > > mmh.. I can reproduce the bug also with the standard ubuntu (11.04) > > kernel. Could you post your .config? > > Andrea (Righi), can you tell me if this WARN fires? This looks > like a pure race between removing from list and checking list, i.e. > insufficient locking. > > ksm_scan.mm_slot == the only registered mm > > CPU 1 (bug program) CPU 2 (ksmd) > list_empty() is false > lock > ksm_scan.mm_slot > list_del > unlock > slot == &ksm_mm_head (but list is now empty_) > > > diff --git a/mm/ksm.c b/mm/ksm.c > index 942dfc7..ab79a92 100644 > --- a/mm/ksm.c > +++ b/mm/ksm.c > @@ -1301,6 +1301,7 @@ static struct rmap_item *scan_get_next_rmap_item(struct page **page) > slot = list_entry(slot->mm_list.next, struct mm_slot, mm_list); > ksm_scan.mm_slot = slot; > spin_unlock(&ksm_mmlist_lock); > + WARN_ON(slot == &ksm_mm_head); > next_mm: > ksm_scan.address = 0; > ksm_scan.rmap_list = &slot->rmap_list; AndreaR, good find, many thanks for discovering and reporting it. I couldn't look at it until last night, and even then, it was not obvious to me exactly where my assumptions were going wrong. Even now it's unclear what role the SIGSEGV plays, as opposed to an normal exit: I guess it just happens to change the timing enough to make the race dangerous. Your patch was not wrong, but I do prefer a patch that plugs the exact hole; and I needed to understand what was going on - without understanding it, there was a danger we might leak memory instead. AndreaA, I didn't study the patch you posted half an hour ago, since by that time I'd worked it out and was preparing patch below. I think your patch would be for a different bug, hopefully one we don't have, it looks more complicated than we should need for this. ChrisW, yes, your WARN_ON is spot on, matches what I saw exactly. I'll fill in the patch description later, must dash now, probably offline until late tonight. Or if you're satisfied and don't want to wait, you guys fill that in and send off to Linus & Andrew - thanks. [PATCH] ksm: fix easily reproduced NULL pointer dereference Reported-by: Andrea Righi <andrea@xxxxxxxxxxxxxxx> Signed-off-by: Hugh Dickins <hughd@xxxxxxxxxx> Cc: stable@xxxxxxxxxx --- mm/ksm.c | 7 +++++++ 1 file changed, 7 insertions(+) --- 3.0-rc1/mm/ksm.c 2011-05-29 18:42:37.429882601 -0700 +++ linux/mm/ksm.c 2011-06-02 09:55:31.729702490 -0700 @@ -1302,6 +1302,13 @@ static struct rmap_item *scan_get_next_r slot = list_entry(slot->mm_list.next, struct mm_slot, mm_list); ksm_scan.mm_slot = slot; spin_unlock(&ksm_mmlist_lock); + + /* + * Although we tested list_empty() above, a racing __ksm_exit + * of the last mm on the list may have removed it since then. + */ + if (slot == &ksm_mm_head) + return NULL; next_mm: ksm_scan.address = 0; ksm_scan.rmap_list = &slot->rmap_list; -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxxx For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>