I've just experienced this bug with ksmd:

[ 55.837551] BUG: unable to handle kernel NULL pointer dereference at 00000000000000e8
[ 55.837598] IP: [<ffffffff810bb9b2>] __lock_acquire+0x62/0x1d70
[ 55.837630] PGD 0
[ 55.837643] Oops: 0000 [#1] SMP
[ 55.837663] CPU 2
[ 55.837674] Modules linked in: snd_hda_codec_hdmi snd_hda_codec_conexant rtl8192ce rtl8192c_common rtlwifi mac80211 usbhid hid cfg80211 snd_hda_intel snd_hda_codec psmouse snd_pcm e1000e thinkpad_acpi snd_timer snd_page_alloc snd soundcore nvram
[ 55.837816]
[ 55.837824] Pid: 33, comm: ksmd Not tainted 3.0.0-rc1+ #289 LENOVO 4286CTO/4286CTO
[ 55.837850] RIP: 0010:[<ffffffff810bb9b2>] [<ffffffff810bb9b2>] __lock_acquire+0x62/0x1d70
[ 55.837878] RSP: 0018:ffff88023d3abc50 EFLAGS: 00010046
[ 55.837894] RAX: 0000000000000046 RBX: 00000000000000e8 RCX: 0000000000000001
[ 55.837915] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00000000000000e8
[ 55.837936] RBP: ffff88023d3abd40 R08: 0000000000000002 R09: 0000000000000000
[ 55.837957] R10: 0000000000000001 R11: 0000000000000000 R12: ffff88023d3a3e00
[ 55.837978] R13: 0000000000000000 R14: 0000000000000002 R15: 0000000000000000
[ 55.837999] FS: 0000000000000000(0000) GS:ffff88023e280000(0000) knlGS:0000000000000000
[ 55.838022] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 55.838039] CR2: 00000000000000e8 CR3: 00000000016f5000 CR4: 00000000000406e0
[ 55.838060] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 55.838081] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 55.838102] Process ksmd (pid: 33, threadinfo ffff88023d3aa000, task ffff88023d3a3e00)
[ 55.838131] Stack:
[ 55.838140]  ffff88023d3abce0 0000000000000000 ffffffff81d46810 00000000000012c7
[ 55.838168]  000000000000037c ffff88023d3a3e00 0000000000000001 0000000000000000
[ 55.838338]  0000000000000000 0000000000000000 00000000001ba37c ffffffff81a22000
[ 55.838365] Call Trace:
[ 55.838375]  [<ffffffff810be55f>] ? mark_held_locks+0x6f/0xa0
[ 55.838394]  [<ffffffff814e3360>] ? _raw_spin_unlock_irqrestore+0x40/0x70
[ 55.838416]  [<ffffffff810bdc90>] lock_acquire+0x90/0x110
[ 55.838434]  [<ffffffff8114c652>] ? ksm_scan_thread+0x132/0xe20
[ 55.838453]  [<ffffffff8112df6c>] ? free_percpu+0x9c/0x130
[ 55.838470]  [<ffffffff814e1cbc>] down_read+0x4c/0x70
[ 55.838486]  [<ffffffff8114c652>] ? ksm_scan_thread+0x132/0xe20
[ 55.838505]  [<ffffffff814e33bb>] ? _raw_spin_unlock+0x2b/0x40
[ 55.838523]  [<ffffffff8114c652>] ksm_scan_thread+0x132/0xe20
[ 55.838541]  [<ffffffff814df822>] ? schedule+0x3b2/0x960
[ 55.838559]  [<ffffffff810a5690>] ? wake_up_bit+0x40/0x40
[ 55.838576]  [<ffffffff8114c520>] ? run_store+0x310/0x310
[ 55.838593]  [<ffffffff810a5186>] kthread+0x96/0xa0
[ 55.838609]  [<ffffffff814e5014>] kernel_thread_helper+0x4/0x10
[ 55.838628]  [<ffffffff814e3700>] ? retint_restore_args+0xe/0xe
[ 55.838647]  [<ffffffff810a50f0>] ? __init_kthread_worker+0x70/0x70
[ 55.838666]  [<ffffffff814e5010>] ? gs_change+0xb/0xb
[ 55.838681] Code: b7 00 00 48 89 fb 85 c0 41 89 f5 45 0f 45 f0 8b 05 84 de 68 00 85 c0 0f 84 7b 09 00 00 8b 05 7a 49 7a 00 85 c0 0f 84 c6 01 00 00
[ 55.838780]  8b 03 ba 01 00 00 00 48 3d e0 3c 8c 81 44 0f 44 f2 41 83 fd
[ 55.838830] RIP [<ffffffff810bb9b2>] __lock_acquire+0x62/0x1d70
[ 55.838850] RSP <ffff88023d3abc50>
[ 55.839567] CR2: 00000000000000e8
[ 55.895721] ---[ end trace eea0fa5dfa6846f1 ]---

The bug can be easily reproduced using the following testcase:

========================
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/mman.h>

#define BUFSIZE getpagesize()

int main(int argc, char **argv)
{
	void *ptr;
	int err;

	/* posix_memalign() returns an error number and does not set errno */
	err = posix_memalign(&ptr, getpagesize(), BUFSIZE);
	if (err != 0) {
		fprintf(stderr, "posix_memalign: %s\n", strerror(err));
		exit(1);
	}
	/* register the page with KSM */
	if (madvise(ptr, BUFSIZE, MADV_MERGEABLE) < 0) {
		perror("madvise");
		exit(1);
	}
	/* segfault on purpose while the page is still registered */
	*(char *)NULL = 0;

	return 0;
}
========================

It seems that when a task segfaults, mm_slot->mm becomes NULL, but the slot
is still wrongly considered by the ksm scan.
Is there a race with __ksm_exit()?

The following is probably not the right way to fix it, but with it applied
the problem disappears. In any case, I'm posting this information in the
hope that it helps to debug the problem.

Signed-off-by: Andrea Righi <andrea@xxxxxxxxxxxxxxx>
---
 mm/ksm.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/mm/ksm.c b/mm/ksm.c
index d708b3e..f457feb 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -1308,6 +1308,8 @@ next_mm:
 	}
 
 	mm = slot->mm;
+	if (unlikely(!mm))
+		return NULL;
 	down_read(&mm->mmap_sem);
 	if (ksm_test_exit(mm))
 		vma = NULL;
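
If the oops does not trigger on the first run, it may help to launch the
crashing task repeatedly from a parent process. What follows is only a sketch
under assumptions, not something that was run against the affected kernel:
it performs the same steps as the testcase above in a forked child and
reports how the child died. The helper names crash_with_mergeable_page() and
run_crashing_child() are made up for illustration, and it assumes ksmd is
actually scanning (that is, 1 has been written to /sys/kernel/mm/ksm/run).

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>

/*
 * Child side: same steps as the testcase. Exits with status 2 if the
 * setup fails; otherwise it is expected to die from SIGSEGV.
 */
static void crash_with_mergeable_page(void)
{
	size_t size = (size_t)getpagesize();
	void *ptr;
	/* volatile so that the compiler cannot drop the faulting store */
	volatile char *null_ptr = NULL;

	if (posix_memalign(&ptr, size, size) != 0)
		_exit(2);
	if (madvise(ptr, size, MADV_MERGEABLE) < 0)
		_exit(2);

	*null_ptr = 0;	/* deliberate NULL write, as in the testcase */
	_exit(3);	/* only reached if the write did not fault */
}

/*
 * Hypothetical helper: fork one crashing child and wait for it.
 * Returns the terminating signal number, 0 if the child exited
 * normally (for example because madvise() failed), or -1 on error.
 */
static int run_crashing_child(void)
{
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0)
		crash_with_mergeable_page();	/* never returns */

	if (waitpid(pid, &status, 0) < 0)
		return -1;
	return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

Calling run_crashing_child() in a loop from main() and checking for a return
value of SIGSEGV gives a quick way to confirm that each child really reached
the NULL write; a return value of 0 suggests that madvise() failed, for
instance on a kernel built without CONFIG_KSM.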