We have been seeing kernel panics on certain systems with the 4.9.x stable
kernel, where x >= 98. We captured the following panic message:

[ 6.252000] BUG: unable to handle kernel NULL pointer dereference at 0000000000000102
[ 6.261222] IP: [<ffffffff810a8ea2>] __queue_work+0x32/0x430
[ 6.267826] PGD 0
[ 6.269951]
[ 6.271887] Oops: 0000 [#1] SMP
[ 6.275576] Modules linked in:
[ 6.279264] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.9.100 #1
[ 6.296760] task: ffffffff81c114c0 task.stack: ffffffff81c00000
[ 6.303564] RIP: 0010:[<ffffffff810a8ea2>] [<ffffffff810a8ea2>] __queue_work+0x32/0x430
[ 6.312997] RSP: 0000:ffff897a40403d98 EFLAGS: 00010046
[ 6.319120] RAX: 0000000000000082 RBX: 0000000000000046 RCX: 0000000000000000
[ 6.327282] RDX: ffffffff81d02300 RSI: 0000000000000000 RDI: 0000000000002000
[ 6.335443] RBP: ffff897a40403dd0 R08: 00000000d431dd4b R09: 0000000000000000
[ 6.343606] R10: ffff897a40403e08 R11: ffffffff8227615c R12: ffffffff81d02300
[ 6.351766] R13: 0000000000002000 R14: 0000000000000000 R15: ffffffff81a88ffd
[ 6.359930] FS: 0000000000000000(0000) GS:ffff897a40400000(0000) knlGS:0000000000000000
[ 6.369274] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 6.375883] CR2: 0000000000000102 CR3: 0000000001c08000 CR4: 0000000000040630
[ 6.384045] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 6.392206] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 6.400366] Stack:
[ 6.402801] 0000001000000000 0000200040403e08 0000000000000046 0000000000000002
[ 6.411723] ffffffff82275f40 ffff897a40403e08 ffffffff81a88ffd ffff897a40403de8
[ 6.420638] ffffffff810a9677 ffffffff82275f88 ffff897a40403e78 ffffffff8148dff4
[ 6.429554] Call Trace:
[ 6.432473] <IRQ>
[ 6.434698] [<ffffffff810a9677>] queue_work_on+0x27/0x40
[ 6.441121] [<ffffffff8148dff4>] crng_reseed+0x154/0x260
[ 6.447343] [<ffffffff8148e334>] credit_entropy_bits+0x234/0x2a0
[ 6.454343] [<ffffffff8148e77b>] ? add_interrupt_randomness+0x1bb/0x220
[ 6.462020] [<ffffffff8148e77b>] add_interrupt_randomness+0x1bb/0x220
[ 6.469506] [<ffffffff810ef8f0>] handle_irq_event_percpu+0x40/0x80
[ 6.476698] [<ffffffff810ef96b>] handle_irq_event+0x3b/0x60
[ 6.483211] [<ffffffff810f2e7f>] handle_level_irq+0x8f/0x110
[ 6.489823] [<ffffffff81030fd5>] handle_irq+0xb5/0x140
[ 6.495854] [<ffffffff81096651>] ? _local_bh_enable+0x21/0x50
[ 6.502562] [<ffffffff81038ab5>] ? __exit_idle+0x5/0x30
[ 6.508689] [<ffffffff817250cd>] do_IRQ+0x4d/0xe0
[ 6.514232] [<ffffffff81722ba0>] common_interrupt+0xa0/0xa0
[ 6.520744] <EOI>
[ 6.522968] [<ffffffff81ddeb08>] ? vfs_caches_init+0xe1/0xe3
[ 6.529779] [<ffffffff81da212e>] start_kernel+0x451/0x4c8
[ 6.536096] [<ffffffff81da1acb>] ? set_init_arg+0x55/0x55
[ 6.542415] [<ffffffff81da1120>] ? early_idt_handler_array+0x120/0x120
[ 6.549995] [<ffffffff81da15d6>] x86_64_start_reservations+0x2a/0x2c
[ 6.557381] [<ffffffff81da1714>] x86_64_start_kernel+0x13c/0x15f
[ 6.564378] Code: 89 e5 41 57 41 56 49 89 f6 41 55 41 89 fd 41 54 49 89 d4 53 48 83 ec 10 89 7d d4 ff 14 25 80 86 c3 81 f6 c4 02 0f 85 1f 03 00 00 <41> f6 86 02 01 00 00 01 0f 85 b2 02 00 00 49 c7 c7 78 69 01 00
[ 6.591500] RIP [<ffffffff810a8ea2>] __queue_work+0x32/0x430
[ 6.598197] RSP <ffff897a40403d98>
[ 6.602281] CR2: 0000000000000102

The NULL pointer dereference happens because the second argument to
__queue_work is 0 (RSI and R14 are both 0 in the register dump above). The
second argument is a pointer to struct workqueue_struct, specifically
system_wq. This shows that an interrupt happened before system_wq was
initialized.

We believe the problem was introduced with this commit:

$ git show c3ff2da5cef05676d490fa9057b2dceb5e48cdb9
commit c3ff2da5cef05676d490fa9057b2dceb5e48cdb9
Author: Theodore Ts'o <tytso@xxxxxxx>
Date:   Mon Apr 23 18:51:28 2018 -0400

    random: fix possible sleeping allocation from irq context

    commit 6c1e851c4edc13a43adb3ea4044e3fc8f43ccf7d upstream.

    We can do a sleeping allocation from an irq context when CONFIG_NUMA
    is enabled. Fix this by initializing the NUMA crng instances in a
    workqueue.

    Reported-by: Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx>
    Reported-by: syzbot+9de458f6a5e713ee8c1a@xxxxxxxxxxxxxxxxxxxxxxxxx
    Fixes: 8ef35c866f8862df ("random: set up the NUMA crng instances...")
    Cc: stable@xxxxxxxxxxxxxxx
    Signed-off-by: Theodore Ts'o <tytso@xxxxxxx>
    Signed-off-by: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx>

diff --git a/drivers/char/random.c b/drivers/char/random.c
index b979173..dbfb3e69 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -820,7 +820,7 @@ static int crng_fast_load(const char *cp, size_t len)
 }
 
 #ifdef CONFIG_NUMA
-static void numa_crng_init(void)
+static void do_numa_crng_init(struct work_struct *work)
 {
 	int i;
 	struct crng_state *crng;
@@ -841,6 +841,13 @@ static void numa_crng_init(void)
 		kfree(pool);
 	}
 }
+
+static DECLARE_WORK(numa_crng_init_work, do_numa_crng_init);
+
+static void numa_crng_init(void)
+{
+	schedule_work(&numa_crng_init_work);
+}
 #else
 static void numa_crng_init(void) {}
 #endif

Apparently we can't count on system_wq being initialized when schedule_work
is called in numa_crng_init from an early interrupt. I don't understand the
underlying code well enough to propose a fix right away.

thanks,
Max
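
P.S. To make the ordering problem concrete, here is a small user-space C model of it. This is only a sketch and not a proposed kernel patch: every name prefixed with model_ is made up for illustration, the structures are not the kernel's real definitions, and the single "deferred" slot is a simplifying assumption. It shows a work item being requested while the queue pointer is still NULL, and one possible way to avoid the dereference by remembering the request until the queue exists.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* User-space stand-ins; NOT the kernel's work_struct/workqueue_struct. */
struct model_work {
	void (*fn)(struct model_work *);
};

struct model_wq {
	struct model_work *queued[8];
	size_t n;
};

static struct model_wq model_wq_storage;
static struct model_wq *model_system_wq;   /* NULL until model_wq_init() */
static struct model_work *model_deferred;  /* request made before init */
static int model_ran;                      /* how many work items ran */

static void model_numa_crng_init_fn(struct model_work *w)
{
	(void)w;
	model_ran++;
}

/* Returns true if queued directly, false if the request was deferred. */
static bool model_schedule_work(struct model_work *w)
{
	if (!model_system_wq) {
		/* Without this check, the next statement would dereference
		 * NULL, which is the failure mode seen in the oops. */
		model_deferred = w;
		return false;
	}
	model_system_wq->queued[model_system_wq->n++] = w;
	return true;
}

/* Bring the queue up, then replay any request that arrived too early. */
static void model_wq_init(void)
{
	model_system_wq = &model_wq_storage;
	if (model_deferred) {
		model_schedule_work(model_deferred);
		model_deferred = NULL;
	}
}

/* Run everything currently queued; requires model_wq_init() first. */
static void model_run_queued(void)
{
	for (size_t i = 0; i < model_system_wq->n; i++)
		model_system_wq->queued[i]->fn(model_system_wq->queued[i]);
	model_system_wq->n = 0;
}

/* Replays the boot ordering: early "interrupt", then init, then run. */
static int model_demo(void)
{
	static struct model_work w = { .fn = model_numa_crng_init_fn };
	bool queued_directly = model_schedule_work(&w);

	assert(!queued_directly);  /* the queue did not exist yet */
	model_wq_init();
	model_run_queued();
	return model_ran;
}
```

Calling model_demo() from a main() of your own walks through the same sequence as the trace: the request arrives first, the queue is set up afterwards, and the work runs once. Whether a guard like this, or moving the workqueue setup earlier in boot, is appropriate for the real code is a question for someone who knows random.c and workqueue.c better than I do.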