On Wed, Jun 06, 2018 at 11:26:46PM +0000, Max Asbock wrote: > > We have been seeing kernel panics on certain systems with the 4.9.x stable kernel, where x >= 98. > We captured the following panic message: > [ 6.252000] BUG: unable to handle kernel NULL pointer dereference at 0000000000000102 > [ 6.261222] IP: [<ffffffff810a8ea2>] __queue_work+0x32/0x430 > [ 6.267826] PGD 0 [ 6.269951] > [ 6.271887] Oops: 0000 [#1] SMP > [ 6.275576] Modules linked in: > [ 6.279264] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.9.100 #1 > [ 6.296760] task: ffffffff81c114c0 task.stack: ffffffff81c00000 > [ 6.303564] RIP: 0010:[<ffffffff810a8ea2>] [<ffffffff810a8ea2>] __queue_work+0x32/0x430 > [ 6.312997] RSP: 0000:ffff897a40403d98 EFLAGS: 00010046 > [ 6.319120] RAX: 0000000000000082 RBX: 0000000000000046 RCX: 0000000000000000 > [ 6.327282] RDX: ffffffff81d02300 RSI: 0000000000000000 RDI: 0000000000002000 > [ 6.335443] RBP: ffff897a40403dd0 R08: 00000000d431dd4b R09: 0000000000000000 > [ 6.343606] R10: ffff897a40403e08 R11: ffffffff8227615c R12: ffffffff81d02300 > [ 6.351766] R13: 0000000000002000 R14: 0000000000000000 R15: ffffffff81a88ffd > [ 6.359930] FS: 0000000000000000(0000) GS:ffff897a40400000(0000) knlGS:0000000000000000 > [ 6.369274] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [ 6.375883] CR2: 0000000000000102 CR3: 0000000001c08000 CR4: 0000000000040630 > [ 6.384045] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > [ 6.392206] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > [ 6.400366] Stack: > [ 6.402801] 0000001000000000 0000200040403e08 0000000000000046 0000000000000002 > [ 6.411723] ffffffff82275f40 ffff897a40403e08 ffffffff81a88ffd ffff897a40403de8 > [ 6.420638] ffffffff810a9677 ffffffff82275f88 ffff897a40403e78 ffffffff8148dff4 > [ 6.429554] Call Trace: > [ 6.432473] <IRQ> [ 6.434698] [<ffffffff810a9677>] queue_work_on+0x27/0x40 > [ 6.441121] [<ffffffff8148dff4>] crng_reseed+0x154/0x260 > [ 6.447343] [<ffffffff8148e334>] credit_entropy_bits+0x234/0x2a0 > [ 6.454343] [<ffffffff8148e77b>] ? add_interrupt_randomness+0x1bb/0x220 > [ 6.462020] [<ffffffff8148e77b>] add_interrupt_randomness+0x1bb/0x220 > [ 6.469506] [<ffffffff810ef8f0>] handle_irq_event_percpu+0x40/0x80 > [ 6.476698] [<ffffffff810ef96b>] handle_irq_event+0x3b/0x60 > [ 6.483211] [<ffffffff810f2e7f>] handle_level_irq+0x8f/0x110 > [ 6.489823] [<ffffffff81030fd5>] handle_irq+0xb5/0x140 > [ 6.495854] [<ffffffff81096651>] ? _local_bh_enable+0x21/0x50 > [ 6.502562] [<ffffffff81038ab5>] ? __exit_idle+0x5/0x30 > [ 6.508689] [<ffffffff817250cd>] do_IRQ+0x4d/0xe0 > [ 6.514232] [<ffffffff81722ba0>] common_interrupt+0xa0/0xa0 > [ 6.520744] <EOI> [ 6.522968] [<ffffffff81ddeb08>] ? vfs_caches_init+0xe1/0xe3 > [ 6.529779] [<ffffffff81da212e>] start_kernel+0x451/0x4c8 > [ 6.536096] [<ffffffff81da1acb>] ? set_init_arg+0x55/0x55 > [ 6.542415] [<ffffffff81da1120>] ? early_idt_handler_array+0x120/0x120 > [ 6.549995] [<ffffffff81da15d6>] x86_64_start_reservations+0x2a/0x2c > [ 6.557381] [<ffffffff81da1714>] x86_64_start_kernel+0x13c/0x15f > [ 6.564378] Code: 89 e5 41 57 41 56 49 89 f6 41 55 41 89 fd 41 54 49 89 d4 53 48 83 ec 10 89 7d d4 ff 14 25 80 86 c3 81 f6 c4 02 0f 85 1f 03 00 00 <41> f6 86 02 01 00 00 01 0f 85 b2 02 00 00 49 c7 c7 78 69 01 00 > [ 6.591500] RIP [<ffffffff810a8ea2>] __queue_work+0x32/0x430 > [ 6.598197] RSP <ffff897a40403d98> > [ 6.602281] CR2: 0000000000000102 > > The NULL pointer dereference happens because the second argument to __queue_work is 0. The second argument is a struct workqueue_struct, specifically system_wq. This shows that an interrupt happened before the system_wq was initialized. We believe the problem was introduced with this commit: > > $ git show c3ff2da5cef05676d490fa9057b2dceb5e48cdb9 > commit c3ff2da5cef05676d490fa9057b2dceb5e48cdb9 > Author: Theodore Ts'o <tytso@xxxxxxx> > Date: Mon Apr 23 18:51:28 2018 -0400 > > random: fix possible sleeping allocation from irq context > > commit 6c1e851c4edc13a43adb3ea4044e3fc8f43ccf7d upstream. > > We can do a sleeping allocation from an irq context when CONFIG_NUMA > is enabled. Fix this by initializing the NUMA crng instances in a > workqueue. > > Reported-by: Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx> > Reported-by: syzbot+9de458f6a5e713ee8c1a@xxxxxxxxxxxxxxxxxxxxxxxxx > Fixes: 8ef35c866f8862df ("random: set up the NUMA crng instances...") > Cc: stable@xxxxxxxxxxxxxxx > Signed-off-by: Theodore Ts'o <tytso@xxxxxxx> > Signed-off-by: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx> > > diff --git a/drivers/char/random.c b/drivers/char/random.c > index b979173..dbfb3e69 100644 > --- a/drivers/char/random.c > +++ b/drivers/char/random.c > @@ -820,7 +820,7 @@ static int crng_fast_load(const char *cp, size_t len) > } > > #ifdef CONFIG_NUMA > -static void numa_crng_init(void) > +static void do_numa_crng_init(struct work_struct *work) > { > int i; > struct crng_state *crng; > @@ -841,6 +841,13 @@ static void numa_crng_init(void) > kfree(pool); > } > } > + > +static DECLARE_WORK(numa_crng_init_work, do_numa_crng_init); > + > +static void numa_crng_init(void) > +{ > + schedule_work(&numa_crng_init_work); > +} > #else > static void numa_crng_init(void) {} > #endif > > Apparently we can't count on system_wq being initialized when schedule_work is called in numa_crng_init from an early interrupt. > > I don't understand the underlying code enough to propose a fix right away. Ick :( I'm guessing you also see these problems on 4.17? Can you test there to be sure of that? thanks, greg k-h