> 2023年10月20日 15:50,Mingzhe Zou <mingzhe.zou@xxxxxxxxxxxx> 写道: > > We get a kernel crash about "unable to handle kernel paging request": > > ```dmesg > [368033.032005] BUG: unable to handle kernel paging request at ffffffffad9ae4b5 > [368033.032007] PGD fc3a0d067 P4D fc3a0d067 PUD fc3a0e063 PMD 8000000fc38000e1 > [368033.032012] Oops: 0003 [#1] SMP PTI > [368033.032015] CPU: 23 PID: 55090 Comm: bch_dirtcnt[0] Kdump: loaded Tainted: G OE --------- - - 4.18.0-147.5.1.es8_24.x86_64 #1 > [368033.032017] Hardware name: Tsinghua Tongfang THTF Chaoqiang Server/072T6D, BIOS 2.4.3 01/17/2017 > [368033.032027] RIP: 0010:native_queued_spin_lock_slowpath+0x183/0x1d0 > [368033.032029] Code: 8b 02 48 85 c0 74 f6 48 89 c1 eb d0 c1 e9 12 83 e0 > 03 83 e9 01 48 c1 e0 05 48 63 c9 48 05 c0 3d 02 00 48 03 04 cd 60 68 93 > ad <48> 89 10 8b 42 08 85 c0 75 09 f3 90 8b 42 08 85 c0 74 f7 48 8b 02 > [368033.032031] RSP: 0018:ffffbb48852abe00 EFLAGS: 00010082 > [368033.032032] RAX: ffffffffad9ae4b5 RBX: 0000000000000246 RCX: 0000000000003bf3 > [368033.032033] RDX: ffff97b0ff8e3dc0 RSI: 0000000000600000 RDI: ffffbb4884743c68 > [368033.032034] RBP: 0000000000000001 R08: 0000000000000000 R09: 000007ffffffffff > [368033.032035] R10: ffffbb486bb01000 R11: 0000000000000001 R12: ffffffffc068da70 > [368033.032036] R13: 0000000000000003 R14: 0000000000000000 R15: 0000000000000000 > [368033.032038] FS: 0000000000000000(0000) GS:ffff97b0ff8c0000(0000) knlGS:0000000000000000 > [368033.032039] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [368033.032040] CR2: ffffffffad9ae4b5 CR3: 0000000fc3a0a002 CR4: 00000000003626e0 > [368033.032042] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > [368033.032043] bcache: bch_cached_dev_attach() Caching rbd479 as bcache462 on set 8cff3c36-4a76-4242-afaa-7630206bc70b > [368033.032045] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > [368033.032046] Call Trace: > [368033.032054] _raw_spin_lock_irqsave+0x32/0x40 > [368033.032061] __wake_up_common_lock+0x63/0xc0 > [368033.032073] ? bch_ptr_invalid+0x10/0x10 [bcache] > [368033.033502] bch_dirty_init_thread+0x14c/0x160 [bcache] > [368033.033511] ? read_dirty_submit+0x60/0x60 [bcache] > [368033.033516] kthread+0x112/0x130 > [368033.033520] ? kthread_flush_work_fn+0x10/0x10 > [368033.034505] ret_from_fork+0x35/0x40 > ``` > > The crash occurred when call wake_up(&state->wait), and then we want > to look at the value in the state. However, bch_sectors_dirty_init() > is not found in the stack of any task. Since state is allocated on > the stack, we guess that bch_sectors_dirty_init() has exited, causing > bch_dirty_init_thread() to be unable to handle kernel paging request. > > In order to verify this idea, we added some printing information during > wake_up(&state->wait). We find that "wake up" is printed twice, however > we only expect the last thread to wake up once. > > ```dmesg > [ 994.641004] alcache: bch_dirty_init_thread() wake up > [ 994.641018] alcache: bch_dirty_init_thread() wake up > [ 994.641523] alcache: bch_sectors_dirty_init() init exit > ``` > > There is a race. If bch_sectors_dirty_init() exits after the first wake > up, the second wake up will trigger this bug("unable to handle kernel > paging request"). > > Proceed as follows: > > bch_sectors_dirty_init > kthread_run ==============> bch_dirty_init_thread(bch_dirtcnt[0]) > ... ... > atomic_inc(&state.started) ... > ... ... > atomic_read(&state.enough) ... > ... atomic_set(&state->enough, 1) > kthread_run ======================================================> bch_dirty_init_thread(bch_dirtcnt[1]) > ... atomic_dec_and_test(&state->started) ... > atomic_inc(&state.started) ... ... > ... wake_up(&state->wait) ... > atomic_read(&state.enough) atomic_dec_and_test(&state->started) > ... ... > wait_event(state.wait, atomic_read(&state.started) == 0) ... > return ... > wake_up(&state->wait) > > We believe it is very common to wake up twice if there is no dirty, but > crash is an extremely low probability event. It's hard for us to reproduce > this issue. We attached and detached continuously for a week, with a total > of more than one million attaches and only one crash. > > Putting atomic_inc(&state.started) before kthread_run() can avoid waking > up twice. > > Fixes: b144e45fc576 ("bcache: make bch_sectors_dirty_init() to be multithreaded") > Signed-off-by: Mingzhe Zou <mingzhe.zou@xxxxxxxxxxxx> > Cc: stable@xxxxxxxxxxxxxxx Thanks for catching this. Added to my for-next. Coly Li > --- > drivers/md/bcache/writeback.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c > index 24c049067f61..a6ddd0bb9220 100644 > --- a/drivers/md/bcache/writeback.c > +++ b/drivers/md/bcache/writeback.c > @@ -1014,17 +1014,18 @@ void bch_sectors_dirty_init(struct bcache_device *d) > if (atomic_read(&state.enough)) > break; > > + atomic_inc(&state.started); > state.infos[i].state = &state; > state.infos[i].thread = > kthread_run(bch_dirty_init_thread, &state.infos[i], > "bch_dirtcnt[%d]", i); > if (IS_ERR(state.infos[i].thread)) { > pr_err("fails to run thread bch_dirty_init[%d]\n", i); > + atomic_dec(&state.started); > for (--i; i >= 0; i--) > kthread_stop(state.infos[i].thread); > goto out; > } > - atomic_inc(&state.started); > } > > out: > -- > 2.17.1.windows.2 >