Re: [PATCH] bcache: fixup multi-threaded bch_sectors_dirty_init() wake-up race

Coly Li <colyli@xxxxxxx> · Fri, 20 Oct 2023 17:23:25 +0800

> 2023年10月20日 15:50，Mingzhe Zou <mingzhe.zou@xxxxxxxxxxxx> 写道：
> 
> We get a kernel crash about "unable to handle kernel paging request":
> 
> ```dmesg
> [368033.032005] BUG: unable to handle kernel paging request at ffffffffad9ae4b5
> [368033.032007] PGD fc3a0d067 P4D fc3a0d067 PUD fc3a0e063 PMD 8000000fc38000e1
> [368033.032012] Oops: 0003 [#1] SMP PTI
> [368033.032015] CPU: 23 PID: 55090 Comm: bch_dirtcnt[0] Kdump: loaded Tainted: G           OE    --------- -  - 4.18.0-147.5.1.es8_24.x86_64 #1
> [368033.032017] Hardware name: Tsinghua Tongfang THTF Chaoqiang Server/072T6D, BIOS 2.4.3 01/17/2017
> [368033.032027] RIP: 0010:native_queued_spin_lock_slowpath+0x183/0x1d0
> [368033.032029] Code: 8b 02 48 85 c0 74 f6 48 89 c1 eb d0 c1 e9 12 83 e0
> 03 83 e9 01 48 c1 e0 05 48 63 c9 48 05 c0 3d 02 00 48 03 04 cd 60 68 93
> ad <48> 89 10 8b 42 08 85 c0 75 09 f3 90 8b 42 08 85 c0 74 f7 48 8b 02
> [368033.032031] RSP: 0018:ffffbb48852abe00 EFLAGS: 00010082
> [368033.032032] RAX: ffffffffad9ae4b5 RBX: 0000000000000246 RCX: 0000000000003bf3
> [368033.032033] RDX: ffff97b0ff8e3dc0 RSI: 0000000000600000 RDI: ffffbb4884743c68
> [368033.032034] RBP: 0000000000000001 R08: 0000000000000000 R09: 000007ffffffffff
> [368033.032035] R10: ffffbb486bb01000 R11: 0000000000000001 R12: ffffffffc068da70
> [368033.032036] R13: 0000000000000003 R14: 0000000000000000 R15: 0000000000000000
> [368033.032038] FS:  0000000000000000(0000) GS:ffff97b0ff8c0000(0000) knlGS:0000000000000000
> [368033.032039] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [368033.032040] CR2: ffffffffad9ae4b5 CR3: 0000000fc3a0a002 CR4: 00000000003626e0
> [368033.032042] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [368033.032043] bcache: bch_cached_dev_attach() Caching rbd479 as bcache462 on set 8cff3c36-4a76-4242-afaa-7630206bc70b
> [368033.032045] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [368033.032046] Call Trace:
> [368033.032054]  _raw_spin_lock_irqsave+0x32/0x40
> [368033.032061]  __wake_up_common_lock+0x63/0xc0
> [368033.032073]  ? bch_ptr_invalid+0x10/0x10 [bcache]
> [368033.033502]  bch_dirty_init_thread+0x14c/0x160 [bcache]
> [368033.033511]  ? read_dirty_submit+0x60/0x60 [bcache]
> [368033.033516]  kthread+0x112/0x130
> [368033.033520]  ? kthread_flush_work_fn+0x10/0x10
> [368033.034505]  ret_from_fork+0x35/0x40
> ```
> 
> The crash occurred when call wake_up(&state->wait), and then we want
> to look at the value in the state. However, bch_sectors_dirty_init()
> is not found in the stack of any task. Since state is allocated on
> the stack, we guess that bch_sectors_dirty_init() has exited, causing
> bch_dirty_init_thread() to be unable to handle kernel paging request.
> 
> In order to verify this idea, we added some printing information during
> wake_up(&state->wait). We find that "wake up" is printed twice, however
> we only expect the last thread to wake up once.
> 
> ```dmesg
> [  994.641004] alcache: bch_dirty_init_thread() wake up
> [  994.641018] alcache: bch_dirty_init_thread() wake up
> [  994.641523] alcache: bch_sectors_dirty_init() init exit
> ```
> 
> There is a race. If bch_sectors_dirty_init() exits after the first wake
> up, the second wake up will trigger this bug("unable to handle kernel
> paging request").
> 
> Proceed as follows:
> 
> bch_sectors_dirty_init
>    kthread_run ==============> bch_dirty_init_thread(bch_dirtcnt[0])
>            ...                         ...
>    atomic_inc(&state.started)          ...
>            ...                         ...
>    atomic_read(&state.enough)          ...
>            ...                 atomic_set(&state->enough, 1)
>    kthread_run ======================================================> bch_dirty_init_thread(bch_dirtcnt[1])
>            ...                 atomic_dec_and_test(&state->started)            ...
>    atomic_inc(&state.started)          ...                                     ...
>            ...                 wake_up(&state->wait)                           ...
>    atomic_read(&state.enough)                                          atomic_dec_and_test(&state->started)
>            ...                                                                 ...
>    wait_event(state.wait, atomic_read(&state.started) == 0)                    ...
>    return                                                                      ...
>                                                                        wake_up(&state->wait)
> 
> We believe it is very common to wake up twice if there is no dirty, but
> crash is an extremely low probability event. It's hard for us to reproduce
> this issue. We attached and detached continuously for a week, with a total
> of more than one million attaches and only one crash.
> 
> Putting atomic_inc(&state.started) before kthread_run() can avoid waking
> up twice.
> 
> Fixes: b144e45fc576 ("bcache: make bch_sectors_dirty_init() to be multithreaded")
> Signed-off-by: Mingzhe Zou <mingzhe.zou@xxxxxxxxxxxx>
> Cc: stable@xxxxxxxxxxxxxxx

Thanks for catching this. Added to my for-next.

Coly Li

> ---
> drivers/md/bcache/writeback.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c
> index 24c049067f61..a6ddd0bb9220 100644
> --- a/drivers/md/bcache/writeback.c
> +++ b/drivers/md/bcache/writeback.c
> @@ -1014,17 +1014,18 @@ void bch_sectors_dirty_init(struct bcache_device *d)
> if (atomic_read(&state.enough))
> break;
> 
> + atomic_inc(&state.started);
> state.infos[i].state = &state;
> state.infos[i].thread =
> kthread_run(bch_dirty_init_thread, &state.infos[i],
>    "bch_dirtcnt[%d]", i);
> if (IS_ERR(state.infos[i].thread)) {
> pr_err("fails to run thread bch_dirty_init[%d]\n", i);
> + atomic_dec(&state.started);
> for (--i; i >= 0; i--)
> kthread_stop(state.infos[i].thread);
> goto out;
> }
> - atomic_inc(&state.started);
> }
> 
> out:
> -- 
> 2.17.1.windows.2
>