On Sat, May 21, 2022 at 03:37:45PM +0800, Yu Kuai wrote:
> When the nbd module is being removed, nbd_alloc_config() may be
> called concurrently by nbd_genl_connect(). In that case
> try_module_get() returns false, but nbd_alloc_config() doesn't
> handle the failure.
>
> The race may lead to a leak of nbd_config and its related
> resources (e.g., recv_workq) and to an oops in nbd_read_stat()
> because the nbd module has been unloaded, as shown below:
>
>   BUG: kernel NULL pointer dereference, address: 0000000000000040
>   Oops: 0000 [#1] SMP PTI
>   CPU: 5 PID: 13840 Comm: kworker/u17:33 Not tainted 5.14.0+ #1
>   Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
>   Workqueue: knbd16-recv recv_work [nbd]
>   RIP: 0010:nbd_read_stat.cold+0x130/0x1a4 [nbd]
>   Call Trace:
>    recv_work+0x3b/0xb0 [nbd]
>    process_one_work+0x1ed/0x390
>    worker_thread+0x4a/0x3d0
>    kthread+0x12a/0x150
>    ret_from_fork+0x22/0x30
>
> Fix it by checking the return value of try_module_get()
> in nbd_alloc_config(). As nbd_alloc_config() may return ERR_PTR(-ENODEV),
> assign nbd->config only when nbd_alloc_config() succeeds to ensure
> the value of nbd->config is binary (valid or NULL).
>
> Also add a debug message to check the reference counter
> of nbd_config during module removal.
>
> Signed-off-by: Hou Tao <houtao1@xxxxxxxxxx>
> Signed-off-by: Yu Kuai <yukuai3@xxxxxxxxxx>

Reviewed-by: Josef Bacik <josef@xxxxxxxxxxxxxx>

Thanks,

Josef
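
For readers following along without the diff, here is a rough userspace
sketch of the pattern the commit message describes. It is not the patch
itself: every mock_* name, the module_going_away flag, and the simplified
ERR_PTR helpers are stand-ins assumed for illustration, and the real
nbd_alloc_config() does more than this. It only shows the two points
from the description: the allocator reports a failed module reference as
ERR_PTR(-ENODEV), and the caller stores the config pointer only on
success, so the field is always either valid or NULL.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Simplified userspace stand-ins for the kernel's ERR_PTR helpers (assumption). */
#define MOCK_MAX_ERRNO 4095
static inline void *ERR_PTR(intptr_t err) { return (void *)err; }
static inline intptr_t PTR_ERR(const void *p) { return (intptr_t)p; }
static inline bool IS_ERR(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MOCK_MAX_ERRNO;
}

/* Hypothetical, stripped-down versions of the structures named in the mail. */
struct nbd_config { int placeholder; };
struct nbd_device { struct nbd_config *config; };

/* Hypothetical flag standing in for "the module is being removed". */
static bool module_going_away;

/* Mock of try_module_get(): fails once removal has started. */
static bool mock_try_module_get(void)
{
	return !module_going_away;
}

/* Allocator that propagates the failed module reference as an error pointer. */
static struct nbd_config *mock_alloc_config(void)
{
	struct nbd_config *config;

	if (!mock_try_module_get())
		return ERR_PTR(-ENODEV);

	config = calloc(1, sizeof(*config));
	if (!config)
		return ERR_PTR(-ENOMEM);

	return config;
}

/* Caller: assign nbd->config only when allocation succeeded. */
static int mock_connect(struct nbd_device *nbd)
{
	struct nbd_config *config = mock_alloc_config();

	if (IS_ERR(config))
		return (int)PTR_ERR(config);

	nbd->config = config;
	return 0;
}

/* Small self-check exercising both paths; call it from main() to try it. */
static void mock_selfcheck(void)
{
	struct nbd_device nbd = { .config = NULL };

	module_going_away = true;
	assert(mock_connect(&nbd) == -ENODEV);
	assert(nbd.config == NULL);	/* never holds an error pointer */

	module_going_away = false;
	assert(mock_connect(&nbd) == 0);
	assert(nbd.config != NULL);
	free(nbd.config);
}
```

Using a local variable for the result and assigning nbd->config only
afterwards is what keeps later "if (nbd->config)" style checks safe,
since an error pointer is non-NULL and would otherwise be dereferenced.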