Re: [PATCH v2 3/3] nbd: fix race between nbd_alloc_config() and module removal

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi,

On 9/6/2021 5:30 PM, Christoph Hellwig wrote:
> On Sat, Sep 04, 2021 at 08:25:19PM +0800, Hou Tao wrote:
>> When nbd module is being removing, nbd_alloc_config() may be
>> called concurrently by nbd_genl_connect(), although try_module_get()
>> will return false, but nbd_alloc_config() doesn't handle it.
>>
>> The race may lead to the leak of nbd_config and its related
>> resources (e.g, recv_workq) and oops in nbd_read_stat() due
>> to the unload of nbd module as shown below:
>>
>>   BUG: kernel NULL pointer dereference, address: 0000000000000040
>>   Oops: 0000 [#1] SMP PTI
>>   CPU: 5 PID: 13840 Comm: kworker/u17:33 Not tainted 5.14.0+ #1
>>   Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
>>   Workqueue: knbd16-recv recv_work [nbd]
>>   RIP: 0010:nbd_read_stat.cold+0x130/0x1a4 [nbd]
>>   Call Trace:
>>    recv_work+0x3b/0xb0 [nbd]
>>    process_one_work+0x1ed/0x390
>>    worker_thread+0x4a/0x3d0
>>    kthread+0x12a/0x150
>>    ret_from_fork+0x22/0x30
>>
>> Fixing it by checking the return value of try_module_get()
>> in nbd_alloc_config(). As nbd_alloc_config() may return ERR_PTR(-ENODEV),
>> assign nbd->config only when nbd_alloc_config() succeeds to ensure
>> the value of nbd->config is binary (valid or NULL).
>>
>> Also adding a debug message to check the reference counter
>> of nbd_config during module removal.
>>
>> Signed-off-by: Hou Tao <houtao1@xxxxxxxxxx>
>> ---
>>  drivers/block/nbd.c | 28 +++++++++++++++++++---------
>>  1 file changed, 19 insertions(+), 9 deletions(-)
>>
>> diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
>> index cedd3648e1a7..fa6c069b79dc 100644
>> --- a/drivers/block/nbd.c
>> +++ b/drivers/block/nbd.c
>> @@ -1473,15 +1473,20 @@ static struct nbd_config *nbd_alloc_config(void)
>>  {
>>  	struct nbd_config *config;
>>  
>> +	if (!try_module_get(THIS_MODULE))
>> +		return ERR_PTR(-ENODEV);
> try_module_get(THIS_MODULE) is an indicator for an unsafe pattern.  If
> we don't already have a reference it could never close the race.
>
> Looking at the callers:
>
>  - nbd_open like all block device operations must have a reference
>    already.
Yes. nbd_open() has already taken a reference in dentry_open().
>  - for nbd_genl_connect I'm not an expert, but given that struct
>    nbd_genl_family has a module member I suspect the networkinh
>    code already takes a reference.

That was my original though, but the fact is netlink code doesn't take a module reference

in genl_family_rcv_msg_doit() and netlink uses genl_lock_all() to serialize between module removal

and nbd_connect_genl_ops calling, so I think use try_module_get() is OK here.

Regards,

Tao


> So this should be able to use __module_get.

> .



[Index of Archives]     [Linux RAID]     [Linux SCSI]     [Linux ATA RAID]     [IDE]     [Linux Wireless]     [Linux Kernel]     [ATH6KL]     [Linux Bluetooth]     [Linux Netdev]     [Kernel Newbies]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Device Mapper]

  Powered by Linux