Re: [PATCH V1] block: Fix null pointer dereference issue on struct io_cq

Damien Le Moal <dlemoal@xxxxxxxxxx> · Wed, 17 May 2023 18:20:19 +0900

On 5/17/23 17:58, Yu Kuai wrote:
> Hi,
> 
> On 2023/05/17 16:44, Pradeep P V K wrote:
>> There is a potential race between ioc_release_fn() and
>> exit_io_context(), as shown below, which causes the crash
>> reported further down. It can also result in a
>> use-after-free.
>>
>> context#1:                           context#2:
>> ioc_release_fn()                     do_exit();
>> ->spin_lock(&ioc->lock);             ->exit_io_context();
>> ->ioc_destroy_icq(icq);              ->ioc_exit_icqs();
>>   ->list_del_init(&icq->q_node);       ->spin_lock_irq(&ioc->lock);
>>   ->call_rcu(&icq->__rcu_head,
>>       icq_free_icq_rcu);
>> ->spin_unlock(&ioc->lock);
>>                                        ->ioc_exit_icq(); gets the same icq
> I don't understand how this is possible: the list is protected by
> 'ioc->lock', and both hlist_del_init and hlist_for_each_entry are called
> with the lock held.

Given that ioc_destroy_icq() calls ioc_exit_icq(), ioc_exit_icqs() should skip
all icqs that have already been destroyed; otherwise ioc_exit_icq() gets called
twice for the same icq. The missing RCU read lock in ioc_exit_icqs() was already
a bug in itself, and the missing flag check is another.
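
To make the double call concrete, here is a minimal userspace sketch. It is a
model only, not kernel code: every model_* name and both flag values are
hypothetical, locking and RCU are left out, and it assumes (as the race
description does) that a destroyed icq can still be reached by the exit walk.

```c
#include <stddef.h>

/* Hypothetical flag for this model; not the kernel's ICQ_DESTROYED value. */
#define MODEL_ICQ_DESTROYED (1u << 0)

struct model_icq {
	unsigned int flags;
	int exit_calls;          /* how often the elevator exit hook ran */
	struct model_icq *next;  /* stand-in for the ioc_node linkage */
};

/* Stand-in for ioc_exit_icq(): invokes the elevator hook (bfq_exit_icq). */
static void model_exit_icq(struct model_icq *icq)
{
	icq->exit_calls++;
}

/* Stand-in for ioc_destroy_icq(): exits the icq, then marks it destroyed. */
static void model_destroy_icq(struct model_icq *icq)
{
	model_exit_icq(icq);
	icq->flags |= MODEL_ICQ_DESTROYED;
}

/* Stand-in for ioc_exit_icqs(), with the flag check made optional. */
static void model_exit_icqs(struct model_icq *head, int check_destroyed)
{
	struct model_icq *icq;

	for (icq = head; icq; icq = icq->next) {
		if (check_destroyed && (icq->flags & MODEL_ICQ_DESTROYED))
			continue; /* already exited by the destroy path */
		model_exit_icq(icq);
	}
}

/* Destroy the first icq, then walk the list; return its exit-hook count. */
static int model_run(int check_destroyed)
{
	struct model_icq b = { 0, 0, NULL };
	struct model_icq a = { 0, 0, &b };

	model_destroy_icq(&a);
	model_exit_icqs(&a, check_destroyed);
	return a.exit_calls;
}
```

In this model, model_run(0) runs the exit hook twice on the destroyed icq,
while model_run(1) runs it once, which is the behavior the flag check in the
patch is meant to restore.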

> 
> Thanks,
> Kuai
>> 				       ->bfq_exit_icq();
>>                                    This results in the crash below, because
>>                                    bic, which is derived from icq, is NULL.
>>                                    The icq could also already have been
>>                                    freed.
>>
>> [33.245722][ T8666] Unable to handle kernel NULL pointer dereference
>> at virtual address 0000000000000018.
>> ...
>> Call trace:
>> [33.325782][ T8666]  bfq_exit_icq+0x28/0xa8
>> [33.325785][ T8666]  exit_io_context+0xcc/0x100
>> [33.325786][ T8666]  do_exit+0x764/0xa58
>> [33.325791][ T8666]  do_group_exit+0x0/0xa0
>> [33.325793][ T8666]  invoke_syscall+0x48/0x114
>> [33.325802][ T8666]  el0_svc_common+0xcc/0x118
>> [33.325805][ T8666]  do_el0_svc+0x34/0xd0
>> [33.325807][ T8666]  el0_svc+0x38/0xd0
>> [33.325812][ T8666]  el0t_64_sync_handler+0x8c/0xfc
>> [33.325813][ T8666]  el0t_64_sync+0x1a0/0x1a4
>>
>> Fix this by checking the ICQ_DESTROYED flag in ioc_exit_icqs().
>> Also, ensure ioc_exit_icq() accesses the icq under rcu_read_lock/unlock
>> so that the icq cannot be freed while it is still in use.
>>
>> Signed-off-by: Pradeep P V K <quic_pragalla@xxxxxxxxxxx>

Pradeep, I think this needs a Fixes tag and a Cc: stable.

>> ---
>>   block/blk-ioc.c | 8 ++++++--
>>   1 file changed, 6 insertions(+), 2 deletions(-)
>>
>> diff --git a/block/blk-ioc.c b/block/blk-ioc.c
>> index 63fc02042408..1aa34fd46ac8 100644
>> --- a/block/blk-ioc.c
>> +++ b/block/blk-ioc.c
>> @@ -60,10 +60,14 @@ static void ioc_exit_icqs(struct io_context *ioc)
>>   {
>>   	struct io_cq *icq;
>>   
>> +	rcu_read_lock();
>>   	spin_lock_irq(&ioc->lock);
>> -	hlist_for_each_entry(icq, &ioc->icq_list, ioc_node)
>> -		ioc_exit_icq(icq);
>> +	hlist_for_each_entry(icq, &ioc->icq_list, ioc_node) {
>> +		if (!(icq->flags & ICQ_DESTROYED))
>> +			ioc_exit_icq(icq);
>> +	}
>>   	spin_unlock_irq(&ioc->lock);
>> +	rcu_read_unlock();
>>   }
>>   
>>   /*
>>
> 

-- 
Damien Le Moal
Western Digital Research