Re: [PATCH v2 1/2] block: prevent freeing a zone write plug too early

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 2024/04/24 1:36, Jens Axboe wrote:
> On 4/23/24 9:16 AM, Damien Le Moal wrote:
>> On 2024/04/24 0:21, Jens Axboe wrote:
>>> On 4/20/24 1:58 AM, Damien Le Moal wrote:
>>>> The submission of plugged BIOs is done using a work struct executing the
>>>> function blk_zone_wplug_bio_work(). This function gets and submits a
>>>> plugged zone write BIO and is guaranteed to operate on a valid zone
>>>> write plug (with a reference count higher than 0) on entry as plugged
>>>> BIOs hold a reference on their zone write plugs. However, once a BIO is
>>>> submitted with submit_bio_noacct_nocheck(), the BIO may complete before
>>>> blk_zone_wplug_bio_work(), with the BIO completion trigering a release
>>>> and freeing of the zone write plug if the BIO is the last write to a
>>>> zone (making the zone FULL). This potentially can result in the zone
>>>> write plug being freed while the work is still active.
>>>>
>>>> Avoid this by calling flush_work() from disk_free_zone_wplug_rcu().
>>>>
>>>> Fixes: dd291d77cc90 ("block: Introduce zone write plugging")
>>>> Signed-off-by: Damien Le Moal <dlemoal@xxxxxxxxxx>
>>>> ---
>>>>  block/blk-zoned.c | 2 ++
>>>>  1 file changed, 2 insertions(+)
>>>>
>>>> diff --git a/block/blk-zoned.c b/block/blk-zoned.c
>>>> index 3befebe6b319..685f0b9159fd 100644
>>>> --- a/block/blk-zoned.c
>>>> +++ b/block/blk-zoned.c
>>>> @@ -526,6 +526,8 @@ static void disk_free_zone_wplug_rcu(struct rcu_head *rcu_head)
>>>>  	struct blk_zone_wplug *zwplug =
>>>>  		container_of(rcu_head, struct blk_zone_wplug, rcu_head);
>>>>  
>>>> +	flush_work(&zwplug->bio_work);
>>>> +
>>>>  	mempool_free(zwplug, zwplug->disk->zone_wplugs_pool);
>>>>  }
>>>
>>> This is totally backwards. First of all, if you actually had work that
>>> needed flushing at this point, the kernel would bomb spectacularly.
>>> Secondly, what's the point of using RCU to protect this, if you're now
>>> needing to flush work from the RCU callback? That's a clear sign that
>>> something is very wrong here with your references / RCU usage.. The work
>>> item should hold a reference to it, trying to paper around it like this
>>> is not going to work at all.
>>>
>>> Why is the work item racing with RCU freeing?!
>>
>> The work item is a field of the zone write plug. Zone write plugs have
>> references to them as long as BIOs are in flight and and the zone is
>> not full. The zone write plug freeing through rcu is triggered by the
>> last write to a zone that makes the zone full. But the completion of
>> this last write BIO may happen right after the work issued the BIO
>> with submit_bio_noacct_nocheck() and before blk_zone_wplug_bio_work()
>> returns, while the work item is still active.
>>
>> The actual freeing of the plug happens only after the rcu grace
>> period, and I was not entirely sure if this is enough to guarantee
>> that the work thread is finished. But checking how the workqueue code
>> processes the work item by calling the work function
>> (blk_zone_wplug_bio_work() in this case), there is no issue because
>> the work item (struct work_struct) is not touched once the work
>> function is called. So there are no issues/races with freeing the zone
>> write plug. I was overthinking this. My bad. We can drop this patch.
>> Apologies for the noise.
> 
> I took a closer look at the zone write plug reference handling, and it
> still doesn't look very good. Why are some just atomic_dec and there's
> just one spot that does dec_and_test? This again looks like janky
> referencing, to be honest.
> 
> The relationship seems like it should be pretty clear. Any bio inflight
> against this zone plug should have a reference to it, AND the owner
> should have a reference to it, otherwise any bio completion (which can
> happen at ANY time) could free it. Any dropping of the ref should use a
> helper that does atomic_dec_and_test(), eg what disk_put_zone_wplug()
> does.
> 
> There should be no doubt about the above at all. If the plug has been
> added to a workqueue, it should be quite obvious that of course it has a
> reference to it already, outside of the bio's that are in it.
> 
> I'd strongly encourage getting this sorted out before the merge window,
> I'm not at all convinced it's correct as-is. It's certainly not
> obviously correct, which it should be. The RCU rules are pretty simple
> if the the references are done in the kernel idiomatic way, but they are
> not.

OK. I am traveling this week so I will not be able to send a well-tested cleanup
patch but I will do so first thing next week.

-- 
Damien Le Moal
Western Digital Research





[Index of Archives]     [Linux RAID]     [Linux SCSI]     [Linux ATA RAID]     [IDE]     [Linux Wireless]     [Linux Kernel]     [ATH6KL]     [Linux Bluetooth]     [Linux Netdev]     [Kernel Newbies]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Device Mapper]

  Powered by Linux