Re: [PATCH 2/3] scsi: make sure that request queue queiesce and unquiesce balanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 11/2/21 8:36 AM, Jens Axboe wrote:
> On 11/2/21 8:33 AM, James Bottomley wrote:
>> On Tue, 2021-11-02 at 06:59 -0600, Jens Axboe wrote:
>>> On 11/1/21 7:43 PM, James Bottomley wrote:
>>>> On Thu, 2021-10-21 at 22:59 +0800, Ming Lei wrote:
>>>>> For fixing queue quiesce race between driver and block
>>>>> layer(elevator switch, update nr_requests, ...), we need to
>>>>> support concurrent quiesce and unquiesce, which requires the two
>>>>> call balanced.
>>>>>
>>>>> It isn't easy to audit that in all scsi drivers, especially the
>>>>> two may be called from different contexts, so do it in scsi core
>>>>> with one per-device bit flag & global spinlock, basically zero
>>>>> cost since request queue quiesce is seldom triggered.
>>>>>
>>>>> Reported-by: Yi Zhang <yi.zhang@xxxxxxxxxx>
>>>>> Fixes: e70feb8b3e68 ("blk-mq: support concurrent queue
>>>>> quiesce/unquiesce")
>>>>> Signed-off-by: Ming Lei <ming.lei@xxxxxxxxxx>
>>>>> ---
>>>>>  drivers/scsi/scsi_lib.c    | 45 ++++++++++++++++++++++++++++++
>>>>> ----
>>>>> ----
>>>>>  include/scsi/scsi_device.h |  1 +
>>>>>  2 files changed, 37 insertions(+), 9 deletions(-)
>>>>>
>>>>> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
>>>>> index 51fcd46be265..414f4daf8005 100644
>>>>> --- a/drivers/scsi/scsi_lib.c
>>>>> +++ b/drivers/scsi/scsi_lib.c
>>>>> @@ -2638,6 +2638,40 @@ static int
>>>>> __scsi_internal_device_block_nowait(struct scsi_device *sdev)
>>>>>  	return 0;
>>>>>  }
>>>>>  
>>>>> +static DEFINE_SPINLOCK(sdev_queue_stop_lock);
>>>>> +
>>>>> +void scsi_start_queue(struct scsi_device *sdev)
>>>>> +{
>>>>> +	bool need_start;
>>>>> +	unsigned long flags;
>>>>> +
>>>>> +	spin_lock_irqsave(&sdev_queue_stop_lock, flags);
>>>>> +	need_start = sdev->queue_stopped;
>>>>> +	sdev->queue_stopped = 0;
>>>>> +	spin_unlock_irqrestore(&sdev_queue_stop_lock, flags);
>>>>> +
>>>>> +	if (need_start)
>>>>> +		blk_mq_unquiesce_queue(sdev->request_queue);
>>>>
>>>> Well, this is a classic atomic pattern:
>>>>
>>>> if (cmpxchg(&sdev->queue_stopped, 1, 0))
>>>> 	blk_mq_unquiesce_queue(sdev->request_queue);
>>>>
>>>> The reason to do it with atomics rather than spinlocks is
>>>>
>>>>    1. no need to disable interrupts: atomics are locked
>>>>    2. faster because a spinlock takes an exclusive line every time
>>>> but the
>>>>       read to check the value can be in shared mode in cmpxchg
>>>>    3. it's just shorter and better code.
>>>>
>>>> The only minor downside is queue_stopped now needs to be a u32.
>>>
>>> Are you fine with the change as-is, or do you want it redone? I
>>> can drop the SCSI parts and just queue up the dm fix. Personally
>>> I think it'd be better to get it fixed upfront.
>>
>> Well, given the path isn't hot, I don't really care.  However, what I
>> don't want is to have to continually bat back patches from the make
>> work code churners trying to update this code for being the wrong
>> pattern.  I think at the very least it needs a comment saying why we
>> chose a suboptimal pattern to try to forestall this.
> 
> Right, with a comment it's probably better. And as you said, since it's
> not a hot path, don't think we'd be revisiting it anyway.
> 
> I'll amend the patch with a comment.

I started adding the comment and took another look at this, and that
made me change my mind. We really should make this a cmpxcgh, it's not
even using a device lock here.

I've dropped the two SCSI patches for now, Ming can you resend? If James
agrees, I really think queue_stopped should just have the type changed
and the patch redone with that using cmpxcgh().

-- 
Jens Axboe

--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://listman.redhat.com/mailman/listinfo/dm-devel




[Index of Archives]     [DM Crypt]     [Fedora Desktop]     [ATA RAID]     [Fedora Marketing]     [Fedora Packaging]     [Fedora SELinux]     [Yosemite Discussion]     [KDE Users]     [Fedora Docs]

  Powered by Linux