Re: [PATCH 4/4] nbd: fix zero cmd timeout handling

On 08/13/2019 08:13 AM, Josef Bacik wrote:
> On Fri, Aug 09, 2019 at 04:26:10PM -0500, Mike Christie wrote:
>> This fixes a regression added in 4.9 with commit:
>>
>> commit 0eadf37afc2500e1162c9040ec26a705b9af8d47
>> Author: Josef Bacik <jbacik@xxxxxx>
>> Date:   Thu Sep 8 12:33:40 2016 -0700
>>
>>     nbd: allow block mq to deal with timeouts
>>
>> where before the patch userspace would set the timeout to 0 to disable
>> it. With the above patch, a zero timeout tells the block layer to use
>> the default value of 30 seconds. For setups where commands can take a
>> long time or experience transient issues like network disruptions this
>> then results in IO errors being sent to the application.
>>
>> To fix this, the patch still uses the common block layer timeout
>> framework, but if zero is set, nbd just logs a message and then resets
>> the timer when it expires.
>>
>> Josef,
>>
>> I did not cc stable, but I think we want to port the patches to some
>> releases. We originally hit this with users using the longterm kernels
>> with ceph. The patch does not apply anywhere cleanly with older ones
>> like 4.9, so I was not sure how we wanted to handle it.
>>
> 
> I assume you tested this?  IIRC there was a problem where 0 really meant 0 and

Yes.

> commands would insta-timeout.  But my memory is foggy here, so I'm not sure if
> it was setting the tag_set timeout to 0 that made things go wrong, or what.  Or
> I could be making it all up, who knows.

Yes, if you call blk_queue_rq_timeout() with 0, then the command will
time out almost immediately. I added a check for this in the first patch.

If blk_mq_tag_set.timeout is 0, blk_mq_init_allocated_queue() uses the
default 30 second value.

So with the patch, if the user sets the timeout to 0, we just log a
message every 30 seconds saying that the command is stuck.
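
To make the intended behavior concrete, here is a rough userspace model
of it. This is only a sketch and not the nbd code: every name in it is
made up for illustration, and the non-zero case is simplified to "fail
the command".

```c
#include <assert.h>

/* All names here are hypothetical; this models the behavior only. */
#define SKETCH_DEFAULT_TIMEOUT_SECS 30U /* block layer default */

enum sketch_action {
	SKETCH_RESET_TIMER, /* log that the command is stuck, re-arm timer */
	SKETCH_FAIL_CMD,    /* simplified: user timeout expired, error out */
};

/* Timer period: a tag set timeout of 0 falls back to the default. */
static unsigned int sketch_timer_period(unsigned int tag_set_timeout_secs)
{
	return tag_set_timeout_secs ? tag_set_timeout_secs
				    : SKETCH_DEFAULT_TIMEOUT_SECS;
}

/* What happens when the timer fires, given the user's timeout setting. */
static enum sketch_action sketch_on_expiry(unsigned int user_timeout_secs)
{
	return user_timeout_secs == 0 ? SKETCH_RESET_TIMER : SKETCH_FAIL_CMD;
}

/* Number of "stuck" messages logged for a command that has been
 * outstanding for stuck_secs with the user timeout set to 0. */
static unsigned int sketch_stuck_messages(unsigned int stuck_secs)
{
	unsigned int period = sketch_timer_period(0);

	assert(period > 0);
	return stuck_secs / period;
}
```

In this model a command stuck for 95 seconds with the timeout set to 0
gets three log messages and no IO error.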

> 
> There's a blktest that just runs fio on a normal device with no timeouts or
> anything, that's where I would see the problem since it was a little racy.
> Basically have the timeout set to 0 and put load on the disk and eventually
> you'd start seeing timeouts.  If that all goes fine then you can add
> 
> Reviewed-by: Josef Bacik <josef@xxxxxxxxxxxxxx>
> 

Ok.



