On Tue, Aug 13, 2019 at 10:45:55AM -0500, Mike Christie wrote:
> On 08/13/2019 08:13 AM, Josef Bacik wrote:
> > On Fri, Aug 09, 2019 at 04:26:10PM -0500, Mike Christie wrote:
> >> This fixes a regression added in 4.9 with commit:
> >>
> >> commit 0eadf37afc2500e1162c9040ec26a705b9af8d47
> >> Author: Josef Bacik <jbacik@xxxxxx>
> >> Date: Thu Sep 8 12:33:40 2016 -0700
> >>
> >> nbd: allow block mq to deal with timeouts
> >>
> >> where before the patch userspace would set the timeout to 0 to disable
> >> it. With the above patch, a zero timeout tells the block layer to use
> >> the default value of 30 seconds. For setups where commands can take a
> >> long time or experience transient issues like network disruptions this
> >> then results in IO errors being sent to the application.
> >>
> >> To fix this, the patch still uses the common block layer timeout
> >> framework, but if zero is set, nbd just logs a message and then resets
> >> the timer when it expires.
> >>
> >> Josef,
> >>
> >> I did not cc stable, but I think we want to port the patches to some
> >> releases. We originally hit this with users using the longterm kernels
> >> with ceph. The patch does not apply anywhere cleanly with older ones
> >> like 4.9, so I was not sure how we wanted to handle it.
> >>
> >
> > I assume you tested this? IIRC there was a problem where 0 really meant 0 and
>
> Yes.
>
> > commands would insta-timeout. But my memory is foggy here, so I'm not sure if
> > it was setting the tag_set timeout to 0 that made things go wrong, or what. Or
> > I could be making it all up, who knows.
>
> Yes, if you call blk_queue_rq_timeout with 0, then the command will
> timeout almost immediately. I added a check for this in the first patch.
>

Ahhh that's what it was, thank you. I'm cool then, you can add

Reviewed-by: Josef Bacik <josef@xxxxxxxxxxxxxx>

Thanks,

Josef