Re: [PATCH] null_blk: allow teardown on request timeout

Ming Lei <ming.lei@xxxxxxxxxx> · Mon, 17 Oct 2022 18:16:40 +0800

On Mon, Oct 17, 2022 at 10:04:26AM +0000, Chaitanya Kulkarni wrote:
> On 10/17/22 02:50, Ming Lei wrote:
> > On Mon, Oct 17, 2022 at 09:30:47AM +0000, Chaitanya Kulkarni wrote:
> >>
> >>>> +	/*
> >>>> +	 * Unblock any pending dispatch I/Os before we destroy the device.
> >>>> +	 * From null_destroy_dev()->del_gendisk() will set GD_DEAD flag
> >>>> +	 * causing any new I/O from __bio_queue_enter() to fail with -ENODEV.
> >>>> +	 */
> >>>> +	blk_mq_unquiesce_queue(nullb->q);
> >>>> +
> >>>> +	null_destroy_dev(nullb);
> >>>
> >>> Destroying the device is never a good cleanup for handling timeout/abort;
> >>> it should always be the last resort.
> >>>
> >>
> >> That is exactly why I've added the rq_abort_limit: until the limit
> >> is reached, null_abort_work() will not get scheduled and the device is
> >> not destroyed.
> > 
> > I meant that destroying the device should only be done iff the normal
> > abort handler can't recover the device. However, your patch simply
> > destroys the device without running any abort handling.
> > 
> 
> I did not understand your comment, can you please elaborate on exactly
> where and which abort handlers need to be called in this patch before
> null_destroy_nullb() ?

A request timeout usually means something has gone wrong that needs to
be recovered, so recovery should be attempted before anything else.

> 
> the objective of this patch is to simulate the teardown scenario
> from the timeout handler so it can get tested on a regular basis with
> null_blk ...

Why does the teardown scenario have to be triggered by a timeout? It
looks like you consider tearing down and destroying the device on timeout
a normal and common approach, but I don't think it is: the device
shouldn't be removed if it can still work. I have received complaints
about disks disappearing just because of a request timeout, for example
with nvme-pci.
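
To make the ordering concrete, here is a rough userspace sketch of the
policy I have in mind: try the abort handler first, and tear the device
down only after recovery has failed repeatedly. This is not null_blk or
block layer code. Every name in it (fake_dev, handle_timeout, abort_fn,
the limit field) is hypothetical and only models the decision, under the
assumption that an abort handler can report whether it recovered the device.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
enum timeout_action { TIMEOUT_RECOVERED, TIMEOUT_RETRY, TIMEOUT_TEARDOWN };

struct fake_dev {
	bool alive;		/* false once the device has been torn down */
	unsigned int failed;	/* consecutive timeouts abort could not recover */
	unsigned int limit;	/* failed recoveries tolerated before teardown */
};

/* Stand-in for a driver's abort handler: returns true if the device recovered. */
typedef bool (*abort_fn)(struct fake_dev *dev);

static enum timeout_action handle_timeout(struct fake_dev *dev, abort_fn try_abort)
{
	assert(dev && try_abort);

	/* Already gone: nothing left to recover. */
	if (!dev->alive)
		return TIMEOUT_TEARDOWN;

	/* Always run the abort handler before considering teardown. */
	if (try_abort(dev)) {
		dev->failed = 0;
		return TIMEOUT_RECOVERED;
	}

	/* Recovery failed, but stay below the limit: keep the device. */
	if (++dev->failed < dev->limit)
		return TIMEOUT_RETRY;

	/* Last resort: recovery failed 'limit' times in a row. */
	dev->alive = false;
	return TIMEOUT_TEARDOWN;
}

/* Trivial abort handlers used to exercise the policy. */
static bool abort_ok(struct fake_dev *dev) { (void)dev; return true; }
static bool abort_fail(struct fake_dev *dev) { (void)dev; return false; }
```

With limit set to 2, a timeout where abort_ok() succeeds leaves the device
in place, and the device is marked dead only on the second consecutive
abort_fail().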

thanks,
Ming