Re: [PATCH 2/6] libmultipath: directio: don't reset ct->running after io_cancel()

Martin Wilck <mwilck@xxxxxxxx> · Thu, 26 Oct 2023 10:31:28 +0200

On Wed, 2023-10-25 at 19:14 -0400, Benjamin Marzinski wrote:
> On Tue, Oct 24, 2023 at 06:49:33PM +0200, mwilck@xxxxxxxx wrote:
> > 
> > --- a/libmultipath/checkers/directio.c
> > +++ b/libmultipath/checkers/directio.c
> > @@ -216,7 +216,6 @@ out:
> >  void libcheck_free (struct checker * c)
> >  {
> >  	struct directio_context * ct = (struct directio_context
> > *)c->context;
> > -	struct io_event event;
> >  	long flags;
> >  
> >  	if (!ct)
> > @@ -232,9 +231,7 @@ void libcheck_free (struct checker * c)
> >  		}
> >  	}
> >  
> > -	if (ct->running &&
> > -	    (ct->req->state != PATH_PENDING ||
> > -	     io_cancel(ct->aio_grp->ioctx, &ct->req->io, &event)
> > == 0))
> 
> Does io_cancel() do nothing? 

No it doesn't, as Hannes already said. Currently, at least. 
See https://elixir.bootlin.com/linux/latest/source/fs/aio.c:
ki_cancel() is only set for aio_poll() and some USB gadgets. For
regular block devices, io_cancel() is equivalent to 
"return -EINPROGRESS;".

> If does make the IO likely to end sooner,
> even if it always returns failure, then we should probably still run
> it
> when we have an outstanding request that we are going to orphan.
> We
> should just move the io_cancel() to the else block, where we add the
> request to the orphans list.

But I suppose it would be possible to hook ki_cancel() into the SCSI
layer, translating it into a (series of) command aborts. Perhaps
someone will implement that in the future; it seem that just nobody has
bothered yet.

So yes, I will re-add io_cancel() in the else clause in
libcheck_free(). 
It won't do harm, and we'll be prepared for kernel 8.0 ;-)

If we want more predictable timeouts for directio and the io_err_stat
code _today_, we would need to move away from aio and use SG_IO
instead. That would allow us to set SCSI timeouts explicitly, and
(mostly) avoid retries on the SCSI mid layer.

Regards
Martin