Re: [PATCH] libata error handling fixes (ATAPI)

Jens Axboe <axboe@xxxxxxx> · Wed, 16 Nov 2005 13:40:36 +0100

On Tue, Nov 15 2005, Jens Axboe wrote:
> On Tue, Nov 15 2005, Mike Christie wrote:
> > Jens Axboe wrote:
> > >On Tue, Nov 15 2005, Jeff Garzik wrote:
> > >
> > >>>For departure of libata from SCSI, I was thinking more of another, more
> > >>>generic block device framework in which libata can live.  And I
> > >>>thought that it was reasonable to assume that the framework would supply
> > >>>an EH mechanism which supports queue stalling/draining and a separate
> > >>>thread.  So, my EH patches tried to make the same environment for libata
> > >>
> > >>A big reason why libata uses the SCSI layer is infrastructure like this. 
> > >>It would certainly be nice to see timeouts and EH at the block layer. 
> > >>The block layer itself already supports queue stalling/draining.
> > >
> > >
> > >I have a pretty simple plan for this:
> > >
> > >- Add a timer to struct request. It already has a timeout field for
> > >  SG_IO originated requests, we could easily utilize this in general.
> > >  I'm not yet sure how the timeout would be queried; it would
> > >  probably require a q->set_rq_timeout() hook to ask the low level
> > >  driver to set/return rq->timeout for a given request.
> > >
> > >- Add a timeout hook to struct request_queue that would get invoked from
> > >  the timeout handler. Something along the lines of:
> > >
> > >        - Timeout on a request happens. Freeze the queue and use
> > >          kblockd to take the actual timeout into process context, where
> > >          we call the queue ->rq_timeout() hook. Unfreeze/reschedule
> > >          queue operations based on what the ->rq_timeout() hook tells
> > >          us.
> > >
> > >That is generic enough to be able to arm the timeout automatically from
> > >->elevator_activate_req_fn() and disarm it when it completes or gets
> > >deactivated. It should also be possible to implement the SCSI error
> > >handling on top of that.
> > >
> > 
> > To disable the timeout, would you have scsi_done call a block layer
> > function to disarm it and then follow the current flow, or do you think
> > it would be nice to move the scsi softirq code up to the block layer? In
> > that case scsi_done would call a block layer function that disarms the
> > timer and adds the request to a block layer softirq list (a list like
> > scsi-ml's scsi_done_q). The block layer softirq function could then call
> > a request_queue callout, which for scsi-ml's device queue would call
> > scsi_decide_disposition and return whether it wanted the request
> > requeued, how many sectors completed, or whether to kick off the EH. I
> > had started on this for my block layer multipath driver, but can separate
> > the patches if this would be useful.
> 
> Yeah, that was part of my plan as well. I did post such a patch a year
> or so ago, in a thread about decreasing ide completion latencies.
> 
> > Would ide benefit from running from a softirq and would it be able to 
> > use such a thing?
> 
> It's generally useful as it allows lock free completion from the irq
> path, so that's goodness.
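
For reference, the timeout plan quoted further up can be modelled in a few
lines of plain userspace C. This is only a sketch under stated assumptions,
not kernel code: rq_activate(), rq_complete(), rq_check_timeout(), the
deadline/timer_armed fields and the demo_* driver hooks are hypothetical
names made up for illustration. Only ->set_rq_timeout(), ->rq_timeout() and
rq->timeout come from the description above, and their signatures here are
guesses. Ticks are passed in explicitly instead of using a real timer, and
the kblockd hand-off to process context is collapsed into a direct call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only; all names below are illustrative assumptions. */

enum rq_timeout_action { RQ_TIMEOUT_REQUEUE, RQ_TIMEOUT_FAIL };

struct request {
	unsigned long timeout;   /* allowed run time, in ticks */
	unsigned long deadline;  /* absolute tick at which the timer fires */
	bool timer_armed;
	bool completed;
	bool requeued;
	bool failed;
};

struct request_queue {
	bool frozen;
	/* Ask the low level driver for the timeout of a given request. */
	unsigned long (*set_rq_timeout)(struct request *rq);
	/* Let the driver decide what happens to a timed-out request. */
	enum rq_timeout_action (*rq_timeout)(struct request_queue *q,
					     struct request *rq);
};

/* Arm the timer when the request is handed to the driver. */
void rq_activate(struct request_queue *q, struct request *rq,
		 unsigned long now)
{
	assert(q != NULL && rq != NULL);
	rq->timeout = q->set_rq_timeout(rq);
	rq->deadline = now + rq->timeout;
	rq->timer_armed = true;
}

/* Disarm the timer on normal completion. */
void rq_complete(struct request *rq)
{
	rq->timer_armed = false;
	rq->completed = true;
}

/*
 * Timer tick: if the request has expired, freeze the queue, call the
 * driver hook, act on its verdict, and unfreeze. Returns true if a
 * timeout was handled.
 */
bool rq_check_timeout(struct request_queue *q, struct request *rq,
		      unsigned long now)
{
	enum rq_timeout_action action;

	if (!rq->timer_armed || now < rq->deadline)
		return false;

	rq->timer_armed = false;
	q->frozen = true;
	action = q->rq_timeout(q, rq);
	if (action == RQ_TIMEOUT_REQUEUE)
		rq->requeued = true;
	else
		rq->failed = true;
	q->frozen = false;
	return true;
}

/* Example driver hooks: fixed 30 tick timeout, always requeue. */
unsigned long demo_set_rq_timeout(struct request *rq)
{
	(void)rq;
	return 30;
}

enum rq_timeout_action demo_rq_timeout(struct request_queue *q,
				       struct request *rq)
{
	(void)q;
	(void)rq;
	return RQ_TIMEOUT_REQUEUE;
}
```

A request activated at tick 100 with the demo hooks gets a deadline of 130;
completing it before then disarms the timer so the timeout path never runs.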

I updated that patch, and converted IDE and SCSI to use it. See the
results here:

http://brick.kernel.dk/git/?p=linux-2.6-block.git;a=shortlog;h=blk-softirq

The main change from the version posted last October is killing the
'slightly' overdesigned completion queue hashing.
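
To illustrate the general shape of softirq completion discussed in this
thread, here is a minimal userspace model in C. It is a sketch under stated
assumptions and is not taken from the patch at the URL above: the names
done_list, complete_request_model(), done_softirq_model(), the softirq_done
callout and demo_softirq_done() are all hypothetical. A single plain list
stands in for whatever per-CPU structure a real implementation would use,
and no locking or interrupt masking is modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace model only; all names below are illustrative assumptions. */

struct request;

struct request_queue {
	/* Per-queue completion callout; nonzero means "requeue this". */
	int (*softirq_done)(struct request *rq);
};

struct request {
	struct request_queue *q;
	struct request *next;
	int errors;
	int done;
	int requeued;
};

/* List of requests completed in irq context, awaiting the softirq. */
struct done_list {
	struct request *head;
	struct request *tail;
};

/* "irq path": just append the request and return quickly. */
void complete_request_model(struct done_list *l, struct request *rq)
{
	assert(l != NULL && rq != NULL);
	rq->next = NULL;
	if (l->tail)
		l->tail->next = rq;
	else
		l->head = rq;
	l->tail = rq;
}

/*
 * "softirq path": detach the whole list, then run each queue's callout
 * on its requests. Returns the number of requests handled.
 */
int done_softirq_model(struct done_list *l)
{
	struct request *rq = l->head;
	int handled = 0;

	l->head = NULL;
	l->tail = NULL;

	while (rq) {
		struct request *next = rq->next;

		rq->next = NULL;
		if (rq->q->softirq_done(rq))
			rq->requeued = 1;
		else
			rq->done = 1;
		handled++;
		rq = next;
	}
	return handled;
}

/* Example callout: requeue anything that finished with errors. */
int demo_softirq_done(struct request *rq)
{
	return rq->errors != 0;
}
```

The point of the split is that the interrupt handler only links the request
onto a list, while the per-queue decision (finish, requeue, or start error
handling) runs later from the softirq.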

-- 
Jens Axboe
