Re: scsi LLD implementation question

James Bottomley <James.Bottomley@xxxxxxx> · Fri, 25 Feb 2011 12:20:06 -0500

On Fri, 2011-02-25 at 18:12 +0100, Bart Van Assche wrote:
> On Fri, Feb 25, 2011 at 2:37 PM, James Bottomley
> <James.Bottomley@xxxxxxx> wrote:
> > Having a kthread process responses is generally not a good idea because
> > completions will come in at interrupt level ... you need a context
> > switch to get to a thread and this costs latency.  The idea of done
> > processing in SCSI is to identify the scsi_cmnd as quickly as possible
> > and post it.  All back end SCSI processing is done in the block softirq
> > (a level between hard interrupt and user context), again to keep latency
> > low.  That also means that the kthread architecture is wrong because
> > it's difficult for the kernel to go hardirq->user->softirq without
> > adding an extra interrupt latency (usually a clock tick).
> >
> > If you want a "threaded" response in a multiqueue card using MSIs, then
> > you bind the MSIs to CPU groups and use the hardware interrupt context
> > as the threading (I think drivers like lpfc already do this).  The best
> > performance is actually observed when the MSI comes back in on the same
> > CPU that issued the I/O because the cache is still hot.  The block layer
> > keeps an rq->cpu field to tag this, which the HBA's internal setup can use
> > when programming MSI completions.
> 
> The above sounds like great advice if the processing time is
> reasonably short. But what if the processing time can be anything
> between e.g. a microsecond and twenty minutes ?

Well, what processing?  SCSI LLDs are data-shifting engines; there's not
a lot of extra stuff to do.  If you mean things like integrity
verification, that tends to be done inline, adding directly to latency as
the cost of turning on integrity.  If you mean something like
excruciatingly slow PIO just to capture the data, then that's up to the
LLD ... but most do it inline (bogging down the whole system), primarily
because timing tends to be critical to avoid FIFO overruns (the lesson
being to avoid those cards).

Can you give an example?  I can't really think of any processing that's
so huge it would require threaded offloading.  The main point I was
making is that offloading to a thread between HW irq and SCSI done adds
enormously to latency because of the way done completions are processed
in softirq context.  If that latency is just a drop in the ocean
compared to the processing, then sure, offload it.

James

James

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html