Re: Interrupt strangeness in scsi_request_fn()

James Bottomley <James.Bottomley@xxxxxxxxxxxxxxxxxxxxx> · Fri, 01 Aug 2014 18:00:03 -0400

On Fri, 2014-08-01 at 22:48 +0200, Jan Kara wrote:
>   Hello,
> 
>   when debugging one bug, I've noticed one strangeness in
> scsi_request_fn(). We enter it with interrupts disabled and queue_lock
> held. In the function we do stuff like:
>                 spin_unlock_irq(shost->host_lock);
>                 /*
>                  * Finally, initialize any error handling parameters, and
>                  * set up
>                  * the timers for timeouts.
>                  */
>                 scsi_init_cmd_errh(cmd);
> 
>                 /*
>                  * Dispatch the command to the low-level driver.
>                  */
>                 rtn = scsi_dispatch_cmd(cmd);
>                 spin_lock_irq(q->queue_lock);
> 
> Why do we enable interrupts there? The thing is that scsi_request_fn() can
> be called from IO completion (thus softirq context) and if we enable
> interrupts there, another HW interrupt can interrupt softirq processing
> which seems wrong to me.

According to my reading of kernel/softirq.c, it looks like interrupts
*are* already enabled when softirq functions are called by design.  That
doesn't mean your patch is necessarily wrong, it just means the premise
above isn't quite right.

> Now to admit my motivation I have reports from one customer (SLES kernel so
> too old to be really interesting upstream but still of some value I
> believe) where he's seeing softlockup reports when fully loading his SAN
> and from the crash dumps it seems the machine simply spends too long in IO
> completion because HW interrupts keep interrupting it plus there's
> contention on shost->host_lock. If I change scsi_request_fn() to never
> enable interrupts problems go away.
> 
> So what would people say to something like the attached patch?

I think the theory for enabling interrupts here is that the path is
rather long ... it goes through the driver to the hardware and that the
problem you describe above *should* be self correcting.  If we keep
interrupting dispatch, then we're throttling the outbound I/O which
should reduce the returning interrupts.

I'm not immovably opposed, I'd just like to see better justification.

James

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html