  Hello,

  while debugging a bug, I noticed something strange in scsi_request_fn().
We enter it with interrupts disabled and queue_lock held. In the function
we do things like:

		spin_unlock_irq(shost->host_lock);

		/*
		 * Finally, initialize any error handling parameters, and set up
		 * the timers for timeouts.
		 */
		scsi_init_cmd_errh(cmd);

		/*
		 * Dispatch the command to the low-level driver.
		 */
		rtn = scsi_dispatch_cmd(cmd);
		spin_lock_irq(q->queue_lock);

  Why do we enable interrupts there? The thing is that scsi_request_fn()
can be called from IO completion (thus softirq context), and if we enable
interrupts there, another HW interrupt can interrupt softirq processing,
which seems wrong to me.

  To admit my motivation: I have reports from a customer (on a SLES kernel,
so too old to be really interesting upstream, but still of some value I
believe) who is seeing softlockup reports when fully loading his SAN. From
the crash dumps it seems the machine simply spends too long in IO
completion, because HW interrupts keep interrupting it and there is
contention on shost->host_lock on top of that. If I change
scsi_request_fn() to never enable interrupts, the problems go away. So what
would people say to something like the attached patch?

								Honza
-- 
Jan Kara <jack@xxxxxxx>
SUSE Labs, CR

From a7c74ea9375cf0b04a11ac3f00bcd835a8499422 Mon Sep 17 00:00:00 2001
From: Jan Kara <jack@xxxxxxx>
Date: Fri, 1 Aug 2014 22:45:32 +0200
Subject: [PATCH] scsi: Keep interrupts disabled while submitting requests

scsi_request_fn() can be called from softirq context during IO
completion. If it enables interrupts there, HW interrupts can interrupt
softirq processing and queue more IO completion work which can
eventually lead to softlockup reports because IO completion softirq runs
for too long. Keep interrupts disabled in scsi_request_fn().

Signed-off-by: Jan Kara <jack@xxxxxxx>
---
 drivers/scsi/scsi_lib.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index f7e316368c99..44b867e9adc9 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1481,7 +1481,8 @@ static void scsi_softirq_done(struct request *rq)
  *
  * Returns:     Nothing
  *
- * Lock status: IO request lock assumed to be held when called.
+ * Lock status: IO request lock assumed to be held when called, interrupts
+ *		must be disabled.
  */
 static void scsi_request_fn(struct request_queue *q)
 	__releases(q->queue_lock)
@@ -1563,7 +1564,7 @@ static void scsi_request_fn(struct request_queue *q)
 		 * XXX(hch): This is rather suboptimal, scsi_dispatch_cmd will
 		 *		take the lock again.
 		 */
-		spin_unlock_irq(shost->host_lock);
+		spin_unlock(shost->host_lock);
 
 		/*
 		 * Finally, initialize any error handling parameters, and set up
@@ -1575,7 +1576,7 @@ static void scsi_request_fn(struct request_queue *q)
 		 * Dispatch the command to the low-level driver.
 		 */
 		rtn = scsi_dispatch_cmd(cmd);
-		spin_lock_irq(q->queue_lock);
+		spin_lock(q->queue_lock);
 		if (rtn)
 			goto out_delay;
 	}
@@ -1583,7 +1584,7 @@ static void scsi_request_fn(struct request_queue *q)
 	return;
 
  not_ready:
-	spin_unlock_irq(shost->host_lock);
+	spin_unlock(shost->host_lock);
 
 	/*
 	 * lock q, handle tag, requeue req, and decrement device_busy. We
@@ -1593,7 +1594,7 @@ static void scsi_request_fn(struct request_queue *q)
 	 * cases (host limits or settings) should run the queue at some
 	 * later time.
 	 */
-	spin_lock_irq(q->queue_lock);
+	spin_lock(q->queue_lock);
 	blk_requeue_request(q, req);
 	sdev->device_busy--;
  out_delay:
-- 
1.8.1.4
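
For readers who want to see the effect in isolation, here is a rough
userspace sketch of the difference between the _irq and plain lock
variants. It is a toy model under stated assumptions, not kernel code: a
single flag stands in for the CPU's interrupt-enable state, the toy_*
names are made up for illustration, and the lock ordering inside
toy_request_fn() (drop queue_lock, take host_lock, release it, dispatch,
retake queue_lock) is a simplification of what scsi_request_fn() does.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: one flag stands in for the CPU interrupt-enable state. */
bool irqs_enabled = true;

/* Hypothetical stand-in for a spinlock; it only tracks held/not held. */
struct toy_lock {
	bool held;
};

void toy_spin_lock(struct toy_lock *l)
{
	assert(!l->held);
	l->held = true;
}

void toy_spin_unlock(struct toy_lock *l)
{
	assert(l->held);
	l->held = false;
}

/* The _irq variants change the interrupt state unconditionally. */
void toy_spin_lock_irq(struct toy_lock *l)
{
	irqs_enabled = false;
	toy_spin_lock(l);
}

void toy_spin_unlock_irq(struct toy_lock *l)
{
	toy_spin_unlock(l);
	irqs_enabled = true;
}

/*
 * Simplified model of one pass through the request function. Returns
 * whether interrupts were enabled at the point where the command would
 * be dispatched to the low-level driver.
 */
bool toy_request_fn(struct toy_lock *queue_lock,
		    struct toy_lock *host_lock, bool patched)
{
	bool irqs_on_during_dispatch;

	/* Precondition from the email: queue_lock held, interrupts off. */
	assert(queue_lock->held && !irqs_enabled);

	toy_spin_unlock(queue_lock);
	toy_spin_lock(host_lock);

	if (patched)
		toy_spin_unlock(host_lock);	/* interrupts stay off */
	else
		toy_spin_unlock_irq(host_lock);	/* interrupts come back on */

	/* This is where scsi_dispatch_cmd() would run. */
	irqs_on_during_dispatch = irqs_enabled;

	if (patched)
		toy_spin_lock(queue_lock);
	else
		toy_spin_lock_irq(queue_lock);

	return irqs_on_during_dispatch;
}
```

Calling toy_request_fn() with patched set to false reports interrupts
enabled during dispatch even though the caller entered with them
disabled; with patched set to true they stay disabled throughout. In both
cases the function returns with queue_lock held and interrupts off.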