On 08/17/16 22:55, Sreekanth Reddy wrote:
> Observing softlockups while running heavy IOs on 8 SSD drives
> connected behind our LSI SAS 3004 HBA.

Hello Sreekanth,

This means that more than 23s was spent before the scheduler was
invoked, probably due to a loop. Can you give the attached (untested)
patch a try to see whether it is the loop in __blk_mq_run_hw_queue()?

Thanks,

Bart.

From 4da94f2ec37ee5d1b4a5f1ce2886bdafd5cd394c Mon Sep 17 00:00:00 2001
From: Bart Van Assche <bart.vanassche@xxxxxxxxxxx>
Date: Thu, 18 Aug 2016 07:51:49 -0700
Subject: [PATCH] block: Measure __blk_mq_run_hw_queue() execution time

Note: the "max_elapsed" variable can be modified by multiple threads
concurrently.
---
 block/blk-mq.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index e931a0e..6d0961c 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -792,6 +792,9 @@ static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx)
 	LIST_HEAD(driver_list);
 	struct list_head *dptr;
 	int queued;
+	static long max_elapsed = -1;
+	unsigned long start = jiffies;
+	long elapsed;
 
 	WARN_ON(!cpumask_test_cpu(raw_smp_processor_id(), hctx->cpumask));
 
@@ -889,6 +892,13 @@ static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx)
 		 **/
 		blk_mq_run_hw_queue(hctx, true);
 	}
+
+	elapsed = jiffies - start;
+	if (elapsed > max_elapsed) {
+		max_elapsed = elapsed;
+		pr_info("%s() finished after %d ms\n", __func__,
+			jiffies_to_msecs(elapsed));
+	}
 }
 
 /*
-- 
2.9.2
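
As the commit message notes, "max_elapsed" is updated without any synchronization, so two CPUs can race on the compare-and-store. For a debugging patch that is probably acceptable. If exact tracking were needed, a compare-and-swap loop is one option. The following is only a userspace sketch using C11 atomics, not kernel code; update_max_elapsed() is a hypothetical helper that is not part of the patch, and a kernel version would have to use the kernel's own atomic primitives instead.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the patch's "max_elapsed". */
static _Atomic long max_elapsed = -1;

/*
 * Record "elapsed" if it exceeds the current maximum.
 * Returns true only for the caller whose value became the new maximum,
 * so that caller alone would print the message.
 */
static bool update_max_elapsed(long elapsed)
{
	long old = atomic_load(&max_elapsed);

	while (elapsed > old) {
		/* On failure, "old" is refreshed with the current value. */
		if (atomic_compare_exchange_weak(&max_elapsed, &old, elapsed))
			return true;
	}
	return false;
}
```

With this shape, the message would be printed only when update_max_elapsed() returns true, so two threads cannot both claim the same maximum.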