For some storage configurations the coarse-grained CPU grouping (socket)
does not supply enough CPU time to keep up with the demands of high
IOPS. Bypass the grouping and complete directly on the requesting CPU
when the local CPU is under softirq pressure (as measured by its
ksoftirqd thread being in the TASK_RUNNING state).

Cc: Matthew Wilcox <matthew@xxxxxx>
Cc: Christoph Hellwig <hch@xxxxxxxxxxxxx>
Cc: Roland Dreier <roland@xxxxxxxxxxxxxxx>
Tested-by: Dave Jiang <dave.jiang@xxxxxxxxx>
Signed-off-by: Dan Williams <dan.j.williams@xxxxxxxxx>
---
 block/blk-softirq.c |   12 +++++++++++-
 1 files changed, 11 insertions(+), 1 deletions(-)

diff --git a/block/blk-softirq.c b/block/blk-softirq.c
index 475fab8..f0cda19 100644
--- a/block/blk-softirq.c
+++ b/block/blk-softirq.c
@@ -101,16 +101,20 @@ static struct notifier_block __cpuinitdata blk_cpu_notifier = {
 	.notifier_call	= blk_cpu_notify,
 };
 
+DECLARE_PER_CPU(struct task_struct *, ksoftirqd);
+
 void __blk_complete_request(struct request *req)
 {
 	int ccpu, cpu, group_cpu = NR_CPUS;
 	struct request_queue *q = req->q;
+	struct task_struct *tsk;
 	unsigned long flags;
 
 	BUG_ON(!q->softirq_done_fn);
 
 	local_irq_save(flags);
 	cpu = smp_processor_id();
+	tsk = per_cpu(ksoftirqd, cpu);
 
 	/*
 	 * Select completion CPU
@@ -124,7 +128,13 @@ void __blk_complete_request(struct request *req)
 	} else
 		ccpu = cpu;
 
-	if (ccpu == cpu || ccpu == group_cpu) {
+	/*
+	 * try to skip a remote softirq-trigger if the completion is
+	 * within the same group, but not if local softirqs have already
+	 * spilled to ksoftirqd
+	 */
+	if (ccpu == cpu ||
+	    (ccpu == group_cpu && tsk->state != TASK_RUNNING)) {
 		struct list_head *list;
 do_local:
 		list = &__get_cpu_var(blk_cpu_done);
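
For anyone eyeballing the heuristic outside of kernel context, here is a
minimal user-space sketch of the placement decision. This is an
illustration only: complete_locally() and ksoftirqd_runnable are
hypothetical names, the latter standing in for the patch's
per_cpu(ksoftirqd, cpu)->state != TASK_RUNNING check.

#include <stdbool.h>
#include <stdio.h>

/*
 * Sketch of the placement decision in __blk_complete_request():
 * complete locally if we are already on the completion CPU, or if
 * the completion CPU is only group-local *and* softirq work has not
 * already spilled over to ksoftirqd; otherwise punt the completion
 * back to the requesting CPU via a remote softirq.
 */
static bool complete_locally(int cpu, int ccpu, int group_cpu,
			     bool ksoftirqd_runnable)
{
	if (ccpu == cpu)
		return true;		/* exact CPU match */
	if (ccpu == group_cpu && !ksoftirqd_runnable)
		return true;		/* same group, not overloaded */
	return false;			/* raise the softirq remotely */
}

int main(void)
{
	/* same group, ksoftirqd idle: complete locally -> 1 */
	printf("%d\n", complete_locally(0, 2, 2, false));
	/* same group, ksoftirqd runnable: go remote -> 0 */
	printf("%d\n", complete_locally(0, 2, 2, true));
	return 0;
}

The rationale for gating on TASK_RUNNING is that a runnable ksoftirqd
means softirq work is already being deferred on this CPU, so handing the
completion back to the requesting CPU spreads the load instead of piling
more work onto an overloaded core.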