On 22/04/2020 19:59, Kashyap Desai wrote:
Hi Kashyap,
>> So I tested this on hisi_sas with x12 SAS SSDs, and performance with
>> "mq-deadline" is comparable with "none" @ ~2M IOPS. But after a while
>> performance drops a lot, to maybe 700K IOPS. Do you have a similar
>> experience?
> I am using mq-deadline only for HDDs. I have not tried it on SSDs, since
> it is not a useful scheduler for SSDs.
I ask as I only have SAS SSDs to test.
> I noticed that when I use mq-deadline, the performance drop starts once
> I have a larger number of drives.
> I am running an fio script with 64 drives and 64 threads, all threads
> bound to the local NUMA node, which has 36 logical cores.
> I noticed that the lock contention is in "dd_dispatch_request". I am not
> sure why there is no penalty from the same lock in nr_hw_queue = 1 mode.
So this could just be a pre-existing issue of exposing multiple queues for
SCSI HBAs combined with the mq-deadline iosched. I mean, that's really the
only significant change in this series, apart from the shared sbitmap, and,
at this point, I don't think that is the issue.
> static struct request *dd_dispatch_request(struct blk_mq_hw_ctx *hctx)
> {
>         struct deadline_data *dd = hctx->queue->elevator->elevator_data;
>         struct request *rq;
>
>         spin_lock(&dd->lock);
So if multiple hctx's are accessing this lock, then much contention is
possible.
>         rq = __dd_dispatch_request(dd);
>         spin_unlock(&dd->lock);
>
>         return rq;
> }
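To illustrate the point, here is a minimal userspace sketch (not kernel code,
and not part of this series): N threads stand in for the hctx dispatch
contexts and hammer either one shared pthread spinlock, playing the role of
dd->lock in the per-request_queue elevator_data, or a private lock each as an
uncontended baseline. The thread and iteration counts are arbitrary
illustration values.

/*
 * Minimal userspace sketch: many "dispatch contexts" sharing one lock
 * versus one lock per context. Build with: gcc -O2 -pthread demo.c
 */
#include <pthread.h>
#include <stdio.h>
#include <time.h>

#define NR_CTX   16          /* pretend hctx count (illustrative) */
#define NR_ITERS 2000000     /* lock/unlock cycles per context (illustrative) */

static pthread_spinlock_t shared_lock;          /* one lock for all contexts */
static pthread_spinlock_t per_ctx_lock[NR_CTX]; /* one lock per context */

struct ctx_arg {
	int idx;
	int use_shared;
};

static void *dispatch_worker(void *p)
{
	struct ctx_arg *arg = p;
	pthread_spinlock_t *lock =
		arg->use_shared ? &shared_lock : &per_ctx_lock[arg->idx];
	volatile unsigned long work = 0;

	for (int i = 0; i < NR_ITERS; i++) {
		pthread_spin_lock(lock);
		work++;          /* stand-in for the real dispatch work */
		pthread_spin_unlock(lock);
	}
	return NULL;
}

static double run(int use_shared)
{
	pthread_t threads[NR_CTX];
	struct ctx_arg args[NR_CTX];
	struct timespec t0, t1;

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (int i = 0; i < NR_CTX; i++) {
		args[i] = (struct ctx_arg){ .idx = i, .use_shared = use_shared };
		pthread_create(&threads[i], NULL, dispatch_worker, &args[i]);
	}
	for (int i = 0; i < NR_CTX; i++)
		pthread_join(threads[i], NULL);
	clock_gettime(CLOCK_MONOTONIC, &t1);

	return (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
}

int main(void)
{
	pthread_spin_init(&shared_lock, PTHREAD_PROCESS_PRIVATE);
	for (int i = 0; i < NR_CTX; i++)
		pthread_spin_init(&per_ctx_lock[i], PTHREAD_PROCESS_PRIVATE);

	printf("shared lock : %.3f s\n", run(1)); /* ~ many hctxs on one dd->lock */
	printf("per-ctx lock: %.3f s\n", run(0)); /* uncontended baseline */
	return 0;
}

On a box like yours, the shared-lock run should take noticeably longer and
spend its extra time in the spinlock slow path, which looks like essentially
what the perf report below is showing for dd->lock.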
> Here is the perf report -
>
> -    1.04%     0.99%  kworker/18:1H+k  [kernel.vmlinux]  [k] native_queued_spin_lock_slowpath
>         0.99% ret_from_fork
>         - kthread
>            - worker_thread
>               - 0.98% process_one_work
>                  - 0.98% __blk_mq_run_hw_queue
>                     - blk_mq_sched_dispatch_requests
>                        - 0.98% blk_mq_do_dispatch_sched
>                           - 0.97% dd_dispatch_request
>                              + 0.97% queued_spin_lock_slowpath
> +    1.04%     0.00%  kworker/18:1H+k  [kernel.vmlinux]  [k] queued_spin_lock_slowpath
> +    1.03%     0.95%  kworker/19:1H-k  [kernel.vmlinux]  [k] native_queued_spin_lock_slowpath
> +    1.03%     0.00%  kworker/19:1H-k  [kernel.vmlinux]  [k] queued_spin_lock_slowpath
> +    1.02%     0.97%  kworker/20:1H+k  [kernel.vmlinux]  [k] native_queued_spin_lock_slowpath
> +    1.02%     0.00%  kworker/20:1H+k  [kernel.vmlinux]  [k] queued_spin_lock_slowpath
> +    1.01%     0.96%  kworker/21:1H+k  [kernel.vmlinux]  [k] native_queued_spin_lock_slowpath
I'll try to capture a perf report on my setup and compare it with yours.
Thanks very much,
john