Hi Ming,

I was running some performance tests on the latest 4.17-rc and found a
performance drop (approximately 15%) caused by the patch set below.

https://marc.info/?l=linux-block&m=150802309522847&w=2

I observed the drop on the latest 4.16.6 stable and 4.17-rc kernels as well.
Bisecting showed that the issue is not observed on the last stable kernel,
4.14.38. I picked the 4.14.38 stable kernel as the baseline and applied the
above patch set to confirm the behavior.

lscpu output -

Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                72
On-line CPU(s) list:   0-71
Thread(s) per core:    2
Core(s) per socket:    18
Socket(s):             2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 85
Model name:            Intel(R) Xeon(R) Gold 6150 CPU @ 2.70GHz
Stepping:              4
CPU MHz:               1457.182
CPU max MHz:           2701.0000
CPU min MHz:           1200.0000
BogoMIPS:              5400.00
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              1024K
L3 cache:              25344K
NUMA node0 CPU(s):     0-17,36-53
NUMA node1 CPU(s):     18-35,54-71

I have 16 SSDs ("SDLL1DLR400GCCA1"). I created two R0 VDs (each VD consists
of 8 SSDs) using a MegaRaid Ventura series adapter.

fio script -

numactl -N 1 fio 2vd.fio --bs=4k --iodepth=128 -rw=randread --group_report --ioscheduler=none --numjobs=4

                  | v4.14.38-stable | patched v4.14.38-stable
                  | mq-none         | mq-none
---------------------------------------------------------------------
randread "iops"   | 1597k           | 1377k

Below is the perf tool report without the patch set.
(Lock contention appears to be causing this drop, so I have included only the
relevant snippet.)

-    3.19%     2.89%  fio  [kernel.vmlinux]  [k] _raw_spin_lock
   - 2.43% io_submit
      - 2.30% entry_SYSCALL_64
         - do_syscall_64
            - 2.18% do_io_submit
               - 1.59% blk_finish_plug
                  - 1.59% blk_flush_plug_list
                     - 1.59% blk_mq_flush_plug_list
                        - 1.00% __blk_mq_delay_run_hw_queue
                           - 0.99% blk_mq_sched_dispatch_requests
                              - 0.63% blk_mq_dispatch_rq_list
                                   0.60% scsi_queue_rq
                        - 0.57% blk_mq_sched_insert_requests
                           - 0.56% blk_mq_insert_requests
                                0.51% _raw_spin_lock

Below is the perf tool report after applying the patch set.

-    4.10%     3.51%  fio  [kernel.vmlinux]  [k] _raw_spin_lock
   - 3.09% io_submit
      - 2.97% entry_SYSCALL_64
         - do_syscall_64
            - 2.85% do_io_submit
               - 2.35% blk_finish_plug
                  - 2.35% blk_flush_plug_list
                     - 2.35% blk_mq_flush_plug_list
                        - 1.83% __blk_mq_delay_run_hw_queue
                           - 1.83% __blk_mq_run_hw_queue
                              - 1.83% blk_mq_sched_dispatch_requests
                                 - 1.82% blk_mq_do_dispatch_ctx
                                    - 1.14% blk_mq_dequeue_from_ctx
                                       - 1.11% dispatch_rq_from_ctx
                                            1.03% _raw_spin_lock
                          0.50% blk_mq_sched_insert_requests

Let me know if you want more data, or whether this is a known implication of
the patch set.

Thanks,
Kashyap