A few IO micro-optimizations for IO polling and NVMe. I'm working to close
the performance gap with userspace drivers, and this series gets me halfway
there on latency: on the fastest hardware I could get, roundtrip read
latency dropped from 5.7usec to 5usec.

Note that with NVMe you really need to crank up the interrupt coalescing to
see the completion polling benefit (feature 8 below is interrupt coalescing;
0x4ff sets a high aggregation time and threshold).

Test pre-setup:

  echo performance | tee /sys/devices/system/cpu/cpufreq/policy*/scaling_governor
  echo 0 > /sys/block/nvme0n1/queue/iostats
  echo -1 > /sys/block/nvme0n1/queue/io_poll_delay
  nvme set-feature /dev/nvme0 -f 8 -v 0x4ff

fio profile:

  [global]
  ioengine=pvsync2
  rw=randread
  norandommap
  direct=1
  bs=4k
  hipri

  [hi-pri]
  filename=/dev/nvme0n1
  cpus_allowed=2

Keith Busch (3):
  nvme/pci: Start request after doorbell ring
  nvme/pci: Remove cq_vector check in IO path
  block: Polling completion performance optimization

 drivers/nvme/host/pci.c | 14 +++-----------
 fs/block_dev.c          |  5 ++++-
 2 files changed, 7 insertions(+), 12 deletions(-)

--
2.13.6