Re: [PATCH v3 0/5] Add support Weighted Round Robin for blkcg and nvme

Weiping Zhang <zhangweiping@xxxxxxxxxxxxxx> wrote on Mon, Jun 24, 2019 at 10:34 PM:
>
> Hi,
>
> This series tries to add Weighted Round Robin (WRR) support to the
> block cgroup layer and the nvme driver. When multiple containers share
> a single nvme device, we want to protect the IO-critical container
> from being interfered with by the other containers. We add a blkio.wrr
> interface so users can control their IO priority. blkio.wrr accepts
> five priority levels: "urgent", "high", "medium", "low" and "none";
> "none" disables WRR for that cgroup.
>
> The first patch adds a WRR infrastructure to the block cgroup layer.
>
> We add four extra hardware contexts at the blk-mq layer,
> HCTX_TYPE_WRR_URGENT/HIGH/MEDIUM/LOW, to allow a device driver to map
> different hardware queues to different hardware contexts.
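>
> A minimal sketch of how the extended hctx types might look (the WRR
> names are taken from this cover letter and the ordering follows the V1
> change note that moves HCTX_TYPE_POLL to the end; not the exact patch):
>
>     enum hctx_type {
>             HCTX_TYPE_DEFAULT,      /* all I/O not otherwise accounted for */
>             HCTX_TYPE_READ,         /* just for READ I/O */
>             HCTX_TYPE_WRR_URGENT,   /* WRR: urgent priority class */
>             HCTX_TYPE_WRR_HIGH,     /* WRR: high priority */
>             HCTX_TYPE_WRR_MEDIUM,   /* WRR: medium priority */
>             HCTX_TYPE_WRR_LOW,      /* WRR: low priority */
>             HCTX_TYPE_POLL,         /* polled I/O, kept last */
>
>             HCTX_MAX_TYPES,
>     };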
>
> The second patch adds a nvme_ctrl_ops callback named get_ams to get
> the expected Arbitration Mechanism Selected; for now this series only
> supports nvme-pci. This operation checks both CAP.AMS and the nvme-pci
> WRR queue counts to decide whether to enable WRR or plain RR.
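>
> Roughly, the callback could be wired up as in the sketch below (the
> signature and the wrr_queue_count() helper are illustrative rather
> than the exact patch; NVME_CC_AMS_* are the existing definitions in
> include/linux/nvme.h):
>
>     /* illustrative sketch of the new op and its nvme-pci implementation */
>     struct nvme_ctrl_ops {
>             /* ... existing ops ... */
>             void (*get_ams)(struct nvme_ctrl *ctrl, u32 *ams);
>     };
>
>     static void nvme_pci_get_ams(struct nvme_ctrl *ctrl, u32 *ams)
>     {
>             /* CAP.AMS bit 17: WRR with Urgent Priority Class supported */
>             if ((ctrl->cap & (1ULL << 17)) && wrr_queue_count() > 0)
>                     *ams = NVME_CC_AMS_WRRU;        /* CC.AMS = 001b */
>             else
>                     *ams = NVME_CC_AMS_RR;          /* CC.AMS = 000b */
>     }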
>
> The third patch renames the write_queues module parameter to
> read_queues, which simplifies calculating the number of default, read,
> poll and WRR queues.
>
> The fourth patch skips empty affinity sets, because nvme may now have
> 7 affinity sets and some of them may be empty.
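>
> The idea, sketched against the per-set loop in
> irq_create_affinity_masks() (simplified, not the exact patch):
>
>     /* tolerate empty interrupt affinity sets: a driver may describe
>      * e.g. 7 sets where some per-set vector counts are zero
>      */
>     for (i = 0, usedvecs = 0; i < affd->nr_sets; i++) {
>             unsigned int this_vecs = affd->set_size[i];
>
>             if (!this_vecs)         /* skip empty set instead of failing */
>                     continue;
>             /* ... build masks for this_vecs vectors as before ... */
>     }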
>
> The last patch adds support for nvme-pci Weighted Round Robin with
> Urgent Priority Class. We add four module parameters as follows:
>         wrr_urgent_queues
>         wrr_high_queues
>         wrr_medium_queues
>         wrr_low_queues
> nvme-pci sets CC.AMS=001b if CAP.AMS[17]=1 and any wrr_xxx_queues is
> larger than 0. The nvme driver then splits the hardware queues based
> on read/poll/wrr_xxx_queues and sets the proper value for Queue
> Priority (QPRIO) in DWORD11 of the Create I/O Submission Queue
> command.
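>
> For reference, a sketch of how QPRIO could be picked per WRR queue
> class when building the Create I/O SQ command (nvme_sq_prio() is a
> hypothetical helper; the NVME_SQ_PRIO_* flags already exist in
> include/linux/nvme.h and land in DWORD11 via sq_flags):
>
>     /* illustrative: map the WRR hctx class to the NVMe queue priority */
>     static u16 nvme_sq_prio(enum hctx_type type)
>     {
>             switch (type) {
>             case HCTX_TYPE_WRR_URGENT:      return NVME_SQ_PRIO_URGENT;
>             case HCTX_TYPE_WRR_HIGH:        return NVME_SQ_PRIO_HIGH;
>             case HCTX_TYPE_WRR_MEDIUM:      return NVME_SQ_PRIO_MEDIUM;
>             case HCTX_TYPE_WRR_LOW:         return NVME_SQ_PRIO_LOW;
>             default:                        return NVME_SQ_PRIO_MEDIUM;
>             }
>     }
>
>     /* in adapter_alloc_sq(): sq_flags is DWORD11 of Create I/O SQ */
>     c.create_sq.sq_flags = cpu_to_le16(NVME_QUEUE_PHYS_CONTIG |
>                                        nvme_sq_prio(type));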
>
> fio test:
>
> CPU:    Intel(R) Xeon(R) Platinum 8160 CPU @ 2.10GHz
> NVME:   Intel SSDPE2KX020T8 P4510 2TB
>
> [root@tmp-201812-d1802-818396173 low]# nvme show-regs /dev/nvme0n1
> cap     : 2078030fff
> version : 10200
> intms   : 0
> intmc   : 0
> cc      : 460801
> csts    : 1
> nssr    : 0
> aqa     : 1f001f
> asq     : 5f7cc08000
> acq     : 5f5ac23000
> cmbloc  : 0
> cmbsz   : 0
>
> Run fio-1, fio-2 and fio-3 in parallel.
>
> With RR (round robin) the three fio jobs get nearly the same iops/bps;
> when blkio.wrr is set to different priorities, the WRR "high" cgroup
> gets more iops/bps than "medium" and "low".
>
>
>
> RR:
> fio-1: echo "259:0 none" > /sys/fs/cgroup/blkio/high/blkio.wrr
> fio-2: echo "259:0 none" > /sys/fs/cgroup/blkio/medium/blkio.wrr
> fio-3: echo "259:0 none" > /sys/fs/cgroup/blkio/low/blkio.wrr
>
> WRR:
> fio-1: echo "259:0 high" > /sys/fs/cgroup/blkio/high/blkio.wrr
> fio-2: echo "259:0 medium" > /sys/fs/cgroup/blkio/medium/blkio.wrr
> fio-3: echo "259:0 low" > /sys/fs/cgroup/blkio/low/blkio.wrr
>
> rwtest=randread
> fio --bs=4k --ioengine=libaio --iodepth=32 --filename=/dev/nvme0n1 --direct=1 --runtime=60 --numjobs=8 --rw=$rwtest --name=test$1 --group_reporting
>
> Randread 4K     RR              WRR
> -------------------------------------------------------
> fio-1:          220 k           395 k
> fio-2:          220 k           197 k
> fio-3:          220 k           66  k
>
> rwtest=randwrite
> fio --bs=4k --ioengine=libaio --iodepth=32 --filename=/dev/nvme0n1 --direct=1 --runtime=60 --numjobs=8 --rw=$rwtest --name=test$1 --group_reporting
>
> Randwrite 4K    RR              WRR
> -------------------------------------------------------
> fio-1:          150 k           295 k
> fio-2:          150 k           148 k
> fio-3:          150 k           51  k
>
> rwtest=read
> fio --bs=512k --ioengine=libaio --iodepth=32 --filename=/dev/nvme0n1 --direct=1 --runtime=60 --numjobs=8 --rw=$rwtest --name=test$1 --group_reporting
>
> read 512K       RR              WRR
> -------------------------------------------------------
> fio-1:          963 MiB/s       1704 MiB/s
> fio-2:          950 MiB/s       850  MiB/s
> fio-3:          961 MiB/s       284  MiB/s
>
> rwtest=write
> fio --bs=512k --ioengine=libaio --iodepth=32 --filename=/dev/nvme0n1 --direct=1 --runtime=60 --numjobs=8 --rw=$rwtest --name=test$1 --group_reporting
>
> write 512K      RR              WRR
> -------------------------------------------------------
> fio-1:          890 MiB/s       1150 MiB/s
> fio-2:          871 MiB/s       595  MiB/s
> fio-3:          895 MiB/s       188  MiB/s
>
>
> Changes since V2:
>  * drop the null_blk related patch, which added a new NULL_Q_IRQ_WRR
>         to simulate the nvme WRR policy
>  * add an urgent tagset map for the nvme driver
>  * fix some problems in V2, as suggested by Minwoo
>
> Changes since V1:
>  * reorder HCTX_TYPE_POLL to be the last type so the nvme driver can
>         adopt it easily
>  * add support for WRR (Weighted Round Robin) in the nvme driver
>
> Weiping Zhang (5):
>   block: add weighted round robin for blkcgroup
>   nvme: add get_ams for nvme_ctrl_ops
>   nvme-pci: rename module parameter write_queues to read_queues
>   genirq/affinity: allow driver's discontiguous affinity set
>   nvme: add support weighted round robin queue
>
>  block/blk-cgroup.c         |  89 ++++++++++++++++
>  block/blk-mq-debugfs.c     |   4 +
>  block/blk-mq-sched.c       |   6 +-
>  block/blk-mq-tag.c         |   4 +-
>  block/blk-mq-tag.h         |   2 +-
>  block/blk-mq.c             |  12 ++-
>  block/blk-mq.h             |  20 +++-
>  block/blk.h                |   2 +-
>  drivers/nvme/host/core.c   |   9 +-
>  drivers/nvme/host/nvme.h   |   2 +
>  drivers/nvme/host/pci.c    | 246 ++++++++++++++++++++++++++++++++++++---------
>  include/linux/blk-cgroup.h |   2 +
>  include/linux/blk-mq.h     |  14 +++
>  include/linux/interrupt.h  |   2 +-
>  include/linux/nvme.h       |   3 +
>  kernel/irq/affinity.c      |   4 +
>  16 files changed, 362 insertions(+), 59 deletions(-)
>

Hi Jens,

Would you please give some comments on this series?

Thanks


> --
> 2.14.1
>



