On Mon, Jun 24, 2019 at 10:34 PM, Weiping Zhang <zhangweiping@xxxxxxxxxxxxxx> wrote:
>
> Hi,
>
> This series adds Weighted Round Robin support to the block cgroup and the
> nvme driver. When multiple containers share a single nvme device, we want
> to protect IO-critical containers from being interfered with by other
> containers. We add a blkio.wrr interface that lets users control their IO
> priority. blkio.wrr accepts five priority levels: "urgent", "high",
> "medium", "low" and "none"; "none" disables WRR for that cgroup.
>
> The first patch adds the WRR infrastructure for the block cgroup.
>
> We add four extra hardware context types at the blk-mq layer,
> HCTX_TYPE_WRR_URGENT/HIGH/MEDIUM/LOW, to allow a device driver to map
> different hardware queues to different hardware contexts.
>
> The second patch adds an nvme_ctrl_ops callback named get_ams to get the
> expected Arbitration Mechanism Selected; for now this series only supports
> nvme-pci. This operation checks both CAP.AMS and the nvme-pci WRR queue
> counts to decide whether to enable WRR or RR.
>
> The third patch renames the write_queues module parameter to read_queues,
> which simplifies calculating the number of default, read, poll and WRR
> queues.
>
> The fourth patch skips empty affinity sets, because nvme may have
> 7 affinity sets and some of them may be empty.
>
> The last patch adds nvme-pci support for Weighted Round Robin with Urgent
> Priority Class; we add four module parameters:
>     wrr_urgent_queues
>     wrr_high_queues
>     wrr_medium_queues
>     wrr_low_queues
> nvme-pci sets CC.AMS=001b if CAP.AMS[17]=1 and any wrr_xxx_queues is
> larger than 0. The nvme driver splits the hardware queues based on
> read/poll/wrr_xxx_queues, then sets the proper Queue Priority (QPRIO)
> value in DWORD11.
>
> fio test:
>
> CPU:  Intel(R) Xeon(R) Platinum 8160 CPU @ 2.10GHz
> NVME: Intel SSDPE2KX020T8 P4510 2TB
>
> [root@tmp-201812-d1802-818396173 low]# nvme show-regs /dev/nvme0n1
> cap     : 2078030fff
> version : 10200
> intms   : 0
> intmc   : 0
> cc      : 460801
> csts    : 1
> nssr    : 0
> aqa     : 1f001f
> asq     : 5f7cc08000
> acq     : 5f5ac23000
> cmbloc  : 0
> cmbsz   : 0
>
> Run fio-1, fio-2 and fio-3 in parallel.
>
> With RR (round robin) the three fio jobs get nearly the same iops or bps;
> if we set blkio.wrr to different priorities, the WRR "high" job gets more
> iops/bps than "medium" and "low".
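>
> Before running the WRR case, nvme-pci has to be reloaded with at least one
> non-zero wrr_xxx_queues parameter on a controller that advertises WRR with
> Urgent (CAP bit 17, as in the cap value above). For example (the queue
> counts here are only illustrative):
>
>     # reload nvme-pci with WRR queues enabled; read_queues and the
>     # wrr_xxx_queues parameters are the ones introduced by this series
>     modprobe -r nvme
>     modprobe nvme read_queues=8 poll_queues=0 \
>                   wrr_urgent_queues=0 wrr_high_queues=8 \
>                   wrr_medium_queues=8 wrr_low_queues=8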
>
> RR:
> fio-1: echo "259:0 none" > /sys/fs/cgroup/blkio/high/blkio.wrr
> fio-2: echo "259:0 none" > /sys/fs/cgroup/blkio/medium/blkio.wrr
> fio-3: echo "259:0 none" > /sys/fs/cgroup/blkio/low/blkio.wrr
>
> WRR:
> fio-1: echo "259:0 high" > /sys/fs/cgroup/blkio/high/blkio.wrr
> fio-2: echo "259:0 medium" > /sys/fs/cgroup/blkio/medium/blkio.wrr
> fio-3: echo "259:0 low" > /sys/fs/cgroup/blkio/low/blkio.wrr
>
> rwtest=randread
> fio --bs=4k --ioengine=libaio --iodepth=32 --filename=/dev/nvme0n1 --direct=1 --runtime=60 --numjobs=8 --rw=$rwtest --name=test$1 --group_reporting
>
> Randread 4K             RR              WRR
> -------------------------------------------------------
> fio-1:                  220 k           395 k
> fio-2:                  220 k           197 k
> fio-3:                  220 k            66 k
>
> rwtest=randwrite
> fio --bs=4k --ioengine=libaio --iodepth=32 --filename=/dev/nvme0n1 --direct=1 --runtime=60 --numjobs=8 --rw=$rwtest --name=test$1 --group_reporting
>
> Randwrite 4K            RR              WRR
> -------------------------------------------------------
> fio-1:                  150 k           295 k
> fio-2:                  150 k           148 k
> fio-3:                  150 k            51 k
>
> rwtest=read
> fio --bs=512k --ioengine=libaio --iodepth=32 --filename=/dev/nvme0n1 --direct=1 --runtime=60 --numjobs=8 --rw=$rwtest --name=test$1 --group_reporting
>
> Read 512K               RR              WRR
> -------------------------------------------------------
> fio-1:                  963 MiB/s       1704 MiB/s
> fio-2:                  950 MiB/s        850 MiB/s
> fio-3:                  961 MiB/s        284 MiB/s
>
> rwtest=write
> fio --bs=512k --ioengine=libaio --iodepth=32 --filename=/dev/nvme0n1 --direct=1 --runtime=60 --numjobs=8 --rw=$rwtest --name=test$1 --group_reporting
>
> Write 512K              RR              WRR
> -------------------------------------------------------
> fio-1:                  890 MiB/s       1150 MiB/s
> fio-2:                  871 MiB/s        595 MiB/s
> fio-3:                  895 MiB/s        188 MiB/s
>
>
> Changes since V2:
> * drop the null_blk related patch, which added a new NULL_Q_IRQ_WRR to
>   simulate the nvme WRR policy
> * add an urgent tagset map for the nvme driver
> * fix some problems in V2, as suggested by Minwoo
>
> Changes since V1:
> * reorder HCTX_TYPE_POLL to the last position so the nvme driver can
>   adopt it easily
> * add WRR (Weighted Round Robin) support to the nvme driver
>
> Weiping Zhang (5):
>   block: add weighted round robin for blkcgroup
>   nvme: add get_ams for nvme_ctrl_ops
>   nvme-pci: rename module parameter write_queues to read_queues
>   genirq/affinity: allow driver's discontigous affinity set
>   nvme: add support weighted round robin queue
>
>  block/blk-cgroup.c         |  89 ++++++++++++++++
>  block/blk-mq-debugfs.c     |   4 +
>  block/blk-mq-sched.c       |   6 +-
>  block/blk-mq-tag.c         |   4 +-
>  block/blk-mq-tag.h         |   2 +-
>  block/blk-mq.c             |  12 ++-
>  block/blk-mq.h             |  20 +++-
>  block/blk.h                |   2 +-
>  drivers/nvme/host/core.c   |   9 +-
>  drivers/nvme/host/nvme.h   |   2 +
>  drivers/nvme/host/pci.c    | 246 ++++++++++++++++++++++++++++++++++++---------
>  include/linux/blk-cgroup.h |   2 +
>  include/linux/blk-mq.h     |  14 +++
>  include/linux/interrupt.h  |   2 +-
>  include/linux/nvme.h       |   3 +
>  kernel/irq/affinity.c      |   4 +
>  16 files changed, 362 insertions(+), 59 deletions(-)
>

Hi Jens,

Would you give some comments on this series? Thanks.

> --
> 2.14.1
>