> > On the driver implementation, the number of module parameters being
> > added here is problematic. We already have 2 special classes of queues,
> > and defining this at the module level is considered too coarse when
> > the system has different devices on opposite ends of the capability
> > spectrum. For example, users want polled queues for the fast devices,
> > and none for the slower tier. We just don't have a good mechanism to
> > define per-controller resources, and more queue classes will make this
> > problem worse.
>
> We can add a new "string" module parameter that contains a model number;
> in most cases the same product line shares a common model-number prefix,
> so nvme can distinguish devices of different performance (high or low
> end) this way. Before creating the io queues, the nvme driver can read
> the device's Model Number (40 bytes) and compare it with the module
> parameter to decide how many io queues to create for each disk:
>
> /* if model_number is MODEL_ANY, these parameters will be applied to
>  * all nvme devices. */
> char dev_io_queues[1024] = "model_number=MODEL_ANY,"
>         "poll=0,read=0,wrr_low=0,wrr_medium=0,wrr_high=0,wrr_urgent=0";
>
> /* these parameters only affect the nvme disk whose model number is "XXX" */
> char dev_io_queues[1024] = "model_number=XXX,"
>         "poll=1,read=2,wrr_low=3,wrr_medium=4,wrr_high=5,wrr_urgent=0";
>
> struct dev_io_queues {
>         char model_number[40];
>         unsigned int poll;
>         unsigned int read;
>         unsigned int wrr_low;
>         unsigned int wrr_medium;
>         unsigned int wrr_high;
>         unsigned int wrr_urgent;
> };
>
> We can use these two variables to store the io queue configurations:
>
> /* default values for all disks whose model number is not in io_queues_cfg */
> struct dev_io_queues io_queues_def = {};
>
> /* user defined values for a specific model number */
> struct dev_io_queues io_queues_cfg = {};
>
> If we need multiple configurations (> 2), we can also extend
> dev_io_queues to support it.
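>
> A rough sketch of the parsing and lookup this implies (untested, and
> all function names are illustrative only, not from an existing patch):
>
> #include <linux/kernel.h>
> #include <linux/string.h>
>
> /* parse "model_number=XXX,poll=1,read=2,..." into *cfg */
> static int parse_dev_io_queues(char *buf, struct dev_io_queues *cfg)
> {
>         static const char * const keys[] = {
>                 "poll", "read", "wrr_low", "wrr_medium",
>                 "wrr_high", "wrr_urgent",
>         };
>         unsigned int *vals[] = {
>                 &cfg->poll, &cfg->read, &cfg->wrr_low,
>                 &cfg->wrr_medium, &cfg->wrr_high, &cfg->wrr_urgent,
>         };
>         char *opt, *val;
>         int i;
>
>         while ((opt = strsep(&buf, ",")) != NULL) {
>                 opt = skip_spaces(opt);
>                 val = strchr(opt, '=');
>                 if (!val)
>                         return -EINVAL;
>                 *val++ = '\0';
>
>                 if (!strcmp(opt, "model_number")) {
>                         strscpy(cfg->model_number, val,
>                                 sizeof(cfg->model_number));
>                         continue;
>                 }
>                 for (i = 0; i < ARRAY_SIZE(keys); i++) {
>                         if (!strcmp(opt, keys[i])) {
>                                 if (kstrtouint(val, 10, vals[i]))
>                                         return -EINVAL;
>                                 break;
>                         }
>                 }
>                 if (i == ARRAY_SIZE(keys))
>                         return -EINVAL; /* unknown key */
>         }
>         return 0;
> }
>
> /*
>  * Called before creating the io queues: take the per-model config when
>  * its model_number is a prefix of the controller's reported model
>  * string, otherwise fall back to the MODEL_ANY defaults stored in
>  * io_queues_def at parse time.
>  */
> static const struct dev_io_queues *lookup_io_queues(const char *model)
> {
>         size_t len = strlen(io_queues_cfg.model_number);
>
>         if (len && !strncmp(model, io_queues_cfg.model_number, len))
>                 return &io_queues_cfg;
>         return &io_queues_def;
> }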
Hi Maintainers,

If we add a patch to support these queue counts at the controller level
instead of the module level, shall we add WRR?

Recently I did some cgroup io weight testing, see
https://github.com/dublio/iotrack/wiki/cgroup-io-weight-test

I think a proper io weight policy should consider the high weight
cgroup's iops and latency, and also take the whole disk's throughput
into account; that is to say, the policy should trade off carefully
between a cgroup's IO performance and the whole disk's throughput. I
know one policy cannot do all things perfectly, but from the test
results nvme-wrr can work well.

From the following test results, nvme-wrr works well for both the
cgroup's latency and iops and the whole disk's throughput.

Notes:

blk-iocost: only set qos.model, did not set a percentage latency target.

nvme-wrr: set the weights as follows (the dword layout is sketched in
the P.S. below):

        h=64; m=32; l=8; ab=0
        nvme set-feature /dev/nvme1n1 -f 1 \
                -v $(printf "0x%x\n" $(($ab<<0|$l<<8|$m<<16|$h<<24)))
        echo "$major:$minor high" > /sys/fs/cgroup/test1/io.wrr
        echo "$major:$minor low"  > /sys/fs/cgroup/test2/io.wrr

Randread vs Randread:
cgroup.test1.weight : cgroup.test2.weight = 8 : 1
high weight cgroup test1: randread, fio: numjobs=8, iodepth=32, bs=4K
low  weight cgroup test2: randread, fio: numjobs=8, iodepth=32, bs=4K

(in both tables, bw is in KiB/s and latencies are in usec)

test case          bw       iops  rd_avg_lat  wr_avg_lat  rd_p99_lat  wr_p99_lat
================================================================================
bfq_test1      767226     191806     1333.30        0.00      536.00        0.00
bfq_test2       94607      23651    10816.06        0.00      610.00        0.00
iocost_test1  1457718     364429      701.76        0.00     1630.00        0.00
iocost_test2  1466337     366584      697.62        0.00     1613.00        0.00
none_test1    1456585     364146      702.22        0.00     1646.00        0.00
none_test2    1463090     365772      699.12        0.00     1613.00        0.00
wrr_test1     2635391     658847      387.94        0.00     1236.00        0.00
wrr_test2      365428      91357     2801.00        0.00     5537.00        0.00

https://github.com/dublio/iotrack/wiki/cgroup-io-weight-test#215-summary-fio-output

Randread vs Seq Write:
cgroup.test1.weight : cgroup.test2.weight = 8 : 1
high weight cgroup test1: randread,  fio: numjobs=8, iodepth=32, bs=4K
low  weight cgroup test2: seq write, fio: numjobs=1, iodepth=32, bs=256K

test case          bw       iops  rd_avg_lat  wr_avg_lat  rd_p99_lat  wr_p99_lat
================================================================================
bfq_test1      814327     203581     1256.19        0.00      593.00        0.00
bfq_test2      104758        409        0.00    78196.32        0.00  1052770.00
iocost_test1   270467      67616     3784.02        0.00     9371.00        0.00
iocost_test2  1541575       6021        0.00     5313.02        0.00     6848.00
none_test1     271708      67927     3767.01        0.00     9502.00        0.00
none_test2    1541951       6023        0.00     5311.50        0.00     6848.00
wrr_test1      775005     193751     1320.17        0.00     4112.00        0.00
wrr_test2     1198319       4680        0.00     6835.30        0.00     8847.00

https://github.com/dublio/iotrack/wiki/cgroup-io-weight-test#225-summary-fio-output

Thanks
Weiping
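
P.S. For completeness, a standalone sketch (my own illustration, not
nvme-cli or driver code) of how the set-feature value above packs the
weights into the Arbitration feature (Feature ID 01h) dword:

#include <stdint.h>
#include <stdio.h>

/*
 * NVMe Arbitration feature (Feature ID 01h), Command Dword 11:
 *   bits  2:0   Arbitration Burst (AB)
 *   bits 15:8   Low Priority Weight (LPW)
 *   bits 23:16  Medium Priority Weight (MPW)
 *   bits 31:24  High Priority Weight (HPW)
 */
static uint32_t nvme_arb_cdw11(uint32_t ab, uint32_t lpw,
                               uint32_t mpw, uint32_t hpw)
{
        return (ab << 0) | (lpw << 8) | (mpw << 16) | (hpw << 24);
}

int main(void)
{
        /* h=64, m=32, l=8, ab=0, as in the test setup above */
        printf("0x%08x\n", nvme_arb_cdw11(0, 8, 32, 64)); /* 0x40200800 */
        return 0;
}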