> On 31 Mar 2020, at 08:17, Weiping Zhang <zwp10758@xxxxxxxxx> wrote:
>
>>> On the driver implementation, the number of module parameters being
>>> added here is problematic. We already have 2 special classes of queues,
>>> and defining this at the module level is considered too coarse when
>>> the system has different devices on opposite ends of the capability
>>> spectrum. For example, users want polled queues for the fast devices,
>>> and none for the slower tier. We just don't have a good mechanism to
>>> define per-controller resources, and more queue classes will make this
>>> problem worse.
>>>
>> We can add a new "string" module parameter that contains a model number;
>> in most cases devices from the same product line share a common prefix
>> in the model number, so nvme can use it to distinguish devices of
>> different performance (high or low end).
>> Before creating the io queues, the nvme driver can read the device's
>> Model Number (40 bytes) and compare it with the module parameter to
>> decide how many io queues to allocate for each disk:
>>
>> /* if model_number is MODEL_ANY, these parameters will be applied to
>>  * all nvme devices. */
>> char dev_io_queues[1024] = "model_number=MODEL_ANY,
>> poll=0,read=0,wrr_low=0,wrr_medium=0,wrr_high=0,wrr_urgent=0";
>>
>> /* these parameters only affect the nvme disk whose model number is "XXX" */
>> char dev_io_queues[1024] = "model_number=XXX,
>> poll=1,read=2,wrr_low=3,wrr_medium=4,wrr_high=5,wrr_urgent=0;";
>>
>> struct dev_io_queues {
>> 	char model_number[40];
>> 	unsigned int poll;
>> 	unsigned int read;
>> 	unsigned int wrr_low;
>> 	unsigned int wrr_medium;
>> 	unsigned int wrr_high;
>> 	unsigned int wrr_urgent;
>> };
>>
>> We can use these two variables to store the io queue configurations:
>>
>> /* default values for all disks whose model number is not in io_queues_cfg */
>> struct dev_io_queues io_queues_def = {};
>>
>> /* user defined values for a specific model number */
>> struct dev_io_queues io_queues_cfg = {};
>>
>> If we need multiple configurations (> 2), we can also extend
>> dev_io_queues to support it.
>>
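For illustration only, here is a minimal sketch of how such a per-model
configuration could be looked up before the io queues are created. The
struct follows the layout proposed above; the lookup helper and its
prefix-matching behaviour are hypothetical, not part of the actual nvme
driver:

#include <string.h>

/* per-model io queue configuration, as proposed above */
struct dev_io_queues {
	char model_number[40];
	unsigned int poll;
	unsigned int read;
	unsigned int wrr_low;
	unsigned int wrr_medium;
	unsigned int wrr_high;
	unsigned int wrr_urgent;
};

/* default values for all disks whose model number is not in io_queues_cfg */
static struct dev_io_queues io_queues_def;

/* user defined values for a specific model number (or prefix) */
static struct dev_io_queues io_queues_cfg;

/*
 * Pick the queue configuration for one controller, given the Model Number
 * (MN) field from Identify Controller (40 bytes, space padded, not
 * necessarily NUL terminated).  Hypothetical helper name.
 */
static const struct dev_io_queues *lookup_io_queues(const char *model)
{
	size_t len = strnlen(io_queues_cfg.model_number,
			     sizeof(io_queues_cfg.model_number));

	/* match on the configured model number prefix, e.g. "XXX" */
	if (len && strncmp(io_queues_cfg.model_number, model, len) == 0)
		return &io_queues_cfg;

	return &io_queues_def;	/* MODEL_ANY / default case */
}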
>
> Hi Maintainers,
>
> If we add a patch to support these queue counts at the controller level,
> instead of the module level, shall we add WRR?
>
> Recently I did some cgroup io weight testing:
> https://github.com/dublio/iotrack/wiki/cgroup-io-weight-test
> I think a proper io weight policy should consider the high weight
> cgroup's iops and latency and also take the whole disk's throughput
> into account; that is to say, the policy should make a careful
> trade-off between a cgroup's IO performance and the whole disk's
> throughput. I know one policy cannot do all things perfectly, but
> from the test results nvme-wrr can work well.
>
> From the following test results, nvme-wrr works well for both the
> cgroup's latency and iops, and the whole disk's throughput.
>
> Notes:
> blk-iocost: only set qos.model, did not set percentage latency.
> nvme-wrr: set weight by:
> h=64;m=32;l=8;ab=0; nvme set-feature /dev/nvme1n1 -f 1 -v $(printf "0x%x\n" $(($ab<<0|$l<<8|$m<<16|$h<<24)))
> echo "$major:$minor high" > /sys/fs/cgroup/test1/io.wrr
> echo "$major:$minor low" > /sys/fs/cgroup/test2/io.wrr
>
>
> Randread vs Randread:
> cgroup.test1.weight : cgroup.test2.weight = 8 : 1
> high weight cgroup test1: randread, fio: numjobs=8, iodepth=32, bs=4K
> low weight cgroup test2: randread, fio: numjobs=8, iodepth=32, bs=4K
>
> test case       bw        iops     rd_avg_lat  wr_avg_lat  rd_p99_lat  wr_p99_lat
> =================================================================================
> bfq_test1       767226    191806   1333.30     0.00        536.00      0.00
> bfq_test2       94607     23651    10816.06    0.00        610.00      0.00
> iocost_test1    1457718   364429   701.76      0.00        1630.00     0.00
> iocost_test2    1466337   366584   697.62      0.00        1613.00     0.00
> none_test1      1456585   364146   702.22      0.00        1646.00     0.00
> none_test2      1463090   365772   699.12      0.00        1613.00     0.00
> wrr_test1       2635391   658847   387.94      0.00        1236.00     0.00
> wrr_test2       365428    91357    2801.00     0.00        5537.00     0.00
>
> https://github.com/dublio/iotrack/wiki/cgroup-io-weight-test#215-summary-fio-output
>

Glad to see that BFQ meets weights. Sad to see how it is suffering in
terms of IOPS on your system. Good job with your scheduler!

However, as for I/O control, the hard-to-control cases are not the ones
with constantly full, deep queues. BFQ's complexity stems from the need
to control the tough cases as well. An example is sync I/O with I/O
depth one competing against async I/O. On the other hand, those use
cases may not be of interest for your scheduler.

Thanks,
Paolo

> Randread vs Seq Write:
> cgroup.test1.weight : cgroup.test2.weight = 8 : 1
> high weight cgroup test1: randread, fio: numjobs=8, iodepth=32, bs=4K
> low weight cgroup test2: seq write, fio: numjobs=1, iodepth=32, bs=256K
>
> test case       bw        iops     rd_avg_lat  wr_avg_lat  rd_p99_lat  wr_p99_lat
> =================================================================================
> bfq_test1       814327    203581   1256.19     0.00        593.00      0.00
> bfq_test2       104758    409      0.00        78196.32    0.00        1052770.00
> iocost_test1    270467    67616    3784.02     0.00        9371.00     0.00
> iocost_test2    1541575   6021     0.00        5313.02     0.00        6848.00
> none_test1      271708    67927    3767.01     0.00        9502.00     0.00
> none_test2      1541951   6023     0.00        5311.50     0.00        6848.00
> wrr_test1       775005    193751   1320.17     0.00        4112.00     0.00
> wrr_test2       1198319   4680     0.00        6835.30     0.00        8847.00
>
> https://github.com/dublio/iotrack/wiki/cgroup-io-weight-test#225-summary-fio-output
>
> Thanks
> Weiping
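As a side note on the weight encoding used in the nvme set-feature
command quoted above: the shifts pack the NVMe Arbitration feature
(Feature ID 01h), where Arbitration Burst sits in bits 2:0 and the low,
medium and high priority weights in bits 15:8, 23:16 and 31:24. A
minimal sketch of the same packing in C; the helper name is made up:

#include <stdint.h>
#include <stdio.h>

/* Pack Command Dword 11 of the Arbitration feature (Feature ID 01h). */
static uint32_t nvme_arb_dw11(uint8_t ab, uint8_t lpw, uint8_t mpw, uint8_t hpw)
{
	return (ab & 0x7) | ((uint32_t)lpw << 8) |
	       ((uint32_t)mpw << 16) | ((uint32_t)hpw << 24);
}

int main(void)
{
	/* h=64, m=32, l=8, ab=0 from the test setup above -> 0x40200800 */
	printf("0x%x\n", (unsigned int)nvme_arb_dw11(0, 8, 32, 64));
	return 0;
}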