Hi Tejun,

thank you very much for this extra information, I'll try the
configuration you suggest.

In this respect, is this still the branch to use
https://kernel.googlesource.com/pub/scm/linux/kernel/git/tj/cgroup/+/refs/heads/review-iocost-v2
also after the issue spotted two days ago [1]?

Thanks,
Paolo

[1] https://lkml.org/lkml/2019/8/29/910

> On 31 Aug 2019, at 08:53, Tejun Heo <tj@xxxxxxxxxx> wrote:
>
> Hello, Paolo.
>
> On Thu, Aug 22, 2019 at 10:58:22AM +0200, Paolo Valente wrote:
>> Ok, I tried with the parameters reported for a SATA SSD:
>>
>> rpct=95.00 rlat=10000 wpct=95.00 wlat=20000 min=50.00 max=400.00
>
> Sorry, I should have explained it in a lot more detail.
>
> There are two things - the cost model and the QoS params.  The
> default SSD cost model parameters are derived by averaging the
> parameters of a number of mainstream SSDs.  As a ballpark, this can
> be good enough because, while the overall performance varied quite a
> bit from one SSD to another, the relative cost of different types of
> IOs wasn't drastically different.
>
> However, this means that the performance baseline can easily be way
> off from 100% depending on the specific device in use.  In the above,
> you're specifying min/max, which limits how far the controller is
> allowed to adjust the overall cost estimation.  50% and 400% are
> numbers which may make sense if the cost model parameters are
> expected to fall somewhere around 100% - ie. if the parameters are
> for that specific device.
>
> In your script, you're using the default model params but limiting
> the vrate range.  It's likely that your device is significantly
> slower than what the default parameters are expecting.  However,
> because the min vrate is limited to 50%, the controller doesn't
> throttle below 50% of the estimated cost, so if the device is
> significantly slower than that, nothing gets controlled.
>
>> and with a simpler configuration [1]: one target doing random reads
>
> And without QoS latency targets, the controller goes purely by queue
> depth depletion, which works fine for many usual workloads such as
> larger reads and writes but isn't likely to serve low-concurrency
> latency-sensitive IOs well.
>
>> and only four interferers doing sequential reads, with all the
>> processes (groups) having the same weight.
>>
>> But there seemed to be little or no control over I/O, because the
>> target got only 1.84 MB/s, against 1.15 MB/s without any control.
>>
>> So I tried with rlat=1000 and rlat=100.
>
> And this won't do anything, as all rlat/wlat does is regulate how the
> overall vrate should be adjusted, and that is being min'd at 50%.
>
>> Control did improve, with the same results for both values of rlat.
>> The problem is that these results still seem rather bad, both in
>> terms of throughput guaranteed to the target and in terms of total
>> throughput.  Here are the results compared with BFQ (throughputs
>> measured in MB/s):
>>
>>                        io.weight      BFQ
>> target's throughput        3.415    6.224
>> total throughput         159.14   321.375
>
> So, what should have been configured is something like
>
>   $ echo '8:0 enable=1 rpct=95 rlat=10000 wpct=95 wlat=20000' > /sys/fs/cgroup/io.cost.qos
>
> which just says "target 10ms p(95) read latency and 20ms p(95) write
> latency" without putting any restrictions on the vrate range.
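>
> If instead one did want to keep a vrate clamp around 100%, the cost
> model would first need to be fitted to the specific device through
> the companion io.cost.model interface.  A sketch - the 8:0 device
> number and all figures below are illustrative placeholders, not
> measured values for any real drive:
>
>   $ echo '8:0 ctrl=user model=linear rbps=2390207853 rseqiops=35344 rrandiops=19600 wbps=410920884 wseqiops=11900 wrandiops=11300' > /sys/fs/cgroup/io.cost.model
>
> Once the model matches the device, min/max bounds such as 50%/400%
> become meaningful, since the estimated cost then actually centers
> around 100% for that drive.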
>
> With the QoS-only configuration above, I got the following on
> Micron_1100_MTFDDAV256TBN, which is a pretty old 256GB SATA drive.
>
> Aggregated throughput:
>          min        max        avg        std_dev    conf99%
>          266.73     275.71     271.38     4.05144    45.7635
> Interfered total throughput:
>          min        max        avg        std_dev
>          9.608      13.008     10.941     0.664938
>
> During the run, iocost-monitor.py looked like the following.
>
>  sda RUN per=40ms cur_per=2074.351:v1008.844 busy= +0 vrate= 59.85% params=ssd_dfl(CQ)
>                    active  weight      hweight%      inflt% del_ms usages%
>  InterfererGroup0  *       100/ 100    22.94/ 20.00    0.00  0*000 023:023:023
>  InterfererGroup1  *       100/ 100    22.94/ 20.00    0.00  0*000 023:023:023
>  InterfererGroup2  *       100/ 100    22.94/ 20.00    0.00  0*000 025:023:021
>  InterfererGroup3  *       100/ 100    22.94/ 20.00    0.00  0*000 023:023:023
>  interfered        *        36/ 100     8.26/ 20.00    0.42  0*000 003:004:004
>
> Note that interfered is reported to only use 3-4% of the disk
> capacity while configured to consume 20%.  This is because, with a
> single-concurrency 4k randread job, its ability to consume IO
> capacity is limited by the completion latency.
>
> 10ms is a pretty generous (ie. more work-conserving) target for SSDs.
> Let's say we're willing to tighten it to trade off total work for
> tighter latency.
>
>   $ echo '8:0 enable=1 rpct=95 rlat=2500 wpct=95 wlat=5000' > /sys/fs/cgroup/io.cost.qos
>
> Aggregated throughput:
>          min        max        avg        std_dev    conf99%
>          147.06     172.18     154.608    11.783     133.096
> Interfered total throughput:
>          min        max        avg        std_dev
>          17.992     19.32      18.698     0.313105
>
> and the monitoring output
>
>  sda RUN per=10ms cur_per=2927.152:v1556.138 busy= -2 vrate= 34.74% params=ssd_dfl(CQ)
>                    active  weight      hweight%      inflt% del_ms usages%
>  InterfererGroup0  *       100/ 100    20.00/ 20.00  386.11  0*000 070:020:020
>  InterfererGroup1  *       100/ 100    20.00/ 20.00  386.11  0*000 070:020:020
>  InterfererGroup2  *       100/ 100    20.00/ 20.00  386.11  0*000 070:020:020
>  InterfererGroup3  *       100/ 100    20.00/ 20.00    0.00  0*000 020:020:020
>  interfered        *       100/ 100    20.00/ 20.00    1.21  0*000 010:014:017
>
> The following happened.
>
> * The vrate is now hovering way lower.  The device is doing less
>   total work to achieve tighter completion latencies.
>
> * The overall throughput dropped, but interfered's utilization and
>   bandwidth are now significantly higher thanks to the lower
>   completion latencies.
>
> For reference:
>
> [Disabled]
>
> Aggregated throughput:
>          min        max        avg        std_dev    conf99%
>          493.98     511.37     502.808    9.52773    107.621
> Interfered total throughput:
>          min        max        avg        std_dev
>          0.056      0.304      0.107      0.0691052
>
> [Enabled, no QoS config]
>
> Aggregated throughput:
>          min        max        avg        std_dev    conf99%
>          429.07     449.59     437.597    8.64952    97.7015
> Interfered total throughput:
>          min        max        avg        std_dev
>          0.456      3.12       1.08       0.774318
>
> Thanks.
>
> --
> tejun
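Putting the pieces above together, the end-to-end setup behind these
numbers would look roughly as follows.  This is a sketch, not a
verbatim reproduction: the 8:0 device number, the cgroup names, and
the monitor script path (tools/cgroup/iocost_monitor.py in the kernel
tree) are assumptions.

  # Let children of the cgroup root use the io controller.
  echo '+io' > /sys/fs/cgroup/cgroup.subtree_control

  # Enable iocost on sda (8:0) with the relaxed p(95) latency targets
  # and no min/max, so the vrate can adjust freely.
  echo '8:0 enable=1 rpct=95 rlat=10000 wpct=95 wlat=20000' > /sys/fs/cgroup/io.cost.qos

  # Create the target and the four interferers with equal IO weights.
  for g in interfered InterfererGroup0 InterfererGroup1 InterfererGroup2 InterfererGroup3; do
          mkdir -p /sys/fs/cgroup/$g
          echo 100 > /sys/fs/cgroup/$g/io.weight
  done

  # Tighten the targets to trade total throughput for latency.
  echo '8:0 enable=1 rpct=95 rlat=2500 wpct=95 wlat=5000' > /sys/fs/cgroup/io.cost.qos

  # Watch per-cgroup weights, in-flight IO and usage during the runs.
  tools/cgroup/iocost_monitor.py sda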