On Tue 03-05-16 11:34:10, Jan Kara wrote:
> Yeah, once I'll hunt down that regression with old disk, I can have a look
> into how writeback throttling plays together with blkio-controller.

So I've tried the following script (note that you need cgroup v2 for
writeback IO to be throttled):

---
mkdir /sys/fs/cgroup/group1
echo 1000 >/sys/fs/cgroup/group1/io.weight
dd if=/dev/zero of=/mnt/file1 bs=1M count=10000&
DD1=$!
echo $DD1 >/sys/fs/cgroup/group1/cgroup.procs

mkdir /sys/fs/cgroup/group2
echo 100 >/sys/fs/cgroup/group2/io.weight
#echo "259:65536 wbps=5000000" >/sys/fs/cgroup/group2/io.max
echo "259:65536 wbps=max" >/sys/fs/cgroup/group2/io.max
dd if=/dev/zero of=/mnt/file2 bs=1M count=10000&
DD2=$!
echo $DD2 >/sys/fs/cgroup/group2/cgroup.procs

while true; do
	sleep 1
	kill -USR1 $DD1
	kill -USR1 $DD2
	echo '======================================================='
done
---

and watched the progress of the dd processes in the different cgroups. The
1:10 weight difference between the cgroups has no effect with your
writeback patches - the situation after one minute:

3120+1 records in
3120+1 records out
3272392704 bytes (3.3 GB) copied, 63.7119 s, 51.4 MB/s

3217+1 records in
3217+1 records out
3374010368 bytes (3.4 GB) copied, 63.5819 s, 53.1 MB/s

I should add that even without your patches the progress doesn't quite
correspond to the weight ratio:

...

but there is still a noticeable difference between cgroups with different
weights. OTOH blk-throttle combines well with your patches: limiting one
cgroup to 5 MB/s results in numbers like:

3883+2 records in
3883+2 records out
4072091648 bytes (4.1 GB) copied, 36.6713 s, 111 MB/s

413+0 records in
413+0 records out
433061888 bytes (433 MB) copied, 36.8939 s, 11.7 MB/s

which is fine and comparable with an unpatched kernel. The higher
throughput number is there because we do buffered writes and dd reports
what it wrote into the page cache. And it is no wonder blk-throttle
combines fine - it throttles bios before they ever reach the writeback
throttling mechanism.

So I believe this demonstrates that your writeback throttling just doesn't
work well with a selective scheduling policy that sits below it, because it
can essentially lead to IO priority inversion issues...

								Honza
-- 
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR
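
PS: A side note on measuring the above - dd's numbers include data that was
merely copied into the page cache, so device-level per-cgroup write
throughput is better read from the cgroup v2 io.stat files. Below is a
rough sketch of such a sampling loop; it reuses the group1/group2 paths and
the 259: device prefix from the script above (both are specific to my
setup and would need adjusting) and assumes the standard wbytes counter in
io.stat:

---
#!/bin/bash
# Sample per-cgroup device-level write throughput once a second from the
# cgroup v2 io.stat files (wbytes counter) instead of trusting dd's output.

get_wbytes() {
	# Print the wbytes counter for the first 259:* device in the given
	# io.stat file, or 0 if the cgroup has not written to it yet.
	awk '$1 ~ /^259:/ {
		for (i = 2; i <= NF; i++)
			if (split($i, kv, "=") == 2 && kv[1] == "wbytes") {
				print kv[2]; found = 1; exit
			}
	}
	END { if (!found) print 0 }' "$1"
}

prev1=0; prev2=0
while true; do
	sleep 1
	cur1=$(get_wbytes /sys/fs/cgroup/group1/io.stat)
	cur2=$(get_wbytes /sys/fs/cgroup/group2/io.stat)
	echo "group1: $(( (cur1 - prev1) / 1048576 )) MB/s" \
	     "group2: $(( (cur2 - prev2) / 1048576 )) MB/s"
	prev1=$cur1
	prev2=$cur2
done
---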