I forgot to suggest that this might have to do with the md0_raid5
process. That process has to take care of RAID parity for both
processes (the streaming daemon and fio). By default it stays in the
root cgroup, which means that RAID-related I/O is not prioritized even
for processes in the prio cgroup; this might be introducing delays in
the I/O.

On the other hand, I cannot put the md0_raid5 process in the prio
cgroup either, because then RAID-related I/O from all other processes
would steal disk time from the priority processes.

On Fri, Nov 29, 2013 at 9:06 AM, Martin Boutin <martboutin@xxxxxxxxx> wrote:
> Hello list,
>
> Today I was trying to figure out how to get block I/O prioritization
> working for a certain process. The process is a streaming server that
> reads a big file, using O_DIRECT, from a filesystem (xfs) on top of a
> RAID5 configuration with 3 disks.
>
> I'm setting up cgroups this way:
> $ echo 1000 > /sys/fs/cgroup/blkio/prio/blkio.weight
> $ echo 10 > /sys/fs/cgroup/blkio/blkio.leaf_weight
>
> meaning that all the tasks in the prio cgroup will have unconstrained
> access time to the disk, while all the other tasks will have their
> disk access time weighted by a factor.
>
> If I ignore the RAID5 setup, create an XFS filesystem on /dev/sdb2,
> mount it on /data, put my streaming daemon in the prio cgroup and run
> the daemon streaming around 250MiB/s of data while fio runs disk
> I/O intensive tasks, then over a period of 5 minutes the streaming
> daemon had to stop streaming about 5 times to rebuffer.
>
> Now, if I consider the same scenario but using the RAID5 device and
> letting the daemon stream 500MiB/s of data (because the RAID has
> around twice the throughput of a single drive), over a period of 5
> minutes the streaming daemon had to stop streaming about 50 times!
> This is 10 times more than in the single-drive case.
>
> While streaming, I observed both blkio.sectors and blkio.io_queued for
> both cgroups (the root node and prio).
> If only the streaming daemon is run (so fio is stopped), the sector
> count in prio/blkio.sectors increases while (root)/blkio.sectors does
> not. This confirms that the streaming daemon is correctly identified
> as being in the prio cgroup.
> Then, while both the streaming daemon and fio run, observing io_queued
> shows that the root cgroup has about 50 queued requests in total (on
> average), while the prio cgroup has only an occasional delayed
> request.
>
> $ uname -a
> Linux haswell1 3.10.10 #9 SMP PREEMPT Fri Nov 29 11:38:20 CET 2013
> i686 GNU/Linux
>
> Any ideas?
>
> Thanks,
> --
> Martin Boutin

--
Martin Boutin