Unintuitive scheduling results using BFQ

Madhav Ancha <mancha@xxxxxxxxxxxxxxxxxx> · Thu, 13 Dec 2018 15:34:09 -0500

In our setup, we have a task that writes to an NVMe SSD through the
page cache (using ::write OS calls). The task does application-level
buffering and passes large (multi-MB) chunks of data to each ::write
call. Each instance of the task writes up to 10 Gbps of data to the
NVMe SSD.

We run two instances of this task as below.
Instance 1: Using ionice -c1, we run an RT IO instance of this task.
Instance 2: We run a normal (best-effort) IO instance of this task.
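
For reference, a minimal sketch of how the two priorities would be
encoded, assuming the usual Linux ioprio layout (class in the upper
bits, shifted by 13, with a level of 0-7 below it). The helper names
and the level of 4 are assumptions for illustration, not taken from
our task.

```python
# Assumed constants from the Linux ioprio interface.
IOPRIO_CLASS_SHIFT = 13
IOPRIO_CLASS_RT = 1   # real-time class (ionice -c1)
IOPRIO_CLASS_BE = 2   # best-effort class (ionice -c2)


def ioprio_value(io_class, level):
    """Pack an I/O class and a level (0-7) into one ioprio value."""
    if not 0 <= level <= 7:
        raise ValueError("level must be in the range 0-7")
    return (io_class << IOPRIO_CLASS_SHIFT) | level


def ioprio_class(value):
    """Recover the I/O class from a packed ioprio value."""
    return value >> IOPRIO_CLASS_SHIFT


# Hypothetical level 4 for both instances.
instance1 = ioprio_value(IOPRIO_CLASS_RT, 4)
instance2 = ioprio_value(IOPRIO_CLASS_BE, 4)
```

The class field is the only thing that differs between the two
instances in this sketch.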

Both instances compete for NVMe bandwidth. We observe that BFQ
allocates equal bandwidth to both instances, starting a few seconds
after they start up.

We expected that Instance 1 (IOPRIO_CLASS_RT scheduling class) would
be granted all the bandwidth it asks for, while Instance 2 would be
allowed to consume only the remaining bandwidth.
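
To make that expectation concrete, here is a rough sketch of the
strict-priority split we had in mind. The function name and the
16 Gbps device capacity are hypothetical; only the 10 Gbps per-instance
demand comes from our setup.

```python
def strict_priority_split(capacity, rt_demand, be_demand):
    """Serve the RT demand first, then give the remainder to BE."""
    rt_share = min(rt_demand, capacity)
    be_share = min(be_demand, capacity - rt_share)
    return rt_share, be_share


# Hypothetical device capacity of 16 Gbps, 10 Gbps demand per instance.
rt_share, be_share = strict_priority_split(16, 10, 10)
```

Under these assumed numbers Instance 1 would get its full 10 Gbps and
Instance 2 the remaining 6 Gbps, whereas the equal sharing we observe
would give each instance about 8 Gbps.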

Could you please help us understand how we can design our setup to
get the expected behavior?

Thanks