> On 13 Dec 2018, at 21:34, Madhav Ancha <mancha@xxxxxxxxxxxxxxxxxx> wrote:
>
> In our setup, we have a task that writes to an NVMe SSD drive through the
> page cache (using ::write OS calls). This task does application-level
> buffering and sends big (multi-MB) chunks of data to each ::write call.
> Each instance of the task writes up to 10 Gbps of data to the NVMe SSD.
>
> We run two instances of this task as below:
> Instance 1: Using ionice -c1, we run an RT I/O instance of this task.
> Instance 2: We run a normal (best-effort) I/O instance of this task.
>
> Both write task instances compete for NVMe bandwidth. We observe
> that BFQ allocates equal bandwidth to both task instances starting
> a few seconds after they start up.
>
> What we expected is that Instance 1 (IOPRIO_CLASS_RT scheduling class)
> would be granted all the bandwidth it asks for, while Instance 2 would be
> allowed to consume the remaining bandwidth.
>
> Could you please help us understand how we might design our setup to
> get the expected behavior.
>

Hi,
if you do async, in-memory writes, then your task instances just dirty
VM pages. Different processes, the kworkers, then do the writeback of
dirty pages asynchronously, according to the system's writeback logic and
configuration. The kworkers have their own I/O priority, which is likely
to be the same for each such process. AFAICT this priority is not related
to the priority you give to your processes.

If you want to control I/O bandwidth for writes, go for direct I/O or use
cgroups (a minimal direct-I/O sketch follows at the end of this message).
In the case of cgroups, consider that there is still the oddity that the
bfq interface parameters are non-standard. We have proposed, and are
pushing for, a solution to this problem [1].

Thanks,
Paolo

[1] https://lkml.org/lkml/2018/11/19/366

> Thanks
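
P.S. Below is a minimal sketch of the direct-I/O alternative mentioned
above. It is only an illustration, not code from the original report: the
file path, chunk size and assumed 4096-byte alignment are made up. The
point is that with O_DIRECT the requests are submitted synchronously in
the context of the writing task itself, so the I/O class assigned with
ionice -c1 (IOPRIO_CLASS_RT) is what BFQ actually sees, instead of the
kworkers' priority.

/*
 * Sketch only: open the target file with O_DIRECT so writes bypass the
 * page cache and are issued by this task. With O_DIRECT, buffer address,
 * length and file offset must be aligned to the device's logical block
 * size (4096 is assumed here, which covers most NVMe drives).
 */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define ALIGN 4096                 /* assumed logical block size */
#define CHUNK (8 * 1024 * 1024)    /* multi-MB chunk, as in the report */

int main(void)
{
	void *buf;
	int fd;

	/* Hypothetical path; O_DIRECT bypasses the page cache. */
	fd = open("/mnt/nvme/outfile", O_WRONLY | O_CREAT | O_DIRECT, 0644);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	/* Buffer must be aligned for O_DIRECT. */
	if (posix_memalign(&buf, ALIGN, CHUNK) != 0) {
		fprintf(stderr, "posix_memalign failed\n");
		return 1;
	}
	memset(buf, 0, CHUNK);

	/* This write is issued by the ionice'd task, so BFQ schedules it
	 * with that task's I/O class and priority. */
	if (write(fd, buf, CHUNK) != CHUNK) {
		perror("write");
		return 1;
	}

	free(buf);
	close(fd);
	return 0;
}

The trade-off is that the application loses page-cache buffering and must
handle alignment and write sizing itself, but in exchange the bandwidth
split between your two instances is decided by BFQ on the basis of the
priorities you actually set.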