Re: Unintuitive scheduling results using BFQ

Hi Paolo,

    Thanks a lot for your work and for your reply to our email.

    Following your advice, I switched my real-time application to direct I/O.
    We now have:

    Task1: Using ionice -c1, we run an RT-class O_DIRECT task that writes
enough data to saturate the NVMe drive.
    Task2: We run a normal (best-effort class) async (page-cache buffered)
task that also writes enough data to saturate the NVMe drive.

    What we now see is that Task2 still ends up getting about three-fifths
or more of the NVMe bandwidth, and Task1 ends up getting the rest of the
NVMe disk bandwidth. Could the kernel writeback threads/buffering be
overpowering the RT priority of Task1?

    What we desire is to loosely ensure that Task1 gets as much
bandwidth as it asks for in any iteration while Task2 and the
remaining tasks share the leftover bandwidth.

    We are currently testing with these BFQ settings (applied roughly as
sketched below).
    low_latency = 0 (to keep control over the bandwidth allocation)
    slice_idle = 0 (we are using a fast NVMe drive, and if Task1 has no
pending requests and control passes to Task2, it seems to make sense to
hand control back to Task1 quickly)
    timeout_sync = 1 (Task1 does application-level buffering and always
sends the biggest chunk of data available, in the high MBs)
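
    For reference, here is a minimal sketch of how we apply these through
sysfs; the device name nvme0n1 below is only an example and stands in for
whatever device actually backs /n/ssd1:

    # select BFQ and set the tunables (nvme0n1 is a placeholder name)
    echo bfq > /sys/block/nvme0n1/queue/scheduler
    echo 0 > /sys/block/nvme0n1/queue/iosched/low_latency
    echo 0 > /sys/block/nvme0n1/queue/iosched/slice_idle
    echo 1 > /sys/block/nvme0n1/queue/iosched/timeout_sync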

    We are unable to make the leap to cgroups at this time, Paolo. Is
there anything we can tune in BFQ, or change in the way we generate
traffic in Task1, to ensure that Task1 gets the bandwidth it asks for?

    A rough approximation of the Task1 and Task2 traffic discussed above
is given by these dd invocations.
    Task1: ionice -c1 -n2 /bin/dd if=/dev/zero of=/n/ssd1/ddtestRT1 bs=2M oflag=direct
    Task2: /bin/dd if=/dev/zero of=/n/ssd1/ddtestRT2 bs=2M
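
    To double-check what the scheduler actually sees, we also verify the
effective I/O class of each running dd with ionice -p (the PIDs below are
only placeholders):

    ionice -p 12345   # Task1's dd; should report something like "realtime: prio 2"
    ionice -p 12346   # Task2's dd; should report the default class (none/best-effort)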

Thanks again Paolo,
Madhav.



On Fri, Dec 14, 2018 at 1:38 AM Paolo Valente <paolo.valente@xxxxxxxxxx> wrote:
>
>
>
> > Il giorno 13 dic 2018, alle ore 21:34, Madhav Ancha <mancha@xxxxxxxxxxxxxxxxxx> ha scritto:
> >
> > In our setup, we have a task that writes to an NVMe SSD drive through
> > the page cache (using ::write OS calls). This task does application-level
> > buffering and sends big (multi-MB) chunks of data to each ::write call.
> > Each instance of the task writes up to 10 Gbps of data to the NVMe SSD.
> >
> > We run two instances of this task as below.
> > Instance 1: Using ionice -c1, we run an RT I/O instance of this task.
> > Instance 2: We run a normal (best-effort) I/O instance of this task.
> >
> > Both the write task instances compete for NVMe bandwidth. We observe
> > that BFQ allocates equal bandwidth to both the task instances starting
> > a few seconds after they start up.
> >
> > What we expected is that Instance1 (IOPRIO_CLASS_RT scheduling class)
> > would be granted all the bandwidth it asks for, while Instance2 would
> > be allowed to consume the remaining bandwidth.
> >
> > Could you please help us understand how we might design our setup to
> > get the expected behavior?
> >
>
> Hi,
> if you do async, in-memory writes, then your task instances just dirty
> vm pages.  Then different processes, the kworkers, will do the
> writeback of dirty pages asynchronously, according to the system
> writeback logic and configuration.  kworkers have their own priority,
> which is likely to be the same for each such process.  AFAICT this
> priority is not related to the priority you give to your processes.
>
> If you want to control I/O bandwidths for writes, go for direct I/O or
> use cgroups.  In the case of cgroups, consider that there is still the
> oddity that the bfq interface parameters are non-standard.  We have
> proposed and are pushing for a solution to this problem [1].
>
> Thanks,
> Paolo
>
> [1] https://lkml.org/lkml/2018/11/19/366
>
> > Thanks
>


