Re: high overhead of functions blkg_*stats_* in bfq

+Ulf Hansson, Mark Brown, Linus Walleij

> Il giorno 17 ott 2017, alle ore 12:11, Paolo Valente <paolo.valente@xxxxxxxxxx> ha scritto:
> 
> Hi Tejun, all,
> in our work for reducing bfq overhead, we bumped into an unexpected
> fact: the functions blkg_*stats_*, invoked in bfq to update cgroups
> statistics as in cfq, take about 40% of the total execution time of
> bfq.  This causes a further serious slowdown on any multicore CPU,
> because most bfq functions, from which blkg_*stats_* get invoked, are
> protected by a per-device scheduler lock.  To give you an idea, on an
> Intel i7-4850HQ, and with 8 threads doing random I/O in parallel on
> null_blk (configured with 0 latency), if the update of groups stats is
> removed, then the throughput grows from 260 to 404 KIOPS.  This and
> all the other results we might share in this thread can be reproduced
> very easily with a (useful) script made by Luca Miccio [1].
> 
> We tried to understand the reason for this high overhead and, in
> particular, to find out whether there was some issue that we
> could address on our own.  But the causes seem rather fundamental:
> one of the most time-consuming operations needed by some blkg_*stats_*
> functions is, e.g., find_next_bit, for which we don't see any trivial
> replacement.
> 
> So, as a first attempt to reduce this severe slowdown, we have made a
> patch that moves the invocation of blkg_*stats_* functions outside the
> critical sections protected by the bfq lock.  Still, these functions
> apparently need to be protected with the request_queue lock, because
> the group they are invoked on may otherwise disappear before or while
> these functions are executed.  Fortunately, tests run without even
> this lock have shown that the serialization it introduces has little
> impact (about a 5% throughput reduction).  As for results, moving
> these functions outside the bfq lock does improve throughput: it
> grows, e.g., from 260 to 316 KIOPS in the above test case.  But we are
> still rather far from the optimum.
> 
> Do you have any clue about possible solutions to reduce the overhead
> of these functions?  If no relatively quick solution is available, we
> are planning to prepare, in addition to the above patch to increase
> parallelism, a further patch that lets the user disable stats updates
> altogether, so as to gain a full throughput boost of up to
> 55% (according to the tests we have run so far on a few different
> systems).
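
[For what it's worth, one possible shape for such an opt-out is a build-time
toggle.  The fragment below is purely a sketch: the symbol name
BFQ_GROUP_IOSCHED_STATS is hypothetical, not an existing kernel option,
though it hangs off the real BFQ_GROUP_IOSCHED symbol.]

```kconfig
config BFQ_GROUP_IOSCHED_STATS
	bool "Update cgroup I/O statistics in BFQ"
	depends on BFQ_GROUP_IOSCHED
	default y
	help
	  Keep per-cgroup I/O statistics up to date from the BFQ hot
	  paths.  Disabling this trades the statistics for throughput
	  on fast devices.
```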
> 
> Thanks,
> Paolo
> 
> [1] https://github.com/Algodev-github/IOSpeed




