Hello, Paolo.

On Tue, Oct 17, 2017 at 12:11:01PM +0200, Paolo Valente wrote:
...
> protected by a per-device scheduler lock.  To give you an idea, on an
> Intel i7-4850HQ, and with 8 threads doing random I/O in parallel on
> null_blk (configured with 0 latency), if the update of groups stats is
> removed, then the throughput grows from 260 to 404 KIOPS.  This and
> all the other results we might share in this thread can be reproduced
> very easily with a (useful) script made by Luca Miccio [1].

I don't think the old request_queue was ever built for multiple CPUs
hitting on a mem-backed device.

> We tried to understand the reason for this high overhead, and, in
> particular, to find out whether there was some issue that we could
> address on our own.  But the causes seem somehow substantial: one of
> the most time-consuming operations needed by some blkg_*stats_*
> functions is, e.g., find_next_bit, for which we don't see any trivial
> replacement.

Can you point to the specific ones?  I can't find find_next_bit usages
in generic blkg code.

> So, as a first attempt to reduce this severe slowdown, we have made a
> patch that moves the invocation of blkg_*stats_* functions outside
> the critical sections protected by the bfq lock.  Still, these
> functions apparently need to be protected with the request_queue
> lock, because

blkgs are already protected with RCU, so RCU protection should be
enough.

Thanks.

--
tejun