On 6/21/22 19:24, Jan Kara wrote:
> Hello,
> 
> This is the fourth revision of my patches fixing IO priority handling in the
> block layer.

Hey,

I ran some tests with RT IO class on an ATA HDD with NCQ priority.
Results look good:

randread: (groupid=0, jobs=1): err= 0: pid=1862: Thu Jun 23 09:22:42 2022
  read: IOPS=152, BW=19.1MiB/s (20.0MB/s)(11.1GiB/592929msec)
    slat (usec): min=37, max=3280, avg=174.96, stdev=120.90
    clat (msec): min=7, max=918, avg=156.95, stdev=149.43
     lat (msec): min=7, max=918, avg=157.13, stdev=149.43
    clat prio 0/0 (msec): min=7, max=918, avg=171.68, stdev=150.48
    clat prio 1/0 (msec): min=8, max=166, avg=24.64, stdev= 5.93
    clat percentiles (msec):
     |  1.00th=[   15],  5.00th=[   20], 10.00th=[   23], 20.00th=[   32],
     | 30.00th=[   51], 40.00th=[   77], 50.00th=[  108], 60.00th=[  146],
     | 70.00th=[  194], 80.00th=[  262], 90.00th=[  372], 95.00th=[  477],
     | 99.00th=[  659], 99.50th=[  701], 99.90th=[  793], 99.95th=[  818],
     | 99.99th=[  885]
    clat prio 0/0 (89.99% of IOs) percentiles (msec):
     |  1.00th=[   15],  5.00th=[   21], 10.00th=[   27], 20.00th=[   46],
     | 30.00th=[   68], 40.00th=[   94], 50.00th=[  126], 60.00th=[  163],
     | 70.00th=[  213], 80.00th=[  279], 90.00th=[  388], 95.00th=[  489],
     | 99.00th=[  659], 99.50th=[  709], 99.90th=[  793], 99.95th=[  827],
     | 99.99th=[  885]
    clat prio 1/0 (10.01% of IOs) percentiles (msec):
     |  1.00th=[   14],  5.00th=[   17], 10.00th=[   18], 20.00th=[   20],
     | 30.00th=[   22], 40.00th=[   23], 50.00th=[   25], 60.00th=[   26],
     | 70.00th=[   28], 80.00th=[   30], 90.00th=[   33], 95.00th=[   35],
     | 99.00th=[   40], 99.50th=[   42], 99.90th=[   47], 99.95th=[   50],
     | 99.99th=[  167]

Clearly significantly lower tail latencies for the 10% of IOs with class RT
(1/0), as expected. This is with "none" scheduler at QD=24 (128K random read).

Feel free to add:

Tested-by: Damien Le Moal <damien.lemoal@xxxxxxxxxxxxxxxxxx>
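
A job file along the following lines should reproduce this kind of run. This
is only a sketch, not the exact job used above: the device path, runtime and
the cmdprio_* options are assumptions (cmdprio_* needs the libaio or io_uring
engine, and the drive needs NCQ priority enabled in sysfs for the RT class to
reach the device as a high-priority command):

# Sketch only: 128K random reads at QD=24, 10% of commands issued as RT class.
# /dev/sdX is a placeholder; enable NCQ priority first, e.g.
#   echo 1 > /sys/block/sdX/device/ncq_prio_enable
[randread]
filename=/dev/sdX
direct=1
ioengine=libaio
rw=randread
bs=128k
iodepth=24
time_based
runtime=600
# Issue 10% of the IOs with IO priority class 1 (IOPRIO_CLASS_RT)
cmdprio_percentage=10
cmdprio_class=1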
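
As an aside on the merging problem described in the cover letter quoted below:
the mismatch between IOPRIO_CLASS_NONE and IOPRIO_CLASS_BE is already visible
in the numeric ioprio encoding. A minimal userspace illustration (the macros
mirror include/uapi/linux/ioprio.h; this only shows the encoded values, it is
not the kernel's merge code):

#include <stdio.h>

/* Same encoding as include/uapi/linux/ioprio.h */
#define IOPRIO_CLASS_SHIFT	13
#define IOPRIO_PRIO_VALUE(class, data)	(((class) << IOPRIO_CLASS_SHIFT) | (data))

enum { IOPRIO_CLASS_NONE, IOPRIO_CLASS_RT, IOPRIO_CLASS_BE, IOPRIO_CLASS_IDLE };

int main(void)
{
	/*
	 * "No priority set" and "best-effort, level 4" are both meant to be
	 * the default priority, yet they encode to different numbers, so IO
	 * tagged one way does not compare equal to IO tagged the other.
	 */
	printf("NONE/0 = %d\n", IOPRIO_PRIO_VALUE(IOPRIO_CLASS_NONE, 0)); /* 0 */
	printf("BE/4   = %d\n", IOPRIO_PRIO_VALUE(IOPRIO_CLASS_BE, 4));   /* 16388 */
	return 0;
}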

> 
> Changes since v3:
> * Added Reviewed-by tags from Damien
> * Fixed build failure without CONFIG_BLOCK
> * Separated refactoring of get_current_ioprio() into a separate patch
> 
> Changes since v2:
> * added some comments to better explain things
> * changed handling of ioprio_get(2)
> * a few small tweaks based on Damien's feedback
> 
> Original cover letter:
> Recently, I've been looking into 10% regression reported by our performance
> measurement infrastructure in reaim benchmark that was bisected down to
> 5a9d041ba2f6 ("block: move io_context creation into where it's needed"). This
> didn't really make much sense and it took me a while to understand this but
> the culprit is actually in even older commit e70344c05995 ("block: fix default
> IO priority handling") and 5a9d041ba2f6 just made the breakage visible.
> Essentially the problem was that after these commits some IO was queued with
> IO priority class IOPRIO_CLASS_BE while other IO was queued with
> IOPRIO_CLASS_NONE and as a result they could not be merged together resulting
> in performance regression. I think what commit e70344c05995 ("block: fix
> default IO priority handling") did is actually broken not only because of this
> performance regression but because of other reasons as well (see changelog of
> patch 3/8 for details). Besides this problem, there are multiple other
> inconsistencies in the IO priority handling throughout the block stack we have
> identified when discussing this with Damien Le Moal. So this patch set aims at
> fixing all these various issues.
> 
> Note that there are a few choices I've made I'm not 100% sold on. In
> particular the conversion of blk-ioprio from rqos is somewhat disputable since
> it now incurs a cost similar to blk-throttle in the bio submission fast path
> (need to load bio->bi_blkg->pd[ioprio_policy.plid]). If people think the
> removed boilerplate code is not worth the cost, I can certainly go via the
> "additional rqos hook" path.
> 
> 								Honza
> 
> Previous versions:
> Link: http://lore.kernel.org/r/20220601132347.13543-1-jack@xxxxxxx # v1
> Link: http://lore.kernel.org/r/20220615160437.5478-1-jack@xxxxxxx # v2
> Link: http://lore.kernel.org/r/20220620160726.19798-1-jack@xxxxxxx # v3

-- 
Damien Le Moal
Western Digital Research