On 6/21/22 19:24, Jan Kara wrote:
> Hello,
> 
> This is the fourth revision of my patches fixing IO priority handling in the
> block layer.

Hey,

I ran some tests with RT IO class on an ATA HDD with NCQ priority.
Results look good:

randread: (groupid=0, jobs=1): err= 0: pid=1862: Thu Jun 23 09:22:42 2022
  read: IOPS=152, BW=19.1MiB/s (20.0MB/s)(11.1GiB/592929msec)
    slat (usec): min=37, max=3280, avg=174.96, stdev=120.90
    clat (msec): min=7, max=918, avg=156.95, stdev=149.43
     lat (msec): min=7, max=918, avg=157.13, stdev=149.43
    clat prio 0/0 (msec): min=7, max=918, avg=171.68, stdev=150.48
    clat prio 1/0 (msec): min=8, max=166, avg=24.64, stdev= 5.93
    clat percentiles (msec):
     |  1.00th=[   15],  5.00th=[   20], 10.00th=[   23], 20.00th=[   32],
     | 30.00th=[   51], 40.00th=[   77], 50.00th=[  108], 60.00th=[  146],
     | 70.00th=[  194], 80.00th=[  262], 90.00th=[  372], 95.00th=[  477],
     | 99.00th=[  659], 99.50th=[  701], 99.90th=[  793], 99.95th=[  818],
     | 99.99th=[  885]
    clat prio 0/0 (89.99% of IOs) percentiles (msec):
     |  1.00th=[   15],  5.00th=[   21], 10.00th=[   27], 20.00th=[   46],
     | 30.00th=[   68], 40.00th=[   94], 50.00th=[  126], 60.00th=[  163],
     | 70.00th=[  213], 80.00th=[  279], 90.00th=[  388], 95.00th=[  489],
     | 99.00th=[  659], 99.50th=[  709], 99.90th=[  793], 99.95th=[  827],
     | 99.99th=[  885]
    clat prio 1/0 (10.01% of IOs) percentiles (msec):
     |  1.00th=[   14],  5.00th=[   17], 10.00th=[   18], 20.00th=[   20],
     | 30.00th=[   22], 40.00th=[   23], 50.00th=[   25], 60.00th=[   26],
     | 70.00th=[   28], 80.00th=[   30], 90.00th=[   33], 95.00th=[   35],
     | 99.00th=[   40], 99.50th=[   42], 99.90th=[   47], 99.95th=[   50],
     | 99.99th=[  167]

Clearly significantly lower tail latencies for the 10% of IOs with class RT
(1/0), as expected. This is with "none" scheduler at QD=24 (128K random read).

Feel free to add:

Tested-by: Damien Le Moal <damien.lemoal@xxxxxxxxxxxxxxxxxx>
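
A job file along the following lines should reproduce this kind of run. This
is only a sketch, not the exact job used above: the device path, runtime and
the cmdprio_* options are assumptions (cmdprio_* needs the libaio or io_uring
engine, and the drive needs NCQ priority enabled in sysfs for the RT class to
reach the device as a high-priority command):

# Sketch only: 128K random reads at QD=24, 10% of commands issued as RT class.
# /dev/sdX is a placeholder; enable NCQ priority first, e.g.
#   echo 1 > /sys/block/sdX/device/ncq_prio_enable
[randread]
filename=/dev/sdX
direct=1
ioengine=libaio
rw=randread
bs=128k
iodepth=24
time_based
runtime=600
# Issue 10% of the IOs with IO priority class 1 (IOPRIO_CLASS_RT)
cmdprio_percentage=10
cmdprio_class=1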
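
As an aside on the merging problem described in the cover letter quoted below:
the mismatch between IOPRIO_CLASS_NONE and IOPRIO_CLASS_BE is already visible
in the numeric ioprio encoding. A minimal userspace illustration (the macros
mirror include/uapi/linux/ioprio.h; this only shows the encoded values, it is
not the kernel's merge code):

#include <stdio.h>

/* Same encoding as include/uapi/linux/ioprio.h */
#define IOPRIO_CLASS_SHIFT	13
#define IOPRIO_PRIO_VALUE(class, data)	(((class) << IOPRIO_CLASS_SHIFT) | (data))

enum { IOPRIO_CLASS_NONE, IOPRIO_CLASS_RT, IOPRIO_CLASS_BE, IOPRIO_CLASS_IDLE };

int main(void)
{
	/*
	 * "No priority set" and "best-effort, level 4" are both meant to be
	 * the default priority, yet they encode to different numbers, so IO
	 * tagged one way does not compare equal to IO tagged the other.
	 */
	printf("NONE/0 = %d\n", IOPRIO_PRIO_VALUE(IOPRIO_CLASS_NONE, 0)); /* 0 */
	printf("BE/4   = %d\n", IOPRIO_PRIO_VALUE(IOPRIO_CLASS_BE, 4));   /* 16388 */
	return 0;
}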

> 
> Changes since v3:
> * Added Reviewed-by tags from Damien
> * Fixed build failure without CONFIG_BLOCK
> * Separated refactoring of get_current_ioprio() into a separate patch
> 
> Changes since v2:
> * added some comments to better explain things
> * changed handling of ioprio_get(2)
> * a few small tweaks based on Damien's feedback
> 
> Original cover letter:
> Recently, I've been looking into 10% regression reported by our performance
> measurement infrastructure in reaim benchmark that was bisected down to
> 5a9d041ba2f6 ("block: move io_context creation into where it's needed"). This
> didn't really make much sense and it took me a while to understand this but
> the culprit is actually in even older commit e70344c05995 ("block: fix default
> IO priority handling") and 5a9d041ba2f6 just made the breakage visible.
> Essentially the problem was that after these commits some IO was queued with
> IO priority class IOPRIO_CLASS_BE while other IO was queued with
> IOPRIO_CLASS_NONE and as a result they could not be merged together resulting
> in performance regression. I think what commit e70344c05995 ("block: fix
> default IO priority handling") did is actually broken not only because of this
> performance regression but because of other reasons as well (see changelog of
> patch 3/8 for details). Besides this problem, there are multiple other
> inconsistencies in the IO priority handling throughout the block stack we have
> identified when discussing this with Damien Le Moal. So this patch set aims at
> fixing all these various issues.
> 
> Note that there are a few choices I've made I'm not 100% sold on. In
> particular the conversion of blk-ioprio from rqos is somewhat disputable since
> it now incurs a cost similar to blk-throttle in the bio submission fast path
> (need to load bio->bi_blkg->pd[ioprio_policy.plid]). If people think the
> removed boilerplate code is not worth the cost, I can certainly go via the
> "additional rqos hook" path.
> 
> 								Honza
> 
> Previous versions:
> Link: http://lore.kernel.org/r/20220601132347.13543-1-jack@xxxxxxx # v1
> Link: http://lore.kernel.org/r/20220615160437.5478-1-jack@xxxxxxx # v2
> Link: http://lore.kernel.org/r/20220620160726.19798-1-jack@xxxxxxx # v3

-- 
Damien Le Moal
Western Digital Research