On 7/5/19 2:32 PM, Dennis Zhou wrote: > The iolatency controller is based on rq_qos. It increments on > rq_qos_throttle() and decrements on either rq_qos_cleanup() or > rq_qos_done_bio(). a3fb01ba5af0 fixes the double accounting issue where > blk_mq_make_request() may call both rq_qos_cleanup() and > rq_qos_done_bio() on REQ_NO_WAIT. So checking STS_AGAIN prevents the > double decrement. > > The above works upstream as the only way we can get STS_AGAIN is from > blk_mq_get_request() failing. The STS_AGAIN handling isn't a real > problem as bio_endio() skipping only happens on reserved tag allocation > failures which can only be caused by driver bugs and already triggers > WARN. > > However, the fix creates a not so great dependency on how STS_AGAIN can > be propagated. Internally, we (Facebook) carry a patch that kills read > ahead if a cgroup is io congested or a fatal signal is pending. This > combined with chained bios progagate their bi_status to the parent is > not already set can can cause the parent bio to not clean up properly > even though it was successful. This consequently leaks the inflight > counter and can hang all IOs under that blkg. > > To nip the adverse interaction early, this removes the rq_qos_cleanup() > callback in iolatency in favor of cleaning up always on the > rq_qos_done_bio() path. > > Fixes: a3fb01ba5af0 ("blk-iolatency: only account submitted bios") > Debugged-by: Tejun Heo <tj@xxxxxxxxxx> > Debugged-by: Josef Bacik <josef@xxxxxxxxxxxxxx> > Signed-off-by: Dennis Zhou <dennis@xxxxxxxxxx> > --- > block/blk-iolatency.c | 29 +++-------------------------- > 1 file changed, 3 insertions(+), 26 deletions(-) > > diff --git a/block/blk-iolatency.c b/block/blk-iolatency.c > index e8859350ab6e..c956eebf2d97 100644 > --- a/block/blk-iolatency.c > +++ b/block/blk-iolatency.c > @@ -600,10 +600,6 @@ static void blkcg_iolatency_done_bio(struct rq_qos *rqos, struct bio *bio) > if (!blkg || !bio_flagged(bio, BIO_TRACKED)) > return; > > - /* We didn't actually submit this bio, don't account it. */ > - if (bio->bi_status == BLK_STS_AGAIN) > - return; > - > iolat = blkg_to_lat(bio->bi_blkg); > if (!iolat) > return; > @@ -622,6 +618,9 @@ static void blkcg_iolatency_done_bio(struct rq_qos *rqos, struct bio *bio) > > inflight = atomic_dec_return(&rqw->inflight); > WARN_ON_ONCE(inflight < 0); > + /* We didn't actually submit this bio, don't account for it. */ > + if (bio->bi_status == BLK_STS_AGAIN) > + goto next; > if (iolat->min_lat_nsec == 0) > goto next; > iolatency_record_time(iolat, &bio->bi_issue, now, Patch in general looks fine to me, but let's get rid of this next label, it's pretty silly. Only one use of it, why not just make it a nested if? -- Jens Axboe