On 7/5/19 3:09 PM, Dennis Zhou wrote: > The iolatency controller is based on rq_qos. It increments on > rq_qos_throttle() and decrements on either rq_qos_cleanup() or > rq_qos_done_bio(). a3fb01ba5af0 fixes the double accounting issue where > blk_mq_make_request() may call both rq_qos_cleanup() and > rq_qos_done_bio() on REQ_NO_WAIT. So checking STS_AGAIN prevents the > double decrement. > > The above works upstream as the only way we can get STS_AGAIN is from > blk_mq_get_request() failing. The STS_AGAIN handling isn't a real > problem as bio_endio() skipping only happens on reserved tag allocation > failures which can only be caused by driver bugs and already triggers > WARN. > > However, the fix creates a not so great dependency on how STS_AGAIN can > be propagated. Internally, we (Facebook) carry a patch that kills read > ahead if a cgroup is io congested or a fatal signal is pending. This > combined with chained bios progagate their bi_status to the parent is > not already set can can cause the parent bio to not clean up properly > even though it was successful. This consequently leaks the inflight > counter and can hang all IOs under that blkg. > > To nip the adverse interaction early, this removes the rq_qos_cleanup() > callback in iolatency in favor of cleaning up always on the > rq_qos_done_bio() path. Looks good, applied for 5.3. Thanks Dennis. -- Jens Axboe