On 11/25/21 2:07 PM, Jens Axboe wrote:
> On 11/25/21 2:05 PM, Kenneth R. Crudup wrote:
>>
>> On Tue, 23 Nov 2021, Jens Axboe wrote:
>>
>>> It looks like some missed accounting. You can just disable wbt for now, would
>>> be a useful data point to see if that fixes it. Just do:
>>
>>> echo 0 > /sys/block/nvme0n1/queue/wbt_lat_usec
>>
>>> and that will disable writeback throttling on that device.
>>
>> It's been about 48 hours and haven't seen the issue since doing this.
>
> Great, thanks for verifying. From your report 5.16-rc2 has the issue, is
> 5.15 fine?

Can you apply this on top of 5.16-rc2 or current -git and see if it fixes
it for you?


diff --git a/block/blk-mq.c b/block/blk-mq.c
index 8799fa73ef34..8874a63ae952 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -860,13 +860,14 @@ void blk_mq_end_request_batch(struct io_comp_batch *iob)
 		if (iob->need_ts)
 			__blk_mq_end_request_acct(rq, now);
 
+		rq_qos_done(rq->q, rq);
+
 		WRITE_ONCE(rq->state, MQ_RQ_IDLE);
 		if (!refcount_dec_and_test(&rq->ref))
 			continue;
 
 		blk_crypto_free_request(rq);
 		blk_pm_mark_last_busy(rq);
-		rq_qos_done(rq->q, rq);
 
 		if (nr_tags == TAG_COMP_BATCH || cur_hctx != rq->mq_hctx) {
 			if (cur_hctx)

-- 
Jens Axboe