On Thu, Jan 11, 2024 at 09:17:41AM -0700, Jens Axboe wrote: > On 1/11/24 9:14 AM, Christoph Hellwig wrote: > > On Thu, Jan 11, 2024 at 09:12:23AM -0700, Jens Axboe wrote: > >> On 1/11/24 6:57 AM, Christoph Hellwig wrote: > >>> q_usage_counter is the only thing preventing us from the limits changing > >>> under us in __bio_split_to_limits, but blk_mq_submit_bio doesn't hold it. > >>> > >>> Change __submit_bio to always acquire the q_usage_counter counter before > >>> branching out into bio vs request based helper, and let blk_mq_submit_bio > >>> tell it if it consumed the reference by handing it off to the request. > >> > >> This causes hangs for me on shutdown/reset: > > > >> which seems to indicate that a reference is being leaked. Haven't poked > >> any further at it, I'll drop these two for now. > > > > Can you send me your .config? > > Don't think it's .config related, hit it on my test box and then in my > vm as well. It's likely due to batched allocations, would be my guess. > I can start/halt the vm fine with just a boot, but if I do: Yupp, cached rqs it was. The incremental patch below fixes it. Can you add some cached request testing to blktests so that this gets covered by default? diff --git a/block/blk-mq.c b/block/blk-mq.c index 421db29535ba50..39eb4b99320c11 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -2967,8 +2967,14 @@ bool blk_mq_submit_bio(struct bio *bio) if (rq && rq->q == q) { if (blk_mq_attempt_bio_merge(q, bio, nr_segs)) return false; - if (blk_mq_use_cached_rq(rq, plug, bio)) + if (blk_mq_use_cached_rq(rq, plug, bio)) { + /* + * We're using the reference already owned by + * rq from here on. + */ + blk_queue_exit(q); goto has_rq; + } } }