On 6/23/22 2:56 AM, Shinichiro Kawasaki wrote: > On Jun 21, 2022 / 10:07, Jens Axboe wrote: >> If rq_qos_throttle() ends up blocking, then we will have invalidated and >> flushed our current plug. Since blk_mq_get_cached_request() hasn't >> popped the cached request off the plug list just yet, we end holding a >> pointer to a request that is no longer valid. This insta-crashes with >> rq->mq_hctx being NULL in the validity checks just after. >> >> Pop the request off the cached list before doing rq_qos_throttle() to >> avoid using a potentially stale request. >> >> Fixes: 0a5aa8d161d1 ("block: fix blk_mq_attempt_bio_merge and rq_qos_throttle protection") >> Reported-by: Dylan Yudaken <dylany@xxxxxx> >> Signed-off-by: Jens Axboe <axboe@xxxxxxxxx> > > Thank you for the fix and sorry for the trouble. The patch passed my test set: > > Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@xxxxxxx> > > > BTW, may I know the workload or script which triggers the failure? I'm > interested in if we can add a blktests test case to exercises qos throttle > and prevent similar bugs in the future. Not sure if there are others, but specifically blk-iocost will do an explicit schedule() for some conditions. See the bottom of ioc_rqos_throttle(). -- Jens Axboe