On Mon, Aug 23, 2021 at 08:42:45PM +0800, Joseph Qi wrote:
> Hi Ming,
>
> On 8/18/21 9:09 AM, Ming Lei wrote:
> > is_flush_rq() is called from bt_iter()/bt_tags_iter(), and runs the
> > following check:
> >
> > 	hctx->fq->flush_rq == req
> >
> > but the passed hctx from bt_iter()/bt_tags_iter() may be NULL because:
> >
> > 1) memory re-order in blk_mq_rq_ctx_init():
> >
> > 	rq->mq_hctx = data->hctx;
> > 	...
> > 	refcount_set(&rq->ref, 1);
> >
> > OR
> >
> > 2) tag re-use and ->rqs[] isn't updated with new request.
> >
> > Fix the issue by re-writing is_flush_rq() as:
> >
> > 	return rq->end_io == flush_end_io;
> >
> > which turns out simpler to follow and immune to data race since we have
> > ordered WRITE rq->end_io and refcount_set(&rq->ref, 1).
> >
> Recently we've run into a similar crash due to NULL rq->mq_hctx in
> blk_mq_put_rq_ref() on ARM, and it is a normal write request.
> Since memory reorder truly exists, we may also risk other uninitialized
> member accessing after this commit, at least we have to be more careful
> in busy_iter_fn...
> So here you don't use memory barrier before refcount_set() is for
> performance consideration?

Yes, also it is much simpler to check ->end_io in concept.

Thanks,
Ming
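
For readers following the ordering argument, here is a minimal user-space sketch of the idea, not the kernel code. It assumes C11 atomics as a stand-in for refcount_t and the kernel barriers, and the struct layouts and the helpers rq_publish() and rq_tryget() are hypothetical names made up for illustration. The point it tries to show: if ->end_io is written before the refcount becomes non-zero, and a reader only inspects ->end_io after it has taken a reference, then the flush check never needs to dereference ->mq_hctx.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel structures (assumption, not the real layout). */
struct hw_ctx;
struct request;
typedef void (*rq_end_io_fn)(struct request *rq);

struct request {
	struct hw_ctx *mq_hctx;	/* may still look NULL to a racing iterator */
	rq_end_io_fn end_io;
	atomic_int ref;		/* models refcount_t; 0 means "not visible yet" */
};

/* Marker callback: a request is treated as a flush request iff end_io points here. */
static void flush_end_io(struct request *rq)
{
	(void)rq;
}

/*
 * Hypothetical writer side: initialize the fields first, then publish the
 * request by making the refcount non-zero. The release store models an
 * explicit ordering between WRITE ->end_io and WRITE ->ref.
 */
static void rq_publish(struct request *rq, struct hw_ctx *hctx, rq_end_io_fn fn)
{
	assert(atomic_load_explicit(&rq->ref, memory_order_relaxed) == 0);
	rq->mq_hctx = hctx;
	rq->end_io = fn;
	atomic_store_explicit(&rq->ref, 1, memory_order_release);
}

/*
 * Hypothetical reader side, modelled on an "increment unless zero" operation:
 * only succeeds once the request has been published.
 */
static bool rq_tryget(struct request *rq)
{
	int old = atomic_load_explicit(&rq->ref, memory_order_relaxed);

	while (old != 0) {
		/* On failure, 'old' is reloaded with the current value. */
		if (atomic_compare_exchange_weak_explicit(&rq->ref, &old, old + 1,
							  memory_order_acquire,
							  memory_order_relaxed))
			return true;
	}
	return false;
}

/* The check from the patch description: no hctx dereference involved. */
static bool is_flush_rq(struct request *rq)
{
	return rq->end_io == flush_end_io;
}
```

In this model an iterator would call rq_tryget() first and only then is_flush_rq(), so a NULL ->mq_hctx cannot crash the check. Other fields read by a busy_iter_fn callback would still need the same care, which is the concern raised in the quoted question.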