On Mon, Jul 03, 2017 at 03:46:34PM +0300, Max Gurtovoy wrote:
>
> On 7/3/2017 3:03 PM, Ming Lei wrote:
> > On Mon, Jul 03, 2017 at 01:07:44PM +0300, Sagi Grimberg wrote:
> > > Hi Ming,
> > >
> > > > Yeah, the above change is correct, for any canceling requests in this
> > > > way we should use blk_mq_quiesce_queue().
> > >
> > > I still don't understand why blk_mq_flush_busy_ctxs should hit a NULL
> > > deref if we don't touch the tagset...
> >
> > Looks like no one has mentioned the steps for reproduction, so it isn't
> > easy to understand the related use case. Could anyone share the steps
> > for reproduction?
>
> Hi Ming,
> I create 500 namespaces per subsystem (using a CX4 target and a C-IB
> initiator, but I also saw it in a CX5 vs. CX5 setup).
> The NULL deref happens when I remove all configuration on the target
> (1 port, 1 subsystem, 500 namespaces, and nvmet module unload) during
> traffic to 1 nvme device/ns from the initiator.
> I get the NULL deref in the blk_mq_flush_busy_ctxs function, which calls
> sbitmap_for_each_set, on the initiator. It seems that
> "struct sbitmap_word *word = &sb->map[i];" is NULL. It might actually be
> non-NULL at the beginning of the function and become NULL while the
> while loop there is running.

So it looks like it is still a normal release on the initiator side.

In my experience, without quiescing the queue before calling
blk_mq_tagset_busy_iter() to cancel requests, a request double free can
be caused: a submitted request in .queue_rq can be completed in
blk_mq_end_request(), and meanwhile it can be completed again in
nvme_cancel_request(). That is why we have to quiesce the queue first
before canceling requests in this way. Besides NVMe, it looks like NBD
and mtip32xx need the same fix too.

This double completion might cause blk_cleanup_queue() to complete
early, and then the NULL deref can be triggered in
blk_mq_flush_busy_ctxs(). But in my previous debugging on PCI NVMe,
this wasn't seen yet. Whether the above is true could be verified by
adding some debug messages inside blk_cleanup_queue().

Thanks,
Ming
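
P.S. For readers following along, the quiesce-before-cancel ordering
discussed above can be sketched roughly as below. This is only an
illustrative kernel-side fragment, not compilable on its own; the
function name teardown_sketch() is made up for this example, while
nvme_stop_queues(), blk_mq_tagset_busy_iter(), and nvme_cancel_request()
are the existing in-tree helpers being discussed:

```c
#include <linux/blk-mq.h>
#include <linux/nvme.h>

/*
 * Sketch (hypothetical wrapper, not actual driver code): quiesce the
 * queues first so .queue_rq is no longer invoked and no in-flight
 * request can race its normal completion against cancellation, and
 * only then walk the busy tags to cancel outstanding requests.
 */
static void teardown_sketch(struct nvme_ctrl *ctrl)
{
	/* Step 1: blk_mq_quiesce_queue() on every namespace queue. */
	nvme_stop_queues(ctrl);

	/*
	 * Step 2: now it is safe to cancel; without step 1, a request
	 * could be completed both here and in blk_mq_end_request(),
	 * i.e. the double free described above.
	 */
	blk_mq_tagset_busy_iter(ctrl->tagset, nvme_cancel_request, ctrl);
}
```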