Re: [RFC Patch] dm: make sure to wait for all dispatched requests in __dm_suspend()

On Wed, 2024-03-20 at 11:03 +0800, Ming Lei wrote:
> On Tue, Mar 19, 2024 at 04:41:26PM +0100, Martin Wilck wrote:
> > 
> > What we know for sure is that there was a bad dm_target reference
> > in (struct dm_rq_target_io *tio)->ti:
> > 
> > crash> struct -x dm_rq_target_io c00000245ca90128
> > struct dm_rq_target_io {
> >   md = 0xc0000031c66a4000,
> >   ti = 0xc0080000020d0080 <fscache_object_list_lock+665632>,
> > 
> > crash> struct -x dm_target  0xc0080000020d0080
> > struct dm_target struct: invalid kernel virtual address:
> > c0080000020d0080  type: "gdb_readmem_callback"
> > 
> > The question is how this could have come to pass. It can only
> > happen if tio->ti had been set before the map was reloaded.
> > My theory is that the IO had been dispatched before the queue
> > had been quiesced, like this:
> > 
> > Task A                                 Task B
> > (dispatching IO)                       (executing a DM_SUSPEND ioctl to
> >                                        resume after DM_TABLE_LOAD)
> >                                        do_resume()
> >                                          dm_suspend()
> >                                            __dm_suspend()
> > dm_mq_queue_rq()
> >    struct dm_target *ti =
> >        md->immutable_target;
> >                                              dm_stop_queue()
> >                                                blk_mq_quiesce_queue()
> >        /*
> >         * At this point, the queue is quiesced, but task A
> >         * has already entered dm_mq_queue_rq()
> >         */
> 
> That shouldn't happen, blk_mq_quiesce_queue() drains all pending
> dm_mq_queue_rq() and prevents new dm_mq_queue_rq() from being
> called.

Thanks for pointing this out. I had been missing the fact that the
synchronization is achieved by the rcu_read_lock() in
__blk_mq_run_dispatch_ops(), which guards invocations of the
request dispatching code against the synchronize_rcu() in
blk_mq_wait_quiesce_done(). In our old kernel the rcu_read_lock() was
still taken in hctx_lock(), but with the same effect.
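
For the archives, this is roughly how the two sides pair up. It is only
a simplified sketch of the non-blocking (plain RCU) case; the actual
code in block/blk-mq.h and block/blk-mq.c also handles
BLK_MQ_F_BLOCKING queues with SRCU, and the quiesce side goes through
blk_mq_quiesce_queue_nowait():

    /* dispatch side, cf. __blk_mq_run_dispatch_ops() */
    rcu_read_lock();
    if (!blk_queue_quiesced(hctx->queue))
            blk_mq_sched_dispatch_requests(hctx);  /* ends up calling
                                                      dm_mq_queue_rq() */
    rcu_read_unlock();

    /* quiesce side, cf. blk_mq_quiesce_queue() ->
     * blk_mq_wait_quiesce_done() */
    blk_queue_flag_set(QUEUE_FLAG_QUIESCED, q);
    synchronize_rcu();
    /*
     * synchronize_rcu() only returns after every reader that entered
     * the rcu_read_lock() section above has left it again. So any
     * dm_mq_queue_rq() that started before the QUIESCED flag was set
     * has finished by now, and no new one can start until the queue
     * is unquiesced.
     */

So by the time dm_stop_queue() returns, there can be no in-flight
dm_mq_queue_rq() still holding a stale dm_target pointer.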

This means I no longer see how our dm_target reference could have
pointed to freed memory. For now, we'll follow Mike's advice.

Thanks a lot,
Martin
