Hello Martin,

On Sat, Mar 16, 2024 at 12:10:35AM +0100, Martin Wilck wrote:
> In a recent kernel dump analysis, we found that the kernel crashed because
> dm_rq_target_io tio->ti was pointing to invalid memory in dm_end_request(),
> in a situation where multipathd was doing map reloads because of a storage
> failover. The map of the respective mapped_device had been replaced by a
> different struct dm_table.
>
> We observed this with a 5.3.18 distro kernel, but the code in question
> hasn't changed much since then. Basically, we were only missing
> b4459b11e840 ("dm rq: don't queue request to blk-mq during DM suspend"),
> which doesn't guarantee that the race I'm thinking of (see below) can't
> happen.
>
> When a map is resumed after a table reload, the live table is swapped, and
> the tio->ti member of any live request becomes stale. __dm_suspend() avoids
> this by quiescing the queue and calling dm_wait_for_completion(), which
> waits until blk_mq_queue_inflight() doesn't report any in-flight requests.
>
> However, blk_mq_queue_inflight() counts only "started" requests. So, if a
> request is dispatched before the queue was quiesced, but
> dm_wait_for_completion() doesn't observe MQ_RQ_IN_FLIGHT for this request
> because of memory ordering effects, __dm_suspend() may finish successfully,

Can you explain the exact memory ordering that would cause
MQ_RQ_IN_FLIGHT not to be observed?

blk-mq quiesce includes synchronize_rcu(), which drains all in-flight
dispatch, so after blk_mq_quiesce_queue() returns, if
blk_mq_queue_inflight() returns 0, it does mean there aren't any active
in-flight requests.

If there is a bug in this pattern, I guess more drivers may have the same
'risk'.

BTW, what are the underlying disks in your dm-mpath setup?

Thanks,
Ming