Re: move more work to disk_release v2

Christoph Hellwig <hch@xxxxxx> · Thu, 3 Mar 2022 20:23:38 +0100

On Thu, Mar 03, 2022 at 10:19:34AM -0800, Bart Van Assche wrote:
> On 3/3/22 02:54, Christoph Hellwig wrote:
>> Maybe you can try to figure out what dereference causes
>> the null-ptr-deref, and what kind of command causes this?  Also
>> I suspect this is the first patch in the series, so it would be
>> great to verify the problem with just that.
>
> Hi Christoph,
>
> I can reproduce the crash by cherry-picking patch "blk-mq: do not include 
> passthrough requests in I/O accounting" on top of Jens' for-next branch.
>
> From the struct request that triggers the crash (the flag names have been 
> looked up manually and hence may be wrong):
> * cmd_flags 0x44202 = REQ_PREFLUSH | REQ_NOMERGE | REQ_FAILFAST_TRANSPORT |
>   REQ_OP_FLUSH.
> * rq_flags 0x2000 = RQF_IO_STAT.
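
A minimal userspace sketch for double-checking the manual cmd_flags
decoding quoted above.  The bit positions are an assumption, modelled on
include/linux/blk_types.h around v5.17, and they shift between kernel
versions.  The sketch_* names are made up for illustration and are not
kernel API.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Assumed layout (roughly v5.17): the low 8 bits of cmd_flags hold the
 * operation, the REQ_* flag bits start at bit 8.
 */
#define SKETCH_REQ_OP_BITS	8
#define SKETCH_REQ_OP_MASK	((1u << SKETCH_REQ_OP_BITS) - 1)
#define SKETCH_REQ_OP_FLUSH	2u

struct sketch_flag_name {
	unsigned int bit;
	const char *name;
};

/* Assumed bit numbers; verify against the tree you are debugging. */
static const struct sketch_flag_name sketch_flags[] = {
	{  8, "REQ_FAILFAST_DEV" },
	{  9, "REQ_FAILFAST_TRANSPORT" },
	{ 10, "REQ_FAILFAST_DRIVER" },
	{ 11, "REQ_SYNC" },
	{ 12, "REQ_META" },
	{ 13, "REQ_PRIO" },
	{ 14, "REQ_NOMERGE" },
	{ 15, "REQ_IDLE" },
	{ 16, "REQ_INTEGRITY" },
	{ 17, "REQ_FUA" },
	{ 18, "REQ_PREFLUSH" },
};

/*
 * Write the names of all set flag bits into buf, separated by " | ",
 * in ascending bit order.  Returns the operation from the low bits.
 * len must be at least 1.
 */
static unsigned int sketch_decode_cmd_flags(uint32_t cmd_flags, char *buf,
					    size_t len)
{
	size_t i;

	buf[0] = '\0';
	for (i = 0; i < sizeof(sketch_flags) / sizeof(sketch_flags[0]); i++) {
		if (!(cmd_flags & (1u << sketch_flags[i].bit)))
			continue;
		if (buf[0])
			strncat(buf, " | ", len - strlen(buf) - 1);
		strncat(buf, sketch_flags[i].name, len - strlen(buf) - 1);
	}
	return cmd_flags & SKETCH_REQ_OP_MASK;
}
```

Under these assumptions, passing 0x44202 yields the three flag names
listed in the report and an operation value of 2 (REQ_OP_FLUSH).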

So this is a flush request issued by the flush state machine.
Normally such requests don't go through the I/O accounting because the I/O
accounting happens before we call into the flush state machine.  But
with blk-mq we can run the flush state machine on the upper dm-mpath
device and then hand a request with a NULL bio down.

I can't really explain why you hit that path and I don't with the same
test.

Can you try this patch on top of the series?

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 6a072543bde4d..73b8bc9d67cf6 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -883,7 +883,10 @@ static inline void blk_account_io_done(struct request *req, u64 now)
 
 static void __blk_account_io_start(struct request *rq)
 {
-	rq->part = rq->bio->bi_bdev;
+	if (rq->bio)
+		rq->part = rq->bio->bi_bdev;
+	else /* should only happen for dm-mpath flush requests */
+		rq->part = rq->q->disk->part0;
 
 	part_stat_lock();
 	update_io_ticks(rq->part, jiffies, false);