Re: [PATCH] block: cleanup and fix batch completion adding conditions

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Feb 19, 2025 / 17:04, alan.adamson@xxxxxxxxxx wrote:
> 
> On 2/19/25 11:17 AM, alan.adamson@xxxxxxxxxx wrote:
> > 
> > On 2/18/25 11:26 PM, Shinichiro Kawasaki wrote:
> > > CC+: Alan,
> > > 
> > > On Feb 13, 2025 / 08:18, Jens Axboe wrote:
> > > > The conditions for whether or not a request is allowed adding to a
> > > > completion batch are a bit hard to read, and they also have a few
> > > > issues. One is that ioerror may indeed be a random value on
> > > > passthrough,
> > > > and it's being checked unconditionally of whether or not the given
> > > > request is a passthrough request or not.
> > > > 
> > > > Rewrite the conditions to be separate for easier reading, and
> > > > only check
> > > > ioerror for non-passthrough requests. This fixes an issue with bio
> > > > unmapping on passthrough, where it fails getting added to a batch. This
> > > > both leads to suboptimal performance, and may trigger a potential
> > > > schedule-under-atomic condition for polled passthrough IO.
> > > > 
> > > > Fixes: f794f3351f26 ("block: add support for
> > > > blk_mq_end_request_batch()")
> > > > Signed-off-by: Jens Axboe <axboe@xxxxxxxxx>
> > > I observed the blktests test case nvme/039 failure with v6.14-rc3
> > > kernel. I
> > > bisected and found that this patch in the v6.14-rc3 is the trigger.
> > > 
> > > The test run output is as follows:
> > > 
> > > nvme/039 => nvme0n1 (test error logging) [failed]
> > >      runtime  5.378s  ...  5.354s
> > >      --- tests/nvme/039.out      2024-09-20 11:20:26.405380875 +0900
> > >      +++
> > > /home/shin/Blktests/blktests/results/nvme0n1/nvme/039.out.bad
> > > 2025-02-19 16:13:05.061387179 +0900
> > >      @@ -1,7 +1,3 @@
> > >       Running nvme/039
> > >      - Read(0x2) @ LBA 0, 1 blocks, Unrecovered Read Error (sct 0x2
> > > / sc 0x81) DNR
> > >      - Read(0x2) @ LBA 0, 1 blocks, Unknown (sct 0x3 / sc 0x75) DNR
> > >      - Write(0x1) @ LBA 0, 1 blocks, Write Fault (sct 0x2 / sc 0x80) DNR
> > >        Identify(0x6), Access Denied (sct 0x2 / sc 0x86) DNR
> > > cdw10=0x1 cdw11=0x0 cdw12=0x0 cdw13=0x0 cdw14=0x0 cdw15=0x0
> > >      - Read(0x2), Invalid Command Opcode (sct 0x0 / sc 0x1) DNR
> > > cdw10=0x0 cdw11=0x0 cdw12=0x1 cdw13=0x0 cdw14=0x0 cdw15=0x0
> > >       Test complete
> > > 
> > > 
> > > The test case does error injection. Test method requires
> > > reconsideration to
> > > adjust to this kernel change, probably. Help for fix will be
> > > appreciated.
> > 
> > Not really an issue with the test, rather the error injector is broken.
> > I'll investigate.

Alan, thank you for looking into this. Then it sounds that a fix is needed in
kernel side.

> 
> The following change allows the test to pass.
> 
> - The check of (ioerror < 0) should be (ioerror != 0)
> 
> - passthroughs can also have ioerror set, so false needs to be returned.
> This needs to be resolved.
> 
> 
> diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
> index fa2a76cc2f73..b2bd2ac40441 100644
> --- a/include/linux/blk-mq.h
> +++ b/include/linux/blk-mq.h
> @@ -865,17 +865,17 @@ static inline bool blk_mq_add_to_batch(struct request
> *req,
>          * 1) No batch container
>          * 2) Has scheduler data attached
>          * 3) Not a passthrough request and end_io set
> -        * 4) Not a passthrough request and an ioerror
> +        * 4) An ioerror
>          */
>         if (!iob)
>                 return false;
>         if (req->rq_flags & RQF_SCHED_TAGS)
>                 return false;
> +       if (ioerror)
> +               return false;
>         if (!blk_rq_is_passthrough(req)) {
>                 if (req->end_io)
>                         return false;
> -               if (ioerror < 0)
> -                       return false;
>         }

I think the modification above essentially reverts the trigger commit, so it
means that we can not achieve what the commit aimed at.

I took a look in the call chain. Before the commit, the call chain was
like this for passthrough commands:

nvme_handle_cqe()
 blk_mq_add_to_batch() ... false is returned, then call nvme_pci_complete_rq()
 nvme_pci_complete_rq()
  nvme_complete_rq()
   nvme_end_req()
    nvme_log_err_passthrough() ... log is printed
    __nvme_end_req()           ... the nvme command ends

After the commit, it looks like this:

nvme_handle_cqe()
 blk_mq_add_to_batch() ... true is returned. Set nvme_pci_complete_batch()
nvme_pci_complete_batch()
 nvme_complete_batch()
  nvme_complete_batch()
   nvme_complete_batch_req()
    __nvme_end_req()         ... log is not printed but the command ends,
                                 then nvme/039 failed

Assuming the change by the trigger commit is required, I guess adding
nvme_log_err_passthrough() call in nvme_complete_batch_req() will restore
the log print feature for the passthrough case.

As a trial, I created a patch below [*]. I confirmed it avoids the nvme/039
failure. (As Alan noted, still I also find the check for "ioerror != 0" is
required for non-passthrough case, to make the test case pass.)


Jens, if you have comments, please share.


[*]

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 818d4e49aab5..33030dd9b76a 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -496,6 +496,9 @@ void nvme_complete_batch_req(struct request *req)
 {
 	trace_nvme_complete_rq(req);
 	nvme_cleanup_cmd(req);
+	if (unlikely(nvme_req(req)->status && !(req->rq_flags & RQF_QUIET) &&
+		     blk_rq_is_passthrough(req)))
+		nvme_log_err_passthru(req);
 	__nvme_end_req(req);
 }
 EXPORT_SYMBOL_GPL(nvme_complete_batch_req);
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index fa2a76cc2f73..b891ed113306 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -874,7 +874,7 @@ static inline bool blk_mq_add_to_batch(struct request *req,
 	if (!blk_rq_is_passthrough(req)) {
 		if (req->end_io)
 			return false;
-		if (ioerror < 0)
+		if (ioerror)
 			return false;
 	}
 




[Index of Archives]     [Linux RAID]     [Linux SCSI]     [Linux ATA RAID]     [IDE]     [Linux Wireless]     [Linux Kernel]     [ATH6KL]     [Linux Bluetooth]     [Linux Netdev]     [Kernel Newbies]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Device Mapper]

  Powered by Linux