On Feb 19, 2025 / 17:04, alan.adamson@xxxxxxxxxx wrote: > > On 2/19/25 11:17 AM, alan.adamson@xxxxxxxxxx wrote: > > > > On 2/18/25 11:26 PM, Shinichiro Kawasaki wrote: > > > CC+: Alan, > > > > > > On Feb 13, 2025 / 08:18, Jens Axboe wrote: > > > > The conditions for whether or not a request is allowed adding to a > > > > completion batch are a bit hard to read, and they also have a few > > > > issues. One is that ioerror may indeed be a random value on > > > > passthrough, > > > > and it's being checked unconditionally of whether or not the given > > > > request is a passthrough request or not. > > > > > > > > Rewrite the conditions to be separate for easier reading, and > > > > only check > > > > ioerror for non-passthrough requests. This fixes an issue with bio > > > > unmapping on passthrough, where it fails getting added to a batch. This > > > > both leads to suboptimal performance, and may trigger a potential > > > > schedule-under-atomic condition for polled passthrough IO. > > > > > > > > Fixes: f794f3351f26 ("block: add support for > > > > blk_mq_end_request_batch()") > > > > Signed-off-by: Jens Axboe <axboe@xxxxxxxxx> > > > I observed the blktests test case nvme/039 failure with v6.14-rc3 > > > kernel. I > > > bisected and found that this patch in the v6.14-rc3 is the trigger. > > > > > > The test run output is as follows: > > > > > > nvme/039 => nvme0n1 (test error logging) [failed] > > > runtime 5.378s ... 5.354s > > > --- tests/nvme/039.out 2024-09-20 11:20:26.405380875 +0900 > > > +++ > > > /home/shin/Blktests/blktests/results/nvme0n1/nvme/039.out.bad > > > 2025-02-19 16:13:05.061387179 +0900 > > > @@ -1,7 +1,3 @@ > > > Running nvme/039 > > > - Read(0x2) @ LBA 0, 1 blocks, Unrecovered Read Error (sct 0x2 > > > / sc 0x81) DNR > > > - Read(0x2) @ LBA 0, 1 blocks, Unknown (sct 0x3 / sc 0x75) DNR > > > - Write(0x1) @ LBA 0, 1 blocks, Write Fault (sct 0x2 / sc 0x80) DNR > > > Identify(0x6), Access Denied (sct 0x2 / sc 0x86) DNR > > > cdw10=0x1 cdw11=0x0 cdw12=0x0 cdw13=0x0 cdw14=0x0 cdw15=0x0 > > > - Read(0x2), Invalid Command Opcode (sct 0x0 / sc 0x1) DNR > > > cdw10=0x0 cdw11=0x0 cdw12=0x1 cdw13=0x0 cdw14=0x0 cdw15=0x0 > > > Test complete > > > > > > > > > The test case does error injection. Test method requires > > > reconsideration to > > > adjust to this kernel change, probably. Help for fix will be > > > appreciated. > > > > Not really an issue with the test, rather the error injector is broken. > > I'll investigate. Alan, thank you for looking into this. Then it sounds that a fix is needed in kernel side. > > The following change allows the test to pass. > > - The check of (ioerror < 0) should be (ioerror != 0) > > - passthroughs can also have ioerror set, so false needs to be returned. > This needs to be resolved. > > > diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h > index fa2a76cc2f73..b2bd2ac40441 100644 > --- a/include/linux/blk-mq.h > +++ b/include/linux/blk-mq.h > @@ -865,17 +865,17 @@ static inline bool blk_mq_add_to_batch(struct request > *req, > * 1) No batch container > * 2) Has scheduler data attached > * 3) Not a passthrough request and end_io set > - * 4) Not a passthrough request and an ioerror > + * 4) An ioerror > */ > if (!iob) > return false; > if (req->rq_flags & RQF_SCHED_TAGS) > return false; > + if (ioerror) > + return false; > if (!blk_rq_is_passthrough(req)) { > if (req->end_io) > return false; > - if (ioerror < 0) > - return false; > } I think the modification above essentially reverts the trigger commit, so it means that we can not achieve what the commit aimed at. I took a look in the call chain. Before the commit, the call chain was like this for passthrough commands: nvme_handle_cqe() blk_mq_add_to_batch() ... false is returned, then call nvme_pci_complete_rq() nvme_pci_complete_rq() nvme_complete_rq() nvme_end_req() nvme_log_err_passthrough() ... log is printed __nvme_end_req() ... the nvme command ends After the commit, it looks like this: nvme_handle_cqe() blk_mq_add_to_batch() ... true is returned. Set nvme_pci_complete_batch() nvme_pci_complete_batch() nvme_complete_batch() nvme_complete_batch() nvme_complete_batch_req() __nvme_end_req() ... log is not printed but the command ends, then nvme/039 failed Assuming the change by the trigger commit is required, I guess adding nvme_log_err_passthrough() call in nvme_complete_batch_req() will restore the log print feature for the passthrough case. As a trial, I created a patch below [*]. I confirmed it avoids the nvme/039 failure. (As Alan noted, still I also find the check for "ioerror != 0" is required for non-passthrough case, to make the test case pass.) Jens, if you have comments, please share. [*] diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c index 818d4e49aab5..33030dd9b76a 100644 --- a/drivers/nvme/host/core.c +++ b/drivers/nvme/host/core.c @@ -496,6 +496,9 @@ void nvme_complete_batch_req(struct request *req) { trace_nvme_complete_rq(req); nvme_cleanup_cmd(req); + if (unlikely(nvme_req(req)->status && !(req->rq_flags & RQF_QUIET) && + blk_rq_is_passthrough(req))) + nvme_log_err_passthru(req); __nvme_end_req(req); } EXPORT_SYMBOL_GPL(nvme_complete_batch_req); diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h index fa2a76cc2f73..b891ed113306 100644 --- a/include/linux/blk-mq.h +++ b/include/linux/blk-mq.h @@ -874,7 +874,7 @@ static inline bool blk_mq_add_to_batch(struct request *req, if (!blk_rq_is_passthrough(req)) { if (req->end_io) return false; - if (ioerror < 0) + if (ioerror) return false; }