On 3/4/23 11:22, Ming Lei wrote:
On Sat, Mar 04, 2023 at 09:00:28AM +0100, Hannes Reinecke wrote:
On 3/4/23 00:13, Ming Lei wrote:
While investigating a customer report about the warning in nvme_setup_discard,
we observed that the controller (nvme/tcp) actually exposes
queue_max_discard_segments(req->q) == 1.
The current code can't handle this situation: in that case the block layer
performs contiguous merging, just as for normal RW requests, while
nvme_setup_discard still builds one discard range per bio.
Fix the issue by building the single range from the request's sector and
nr_sectors directly.
Fixes: b35ba01ea697 ("nvme: support ranged discard requests")
Signed-off-by: Ming Lei <ming.lei@xxxxxxxxxx>
---
drivers/nvme/host/core.c | 28 +++++++++++++++++++---------
1 file changed, 19 insertions(+), 9 deletions(-)
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index c2730b116dc6..d4be525f8100 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -781,16 +781,26 @@ static blk_status_t nvme_setup_discard(struct nvme_ns *ns, struct request *req,
 		range = page_address(ns->ctrl->discard_page);
 	}
 
-	__rq_for_each_bio(bio, req) {
-		u64 slba = nvme_sect_to_lba(ns, bio->bi_iter.bi_sector);
-		u32 nlb = bio->bi_iter.bi_size >> ns->lba_shift;
-
-		if (n < segments) {
-			range[n].cattr = cpu_to_le32(0);
-			range[n].nlb = cpu_to_le32(nlb);
-			range[n].slba = cpu_to_le64(slba);
+	if (queue_max_discard_segments(req->q) == 1) {
+		u64 slba = nvme_sect_to_lba(ns, blk_rq_pos(req));
+		u32 nlb = blk_rq_sectors(req) >> (ns->lba_shift - 9);
+
+		range[0].cattr = cpu_to_le32(0);
+		range[0].nlb = cpu_to_le32(nlb);
+		range[0].slba = cpu_to_le64(slba);
+		n = 1;
+	} else {
+		__rq_for_each_bio(bio, req) {
+			u64 slba = nvme_sect_to_lba(ns, bio->bi_iter.bi_sector);
+			u32 nlb = bio->bi_iter.bi_size >> ns->lba_shift;
+
+			if (n < segments) {
+				range[n].cattr = cpu_to_le32(0);
+				range[n].nlb = cpu_to_le32(nlb);
+				range[n].slba = cpu_to_le64(slba);
+			}
+			n++;
 		}
-		n++;
 	}
 
 	if (WARN_ON_ONCE(n != segments)) {
Now _that_ is odd.
Looks like 'req' is not formatted according to the 'max_discard_sectors'
setting.
But if that's the case, then this 'fix' would fail whenever
'max_discard_sectors' < 'max_hw_sectors', right?
No, it isn't the case.
Shouldn't we rather modify the merge algorithm to check max_discard_sectors
for DISCARD requests, such that we never _have_ mismatched requests and this
patch becomes unnecessary?
But it is related to discard merge.
If queue_max_discard_segments() is 1, the block layer merges discard
requests/bios just like normal RW IO.
However, if queue_max_discard_segments() is > 1, the block layer simply
'merges' all bios into one request, no matter whether their LBAs are
adjacent or not, and treats each bio as one discard segment; that is
what is called a multi-range discard.
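
To make the difference concrete, below is a small self-contained userspace
C sketch of the two merge policies described above. It is only a toy model:
toy_bio, toy_request, merge_single_range and merge_multi_range are invented
names for illustration, not the actual block-layer code. With a single
discard segment a bio only merges when it is contiguous with the request,
exactly like normal RW; with more than one segment any discard bio may be
appended as another range, up to the segment limit.

/*
 * Toy model of the two discard-merge policies described above.
 * Not kernel code: all names here are invented for illustration.
 */
#include <stdbool.h>
#include <stdio.h>

struct toy_bio {
	unsigned long long sector;	/* start LBA of the discard bio */
	unsigned int nr_sectors;	/* length of the discard bio */
};

struct toy_request {
	unsigned long long sector;	/* start of the merged request */
	unsigned int nr_sectors;	/* total sectors merged so far */
	unsigned int nr_ranges;		/* bios (= discard ranges) queued */
};

/*
 * max_discard_segments == 1: merge exactly like a normal RW request,
 * i.e. only accept a bio that starts where the request currently ends.
 */
static bool merge_single_range(struct toy_request *req, const struct toy_bio *bio)
{
	if (bio->sector != req->sector + req->nr_sectors)
		return false;
	req->nr_sectors += bio->nr_sectors;
	req->nr_ranges++;
	return true;
}

/*
 * max_discard_segments > 1: contiguity does not matter; every bio becomes
 * one more discard range, bounded only by the segment limit.
 */
static bool merge_multi_range(struct toy_request *req, const struct toy_bio *bio,
			      unsigned int max_segments)
{
	if (req->nr_ranges >= max_segments)
		return false;
	req->nr_sectors += bio->nr_sectors;
	req->nr_ranges++;
	return true;
}

int main(void)
{
	struct toy_bio b = { .sector = 8,    .nr_sectors = 8 };   /* contiguous     */
	struct toy_bio c = { .sector = 1024, .nr_sectors = 8 };   /* not contiguous */

	/* both requests start out holding one bio covering sectors 0..7 */
	struct toy_request single = { .sector = 0, .nr_sectors = 8, .nr_ranges = 1 };
	struct toy_request multi  = { .sector = 0, .nr_sectors = 8, .nr_ranges = 1 };

	bool s_b = merge_single_range(&single, &b);
	bool s_c = merge_single_range(&single, &c);
	bool m_b = merge_multi_range(&multi, &b, 256);
	bool m_c = merge_multi_range(&multi, &c, 256);

	printf("single-range: merge b=%d, merge c=%d\n", s_b, s_c);
	printf("multi-range:  merge b=%d, merge c=%d\n", m_b, m_c);
	return 0;
}

Built with a plain C compiler, it reports that the non-contiguous bio is
rejected in the single-range case but accepted in the multi-range case,
which is consistent with the per-bio loop in nvme_setup_discard only being
valid when queue_max_discard_segments() is > 1.
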
But wouldn't the number of bios be subject to
'queue_max_discard_segments', too?
What guarantees that we don't overflow that limit for a multi-segment discard merge?
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare@xxxxxxx +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman