Re: [PATCH 1/9] block: Make generic_make_request handle arbitrary sized bios

Keith Busch <keith.busch@xxxxxxxxx> · Sat, 1 Mar 2014 10:52:15 -0700 (MST)

On Fri, 28 Feb 2014, Kent Overstreet wrote:
On Thu, Feb 27, 2014 at 12:22:54PM -0500, Matthew Wilcox wrote:
On Wed, Feb 26, 2014 at 03:39:49PM -0800, Kent Overstreet wrote:
We do this by adding calls to blk_queue_split() to the various
make_request functions that need it - a few can already handle arbitrary
size bios. Note that we add the call _after_ any call to blk_queue_bounce();
this means that blk_queue_split() and blk_recalc_rq_segments() don't need to
be concerned with bouncing affecting segment merging.

diff --git a/drivers/block/nvme-core.c b/drivers/block/nvme-core.c
index 51824d1f23..e4376b9613 100644
--- a/drivers/block/nvme-core.c
+++ b/drivers/block/nvme-core.c
@@ -737,6 +737,8 @@ static void nvme_make_request(struct request_queue *q, struct bio *bio)
 	struct nvme_queue *nvmeq = get_nvmeq(ns->dev);
 	int result = -EBUSY;

+	blk_queue_split(q, &bio, q->bio_split);
+
 	if (!nvmeq) {
 		put_nvmeq(NULL);
 		bio_endio(bio, -EIO);

I'd suggest that we do:

-	struct nvme_queue *nvmeq = get_nvmeq(ns->dev);
+	struct nvme_queue *nvmeq;
 	int result = -EBUSY;

+	blk_queue_split(q, &bio, q->bio_split);
+
+	nvmeq = get_nvmeq(ns->dev);
 	if (!nvmeq) {

so that we're running the blk_queue_split() code outside the get_cpu()
call.

Now, the NVMe driver has its own rules about when BIOs have to be split.
Right now, that's way down inside the nvme_map_bio() call when we're
walking the bio to compose the scatterlist.  Should we instead have an
nvme_bio_split() routine that is called instead of blk_queue_split(),
and we can simplify nvme_map_bio() since it'll know that it's working
with bios that don't have to be split.

In fact, I think it would have little NVMe-specific in it at that point,
so we could name __blk_bios_map_sg() better, export it to drivers and
call it from nvme_map_bio(), which I think would make everybody happier.

Actually, reading nvme_map_bio() (it's different since last I looked at
it) it looks like nvme should already be able to handle arbitrary size
bios?

I do intend to rework the blk_bio_map_sg() (or add a new one?) to
incrementally map as much of a bio as will fit in the provided
scatterlist, but it looks like nvme has some odd restrictions where it's
using BIOVEC_PHYS_MERGABLE()/BIOVEC_NOT_VIRT_MERGABLE() so I dunno if
it's worth bothering to try and have it use generic code.

Is nvme the only driver that has these kinds of restrictions on segment
address offsets? If so, I guess there's no reason to make it generic.

However we don't need an explicit split here: if the sg fills up (i.e.
the places nvme_split_and_submit() is called), we can just mark the bio
as partially completed (set bio->bi_iter = iter, i.e. use the iterator
you passed to bio_for_each_segment), then increment bi_remaining (which
just counts completions, i.e. bio_endio() calls before the bio is really
completed) and resubmit the original bio. No need to allocate a split
bio, or loop over the bio again in bio_split().

We used to manipulate the original bio to track partial completions,
but I changed that for reasons that haven't quite yet materialized. If we
move the bio's bi_iter, it will make it difficult to retry the original
request on intermittent failures, and it will break the integrity verify
if the device format supports protection information. It's also more
performant to submit all parts at once rather than wait for the previous
part to complete before sending the next.
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html