Re: [PATCH 12/13] Make generic_make_request handle arbitrarily large bios

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hello, Kent.

On Thu, May 17, 2012 at 10:59:59PM -0400, koverstreet@xxxxxxxxxx wrote:
> From: Kent Overstreet <koverstreet@xxxxxxxxxx>
> 
> The way the block layer is currently written, it goes to great lengths
> to avoid having to split bios; upper layer code (such as bio_add_page())
> checks what the underlying device can handle and tries to always create
> bios that don't need to be split.
> 
> But this approach becomes unwieldy and eventually breaks down with
> stacked devices and devices with dynamic limits, and it adds a lot of
> complexity. If the block layer could split bios as needed, we could
> eliminate a lot of complexity elsewhere - particularly in stacked
> drivers. Code that creates bios can then create whatever size bios are
> convenient, and more importantly stacked drivers don't have to deal with
> both their own bio size limitations and the limitations of the
> (potentially multiple) devices underneath them.

I generally like the idea.  Device limit directly being propagated to
high level users is cumbersome.  Somebody has to be splitting anyway
and doing it at top makes things, including limit propagation through
stacked drivers, unnecessarily complicated.  Jens, what are your
thoughts?

> +static void bio_submit_split_done(struct closure *cl)
> +{
> +	struct bio_split_hook *s = container_of(cl, struct bio_split_hook, cl);
> +
> +	s->bio->bi_end_io	= s->bi_end_io;
> +	s->bio->bi_private	= s->bi_private;
> +	bio_endio(s->bio, 0);

I'd rather you didn't indent assignments.

> +	closure_debug_destroy(&s->cl);
> +	mempool_free(s, s->q->bio_split_hook);
> +}
...
> +static int bio_submit_split(struct bio *bio)

bool?

> +{
> +	struct bio_split_hook *s;
> +	struct request_queue *q = bdev_get_queue(bio->bi_bdev);
> +
> +	if (!bio_has_data(bio) ||
> +	    !q ||
> +	    !q->bio_split_hook ||
> +	    bio_sectors(bio) <= bio_max_sectors(bio))

Style issues.

> +		return 0;
> +
> +	s = mempool_alloc(q->bio_split_hook, GFP_NOIO);
> +
> +	closure_init(&s->cl, NULL);

Please use workqueue with open coded sequencer or maybe implement bio
sequencer which can be used by other stacking drivers too.

> +	s->bio		= bio;
> +	s->q		= q;
> +	s->bi_end_io	= bio->bi_end_io;
> +	s->bi_private	= bio->bi_private;
> +
> +	bio_get(bio);
> +	bio->bi_end_io	= bio_submit_split_endio;
> +	bio->bi_private	= &s->cl;

Maybe it's okay but I *hope* bi_private override weren't necessary -
it's way too subtle.  If there's no other way and this is gonna be an
integral part of block layer, just add a field to bio.

> +unsigned __bio_max_sectors(struct bio *bio, struct block_device *bdev,
> +			   sector_t sector)
> +{
> +	unsigned ret = bio_sectors(bio);
> +	struct request_queue *q = bdev_get_queue(bdev);
> +	struct bio_vec *bv, *end = bio_iovec(bio) +
> +		min_t(int, bio_segments(bio), queue_max_segments(q));
> +
> +	struct bvec_merge_data bvm = {
> +		.bi_bdev	= bdev,
> +		.bi_sector	= sector,
> +		.bi_size	= 0,
> +		.bi_rw		= bio->bi_rw,
> +	};
> +
> +	if (bio_segments(bio) > queue_max_segments(q) ||
> +	    q->merge_bvec_fn) {
> +		ret = 0;
> +
> +		for (bv = bio_iovec(bio); bv < end; bv++) {
> +			if (q->merge_bvec_fn &&
> +			    q->merge_bvec_fn(q, &bvm, bv) < (int) bv->bv_len)
> +				break;
> +
> +			ret		+= bv->bv_len >> 9;
> +			bvm.bi_size	+= bv->bv_len;
> +		}
> +
> +		if (ret >= (BIO_MAX_PAGES * PAGE_SIZE) >> 9)
> +			return (BIO_MAX_PAGES * PAGE_SIZE) >> 9;
> +	}
> +
> +	ret = min(ret, queue_max_sectors(q));
> +
> +	WARN_ON(!ret);
> +	ret = max_t(int, ret, bio_iovec(bio)->bv_len >> 9);
> +
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(__bio_max_sectors);

Does this by any chance allow killing ->merge_bvec_fn()?  That would
be *awesome*.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-bcache" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Linux ARM Kernel]     [Linux Filesystem Development]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Security]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [ECOS]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux