Re: [PATCH V4] block: optimize for small block size IO

Kent Overstreet <kent.overstreet@xxxxxxxxx> · Mon, 4 Nov 2019 22:14:17 -0500

On Mon, Nov 04, 2019 at 07:38:42PM -0700, Jens Axboe wrote:
> This is where my knee jerk at the initial "partial completions" and
> "should be trivial" start to kick in. I don't think they are necessarily
> hard, but they aren't free either. And you'd need to be paying that
> atomic_dec cost for every IO.

No need - you added the code to avoid that atomic dec for bi_remaining in the
common case, the same approach will work here.
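
For reference, a minimal userspace sketch of that fast path. It is modeled on
how bio_remaining_done() only touches the counter when BIO_CHAIN is set. The
struct io type and the io_* helpers are hypothetical stand-ins, not kernel
code, and a plain bool stands in for the bio flag:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace model; not the kernel's struct bio. */
struct io {
	bool chained;          /* set only once the IO has been split/chained */
	atomic_int remaining;  /* outstanding parts; only consulted if chained */
};

static void io_init(struct io *io)
{
	io->chained = false;
	atomic_init(&io->remaining, 1);
}

/* Slow path setup: called when the IO is split, adds one outstanding part. */
static void io_inc_remaining(struct io *io)
{
	io->chained = true;
	atomic_fetch_add(&io->remaining, 1);
}

/* Returns true when the whole IO is finished. */
static bool io_remaining_done(struct io *io)
{
	/* Common case: never split, so no atomic operation at all. */
	if (!io->chained)
		return true;

	assert(atomic_load(&io->remaining) > 0);

	/* fetch_sub returns the old value; 1 means this was the last part. */
	if (atomic_fetch_sub(&io->remaining, 1) == 1) {
		io->chained = false;
		return true;
	}
	return false;
}
```

An IO that is never split completes on the first io_remaining_done() call
without paying for the atomic; only split IOs pay the decrement.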

> currently have to do, maybe not... If it's a clear win, then it'd be an
> interesting path to pursue. But we probably won't have that answer until
> at least a hacky version is done as proof of concept.
> 
> On the upside, it'd simplify things to just have the mapping in one
> place, when the request is setup. Though until all drivers do that
> (which I worry will be never), then we'd be stuck with both. Maybe
> that's a bit too pessimistic, should be easier now since we just have
> blk-mq.

blk_rq_map_sg isn't called from _that_ many places; I suspect that once it's
figured out for one driver, the rest won't be that bad.

And even if some drivers remain unconverted, I personally _much_ prefer this
approach to more special-case fast paths, and I bet this approach will be faster
anyway.

Also - regarding drivers allocating the sglists, I think most high-performance
drivers preallocate a pool of sglists that are all sized to what the device is
capable of.
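
A rough userspace sketch of what such a preallocated pool could look like,
assuming a fixed number of lists each sized to the device's maximum segment
count. All names here (struct seg, struct sg_list, the sg_pool_* helpers) are
hypothetical and not taken from any driver, and the sketch is single-threaded
with no locking:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a scatterlist entry. */
struct seg {
	void *addr;
	size_t len;
};

/* One preallocated list; segs[] holds max_segs entries. */
struct sg_list {
	struct sg_list *next_free;
	struct seg segs[];
};

struct sg_pool {
	struct sg_list *free_list;
	void *backing;     /* single allocation holding every list */
	size_t max_segs;   /* assumed device limit on segments per request */
};

/* Allocate nr_lists lists up front, each sized for max_segs segments. */
static int sg_pool_init(struct sg_pool *p, size_t nr_lists, size_t max_segs)
{
	size_t slot = sizeof(struct sg_list) + max_segs * sizeof(struct seg);

	p->backing = calloc(nr_lists, slot);
	if (!p->backing)
		return -1;

	p->free_list = NULL;
	p->max_segs = max_segs;
	for (size_t i = 0; i < nr_lists; i++) {
		struct sg_list *l =
			(struct sg_list *)((char *)p->backing + i * slot);
		l->next_free = p->free_list;
		p->free_list = l;
	}
	return 0;
}

/* Pop a list from the free list; NULL if the pool is exhausted. */
static struct sg_list *sg_pool_get(struct sg_pool *p)
{
	struct sg_list *l = p->free_list;

	if (l)
		p->free_list = l->next_free;
	return l;
}

/* Return a list to the pool; no allocator call on the IO path. */
static void sg_pool_put(struct sg_pool *p, struct sg_list *l)
{
	assert(l != NULL);
	l->next_free = p->free_list;
	p->free_list = l;
}

static void sg_pool_destroy(struct sg_pool *p)
{
	free(p->backing);
	p->backing = NULL;
	p->free_list = NULL;
}
```

The point of the sketch is only that get/put are a couple of pointer moves
and every list already has room for the largest request the device accepts.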