On Mon, Nov 04, 2019 at 07:38:42PM -0700, Jens Axboe wrote:
> This is where my knee jerk at the initial "partial completions" and
> "should be trivial" start to kick in. I don't think they are necessarily
> hard, but they aren't free either. And you'd need to be paying that
> atomic_dec cost for every IO.

No need; you added the code to avoid that atomic dec for bi_remaining in
the common case, and the same approach will work here.

> currently have to do, maybe not... If it's a clear win, then it'd be an
> interesting path to pursue. But we probably won't have that answer until
> at least a hacky version is done as proof of concept.
>
> On the upside, it'd simplify things to just have the mapping in one
> place, when the request is setup. Though until all drivers do that
> (which I worry will be never), then we'd be stuck with both. Maybe
> that's a bit to pessimistic, should be easier now since we just have
> blk-mq.

blk_rq_map_sg isn't called from _that_ many places; I suspect that once
it's figured out for one driver, the rest won't be that bad. And even if
some drivers remain unconverted, I personally _much_ prefer this approach
to adding more special-case fast paths, and I bet this approach will be
faster anyway.

Also, regarding drivers allocating the sglists: I think most
high-performance drivers preallocate a pool of sglists, each sized to the
maximum the device is capable of.