On Thu, Feb 28, 2019 at 05:58:32AM -0800, Christoph Hellwig wrote:
> On Thu, Feb 28, 2019 at 11:24:21AM +0800, Ming Lei wrote:
> > bio_for_each_bvec is used in fast path of bio splitting and sg mapping,
> > and what we want to do is to iterate over multi-page bvecs, instead of pages.
> > However, bvec_iter_advance() is invisible for this requirement, and
> > always advances by page size.
> >
> > This way isn't efficient for multipage bvec iterator, also bvec_iter_len()
> > isn't as fast as mp_bvec_iter_len().
> >
> > So advance by multi-page bvec's length instead of page size for bio_for_each_bvec().
> >
> > More than 1% IOPS improvement can be observed in io_uring test on null_blk.
>
> We've been there before, and I still insist that there is no good
> reason ever to clamp the iteration to page size in bvec_iter_advance.
> Callers that iterate over it already do that in the callers.
>
> So here is a resurrection and rebase of my patch from back then to
> just do the right thing:
>
> diff --git a/include/linux/bvec.h b/include/linux/bvec.h
> index 2c32e3e151a0..cf06c0647c4f 100644
> --- a/include/linux/bvec.h
> +++ b/include/linux/bvec.h
> @@ -112,14 +112,15 @@ static inline bool bvec_iter_advance(const struct bio_vec *bv,
>  	}
>
>  	while (bytes) {
> -		unsigned iter_len = bvec_iter_len(bv, *iter);
> -		unsigned len = min(bytes, iter_len);
> +		const struct bio_vec *cur = bv + iter->bi_idx;
> +		unsigned len = min3(bytes, iter->bi_size,
> +				    cur->bv_len - iter->bi_bvec_done);
>
>  		bytes -= len;
>  		iter->bi_size -= len;
>  		iter->bi_bvec_done += len;
>
> -		if (iter->bi_bvec_done == __bvec_iter_bvec(bv, *iter)->bv_len) {
> +		if (iter->bi_bvec_done == cur->bv_len) {
>  			iter->bi_bvec_done = 0;
>  			iter->bi_idx++;
>  		}

Yeah, this change is the correct thing to do, and I guess there shouldn't
be a performance drop with this patch for Jens' test case.

Thanks,
Ming
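
For anyone following along, here is a rough userspace sketch of the loop as it would look with the patch above applied. It is only a model, not the kernel code: the `_model` structs, `min3u()`, and the `steps` counter are hypothetical names made up for illustration, only the fields the loop touches are kept, and the bounds check at the top is a simplified guard of my own rather than something shown in the quoted hunk.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct bio_vec: only the length is modelled. */
struct bio_vec_model {
	unsigned int bv_len;		/* bytes in this (possibly multi-page) segment */
};

/* Hypothetical stand-in for struct bvec_iter. */
struct bvec_iter_model {
	unsigned int bi_size;		/* bytes left in the whole iteration */
	unsigned int bi_idx;		/* index of the current segment */
	unsigned int bi_bvec_done;	/* bytes already consumed in that segment */
};

/* Userspace replacement for the kernel's min3() macro. */
static unsigned int min3u(unsigned int a, unsigned int b, unsigned int c)
{
	unsigned int m = a < b ? a : b;

	return m < c ? m : c;
}

/*
 * Advance the iterator by `bytes` without clamping to a page boundary.
 * Each pass consumes as much of the current segment as possible.
 * `steps`, if non-NULL, is incremented once per loop pass so the caller
 * can see how many passes were needed (illustration only).
 */
static bool bvec_iter_advance_model(const struct bio_vec_model *bv,
				    struct bvec_iter_model *iter,
				    unsigned int bytes, unsigned int *steps)
{
	/* Simplified guard (assumption): refuse to run past the end. */
	if (bytes > iter->bi_size) {
		iter->bi_size = 0;
		return false;
	}

	while (bytes) {
		const struct bio_vec_model *cur = bv + iter->bi_idx;
		unsigned int len = min3u(bytes, iter->bi_size,
					 cur->bv_len - iter->bi_bvec_done);

		bytes -= len;
		iter->bi_size -= len;
		iter->bi_bvec_done += len;

		/* Segment fully consumed: move on to the next one. */
		if (iter->bi_bvec_done == cur->bv_len) {
			iter->bi_bvec_done = 0;
			iter->bi_idx++;
		}
		if (steps)
			(*steps)++;
	}
	return true;
}
```

With a 16 KiB segment followed by a 4 KiB one, advancing by 16384 bytes takes a single pass through the loop in this model, whereas a page-clamped advance would need one pass per 4 KiB page.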