On Sat, Feb 27, 2016 at 12:33 AM, Ming Lei <tom.leiming@xxxxxxxxx> wrote:
> Hi,
>
> I'd like to participate in LSF/MM and discuss multipage bvecs.
>
> Kent posted the idea[1] before, but it was never pushed out.
> I have studied multipage bvecs for a while, and think
> it is a good way to improve the block subsystem.
>
> Multipage bvecs means that one 'struct bio_vec' can hold
> multiple physically contiguous pages instead of the
> single page it holds in the current kernel.
>
> IMO, there are several advantages to supporting multipage bvecs:
>
> - currently one bio from bio_alloc() can hold at most 256
>   vectors, which means one bio can transfer at most
>   1 MB (256 * 4K). With multipage bvecs, a fs can submit a bigger
>   chunk via a single bio, because big physically contiguous
>   segments are very common.
>
> - CPU time spent iterating the bvec table should decrease
>
> - block merge gets simplified a lot: segments can be merged
>   right inside bio_add_page(), so singlepage bvecs needn't be stored
>   in the bvec table, and finally the segment can be split for the
>   driver at the proper size. blk_bio_map_sg() gets simplified too.
>   Recently block merge has become a bit complicated, and we have
>   seen quite a few bug reports/fixes in block merge.
>
> I'd like to hear opinions from fs guys about multipage bvec based bios,
> because this should bring some change to the bio interface (one bio
> will represent bigger I/O than before).
>
> Also I hope to discuss the implementation with guys in fs, dm, md,
> bcache..., because this feature will bring changes to
> these subsystems. So far, I have the following ideas:
>
> 1) change on bio_for_each_segment()
>
> The bvec returned from this iterator helper needs to stay a singlepage
> vector as before, so most users of the bio iterator don't need to change.
>
> 2) change on bio_for_each_segment_all()
>
> bio_for_each_segment_all() has to be changed because callers may
> change the bvec and currently assume it is always singlepage.
>
> I think bio_for_each_segment_all() needs to be split into
> bio_for_each_segment_all_rd() and bio_for_each_segment_all_wt().
>
> Both new helpers return a pointer to a bio_vec like before.
>
> *_rd() is used to iterate over each vector for reading the pointed-to
> bvec, and the caller cannot write to this vector. This helper can still
> return a singlepage bvec like before, so one extra local/temp 'bio_vec'
> variable has to be added for the conversion from multipage bvec to
> singlepage bvec.
>
> *_wt() is used to iterate over each vector for changing the bvec, and
> is only allowed on bios with singlepage bvecs. There are
> just a few such cases, such as bio bounce, bio_alloc_pages(),
> raid1 and raid10.
>
> 3) change bvecs of cloned bio
>
> In cases such as bio bounce and raid1, one bio is cloned from the
> incoming bio, and each bvec of the cloned bio may be updated. We have to
> introduce a singlepage version of bio_clone() so that the cloned bio
> only includes singlepage bvecs; then the bvecs can be updated like
> before.
>
> One problem is that the cloned bio may not be able to hold all the
> singlepage bvecs converted from the multipage bvecs in the source bio.
> One simple solution is to split the source bio and make sure its size
> can't be bigger than 1 MB (256 singlepage vectors).
>
> 4) introduce bio_for_each_mp_segment()
>
> The bvec returned from this iterator helper will be a multipage bvec,
> which should be the actual/real segment, so drivers may switch to
> this helper if they can handle multipage segments directly, which
> should be the common case.

5) remove most direct accesses to bio->bi_io_vec & bio->bi_vcnt in
other subsystems

Most of these usages are in btrfs, raid1, raid10 and bcache. Direct
access to .bi_io_vec and .bi_vcnt may cause a mess once multipage bvecs
are introduced, except in the following cases:

- direct access before calling bio_add_page()
- adding a singlepage vector
- adding a page directly instead of using bio_add_page()
- the REQ_PC bio case

Thanks,
Ming Lei
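
To make the merge-inside-bio_add_page() idea and the singlepage
iteration from 1) and 4) a bit more concrete, here is a rough userspace
model. It is only a sketch under stated assumptions, not kernel code:
all model_* names are made up for illustration, a bvec is reduced to a
physical start address plus a length, and 4K pages with a 256-entry
table are assumed to match the 256 * 4K figure above.

```c
#include <assert.h>

/* Assumptions for illustration only: 4K pages and a 256-entry table.
 * None of the model_* names exist in the kernel. */
#define MODEL_PAGE_SIZE 4096u
#define MODEL_MAX_VECS  256u

/* Hypothetical stand-in for a bvec: one physically contiguous range. */
struct model_bvec {
	unsigned long long phys;  /* physical start address in bytes */
	unsigned int len;         /* length in bytes */
};

struct model_bio {
	struct model_bvec vecs[MODEL_MAX_VECS];
	unsigned int vcnt;        /* used entries in vecs[] */
	int multipage;            /* 0: one page per vector, 1: merge */
};

/* Add len bytes at offset inside page frame pfn.
 * Returns len on success, 0 when the vector table is full. */
unsigned int model_bio_add_page(struct model_bio *bio, unsigned long pfn,
				unsigned int len, unsigned int offset)
{
	unsigned long long phys =
		(unsigned long long)pfn * MODEL_PAGE_SIZE + offset;

	assert(len > 0 && offset + len <= MODEL_PAGE_SIZE);

	if (bio->multipage && bio->vcnt > 0) {
		struct model_bvec *last = &bio->vecs[bio->vcnt - 1];

		/* Physically contiguous with the last vector: grow it. */
		if (last->phys + last->len == phys) {
			last->len += len;
			return len;
		}
	}
	if (bio->vcnt == MODEL_MAX_VECS)
		return 0;
	bio->vecs[bio->vcnt].phys = phys;
	bio->vecs[bio->vcnt].len = len;
	bio->vcnt++;
	return len;
}

/* Cut the next singlepage piece out of bv, starting 'done' bytes in.
 * Returns the piece length, or 0 once bv is exhausted. */
unsigned int model_next_singlepage(const struct model_bvec *bv,
				   unsigned int done, struct model_bvec *out)
{
	unsigned long long start;
	unsigned int room, left;

	if (done >= bv->len)
		return 0;
	start = bv->phys + done;
	room = MODEL_PAGE_SIZE - (unsigned int)(start % MODEL_PAGE_SIZE);
	left = bv->len - done;
	out->phys = start;
	out->len = left < room ? left : room;
	return out->len;
}
```

In this model, adding 512 physically contiguous pages with multipage
set ends up in a single 2 MB vector, while the singlepage case fills
the table after 256 pages (1 MB). Walking that one vector with
model_next_singlepage() hands the 512 singlepage pieces back, which is
roughly what bio_for_each_segment() would have to do for existing users.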