On 02/26/2016 06:33 PM, Ming Lei wrote:
> Hi,
>
> I'd like to participate in LSF/MM and discuss multipage bvecs.
>
> Kent posted the idea[1] before, but never pushed it out.
> I have studied multipage bvecs for a while, and think
> it is a good idea for improving the block subsystem.
>
> Multipage bvecs means that one 'struct bio_bvec' can hold
> multiple pages which are physically contiguous, instead
> of the one single page used in the current kernel.
>

Hi Ming Lei

This is an interesting talk for me. I don't know if you ever tried it,
but I did. If I take a regular SSD or a PCIe flash card that I have in
my machine, stick in a pointer to a page and bv_len = PAGE_SIZE * 8,
and call submit_bio, I get 8 pages' worth of IO with a single bvec, and
it all just works.

Yes, yes, I know it would break a bunch of other places; probably the
single-bvec case works better. But just to say that the current code is
not that picky about assuming a single page size.

I would like to see an audit and test cases done in this regard, but
keeping the current API and making this transparent.

I think that all the places you mention below can be made transparent
to "big bvec" if coded carefully, and there need not be a separate API
for multi-page / single-page bvecs. It should all just work.

I might be wrong, I have not looked at this deeply, but it is my gut
feeling that it is possible.

Thanks for bringing up the issue
Boaz

> IMO, there are several advantages to supporting multipage bvecs:
>
> - currently one bio from bio_alloc() can hold at most 256
>   vectors, which means one bio can be used to transfer at most
>   1Mbytes(256*4K). With multipage bvecs the fs can submit a bigger
>   chunk via a single bio, because big physically contiguous segments
>   are very common.
>
> - CPU consumed in iterating the bvec table should decrease
>
> - block merge gets simplified a lot: segments can be merged
>   right inside bio_add_page(), so singlepage bvecs need not be stored
>   in the bvec table, and finally the segment can be split for the
>   driver at the proper size. blk_bio_map_sg() gets simplified too.
>   Recently block merge has become a bit complicated, and we have seen
>   quite a few bug reports/fixes in block merge.
>
> I'd like to hear opinions from fs guys about multipage bvec based bios,
> because this should bring some change to the bio interface (one bio
> will represent bigger I/O than before).
>
> Also I hope to discuss the implementation with guys in fs, dm, md,
> bcache..., because this feature will bring changes to
> these subsystems. So far, I have the following ideas:
>
> 1) change on bio_for_each_segment()
>
> The bvec returned from this iterator helper needs to stay a singlepage
> vector as before, so most users of the bio iterator don't need to change.
>
> 2) change on bio_for_each_segment_all()
>
> bio_for_each_segment_all() has to be changed because callers may
> change the bvec, and they assume it is always singlepage now.
>
> I think bio_for_each_segment_all() needs to be split into
> bio_for_each_segment_all_rd() and bio_for_each_segment_all_wt().
>
> Both new helpers return a pointer to bio_bvec like before.
>
> *_rd() is used to iterate over each vector for reading the pointed-to
> bvec, and the caller can not write to this vector. This helper can still
> return a singlepage bvec like before, so one extra local/temp 'bio_bvec'
> variable has to be added for the conversion from multipage bvec to
> singlepage bvec.
>
> *_wt() is used to iterate over each vector for changing the bvec, and
> is only allowed for iterating a bio with singlepage bvecs. There are
> just several such cases, such as bio bounce, bio_alloc_pages(),
> raid1 and raid10.
>
> 3) change bvecs of cloned bio
>
> In cases such as bio bounce and raid1, one bio is cloned from the
> incoming bio, and each bvec of the cloned bio may be updated. We have
> to introduce a singlepage version of bio_clone() to make the cloned bio
> include only singlepage bvecs, so that the bvecs can be updated like
> before.
>
> One problem is that the cloned bio may not be able to hold all the
> singlepage bvecs converted from the multipage bvecs in the source bio.
> One simple solution is to split the source bio and make sure its size
> can't be bigger than 1Mbytes(256 single page vectors).
>
> 4) introduce bio_for_each_mp_segment()
>
> The bvec returned from this iterator helper will be a multipage bvec,
> which should be the actual/real segment, so drivers may switch to
> this helper if they can handle multipage segments directly, which
> should be the common case.
>
>
> [1] http://marc.info/?l=linux-kernel&m=141680246629547&w=2
>
> Thanks,
> Ming Lei
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
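
For readers following the thread, here is a minimal userspace sketch of
the multipage-to-singlepage conversion that ideas 1) to 3) above rely
on, plus the 256*4K arithmetic. It is not kernel code and not anyone's
proposed patch: 'struct sketch_bvec', sketch_split_bvec() and
sketch_max_bio_bytes() are hypothetical names made up for illustration,
a page is modeled as a plain page frame number, and a 4 KiB page size
is assumed.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 4 KiB pages, matching the 256*4K figure in the thread. */
#define SKETCH_PAGE_SIZE 4096u

/*
 * Hypothetical, simplified stand-in for a bvec. The page is modeled as
 * a page frame number so that "physically contiguous" means pfn + 1.
 */
struct sketch_bvec {
	unsigned long pfn;    /* first page frame of the segment */
	unsigned int len;     /* length in bytes, may cover several pages */
	unsigned int offset;  /* byte offset into the first page */
};

/*
 * Split one multipage bvec into singlepage bvecs, the way a read-only
 * iterator could hand them out through a temporary variable.
 * Writes at most max_out entries and returns how many were written.
 */
size_t sketch_split_bvec(const struct sketch_bvec *mp,
			 struct sketch_bvec *out, size_t max_out)
{
	unsigned long pfn;
	unsigned int offset, remaining;
	size_t n = 0;

	assert(mp != NULL && out != NULL);

	/* Normalize an offset that is larger than one page. */
	pfn = mp->pfn + mp->offset / SKETCH_PAGE_SIZE;
	offset = mp->offset % SKETCH_PAGE_SIZE;
	remaining = mp->len;

	while (remaining > 0 && n < max_out) {
		/* A singlepage piece never crosses a page boundary. */
		unsigned int chunk = SKETCH_PAGE_SIZE - offset;

		if (chunk > remaining)
			chunk = remaining;

		out[n].pfn = pfn;
		out[n].len = chunk;
		out[n].offset = offset;
		n++;

		remaining -= chunk;
		pfn++;       /* next physically contiguous page */
		offset = 0;  /* only the first piece can start mid-page */
	}
	return n;
}

/* Upper bound on bio payload for a given vector count and pages per bvec. */
unsigned long long sketch_max_bio_bytes(unsigned int nr_vecs,
					unsigned int pages_per_vec)
{
	return (unsigned long long)nr_vecs * pages_per_vec * SKETCH_PAGE_SIZE;
}
```

With these assumptions, a bvec with len = 8 * SKETCH_PAGE_SIZE splits
into 8 singlepage pieces, and sketch_max_bio_bytes(256, 1) gives the
1 MiB limit mentioned above. The same split also shows why a singlepage
clone can overflow a 256-entry table: one multipage bvec expands into
several entries.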