Hi,

I'd like to participate in LSF/MM and discuss multipage bvecs.

Kent posted the idea[1] before, but it was never pushed out. I have
studied multipage bvecs for a while, and I think they are a good way to
improve the block subsystem.

Multipage bvecs means that one 'struct bio_vec' can hold multiple
physically contiguous pages, instead of the single page it holds in the
current kernel.

IMO, supporting multipage bvecs has several advantages:

- Currently one bio from bio_alloc() can hold at most 256 vectors, which
  means one bio can transfer at most 1 MB (256 * 4K). With multipage
  bvecs, a filesystem can submit a bigger chunk via a single bio,
  because big physically contiguous segments are very common.

- The CPU time consumed in iterating the bvec table should decrease.

- Block merge gets simplified a lot: segments can be merged right inside
  bio_add_page(), so singlepage bvecs no longer need to be stored in the
  bvec table, and finally the segment can be split to the size the
  driver needs. blk_bio_map_sg() gets simplified too. Recently block
  merge has become a bit complicated, and we have seen quite a few bug
  reports/fixes in block merge.

I'd like to hear opinions from fs guys about multipage bvec based bios,
because this should bring some change to the bio interface (one bio will
represent a bigger I/O than before).

I also hope to discuss the implementation with guys in fs, dm, md,
bcache..., because this feature will bring changes to these subsystems.
So far, I have the following ideas:

1) change on bio_for_each_segment()

The bvec returned from this iterator helper needs to stay a singlepage
vector as before, so most users of the bio iterator don't need to change.

2) change on bio_for_each_segment_all()

bio_for_each_segment_all() has to be changed because callers may change
the bvec and currently assume it is always singlepage.

I think bio_for_each_segment_all() needs to be split into
bio_for_each_segment_all_rd() and bio_for_each_segment_all_wt(). Both
new helpers return a pointer to a bio_vec like before.

*_rd() is used to iterate over each vector for reading the bvec it
points to, and the caller cannot write to this vector. This helper can
still return a singlepage bvec like before, so one extra local/temp
'bio_vec' variable has to be added for the conversion from multipage
bvec to singlepage bvec.

*_wt() is used to iterate over each vector for changing the bvec, and is
only allowed when iterating a bio with singlepage bvecs. There are just
a few such cases, such as bio bounce, bio_alloc_pages(), raid1 and
raid10.

3) change bvecs of cloned bio

In cases such as bio bounce and raid1, a bio is cloned from the incoming
bio, and each bvec of the cloned bio may be updated. We have to
introduce a singlepage version of bio_clone() so that the cloned bio
only includes singlepage bvecs; then the bvecs can be updated like
before.

One problem is that the cloned bio may not be able to hold all the
singlepage bvecs converted from the multipage bvecs in the source bio.
One simple solution is to split the source bio and make sure its size is
no bigger than 1 MB (256 singlepage vectors).

4) introduce bio_for_each_mp_segment()

The bvec returned from this iterator helper will be a multipage bvec,
which should be the actual/real segment, so drivers may switch to this
helper if they can handle multipage segments directly, which should be
the common case.

[1] http://marc.info/?l=linux-kernel&m=141680246629547&w=2

Thanks,
Ming Lei
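
As a rough illustration of ideas 1) and 4), and of merging inside
bio_add_page(), here is a small user-space model. It is only a sketch,
not kernel code: 'struct mp_bvec', 'struct sp_iter', mp_add_page(),
sp_next() and count_sp_segments() are hypothetical names, a page frame
number stands in for 'struct page *', and a 4K page size is assumed.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SIZE 4096u /* assumed page size, matching the 4K above */

/* Hypothetical model of a bvec; pfn stands in for struct page *. */
struct mp_bvec {
	unsigned long pfn;   /* first page frame of the contiguous run */
	unsigned int len;    /* length in bytes, may exceed PAGE_SIZE */
	unsigned int offset; /* byte offset into the first page */
};

/* Cursor used to walk a multipage table one page at a time. */
struct sp_iter {
	size_t idx;        /* index of the current multipage bvec */
	unsigned int done; /* bytes already consumed from that bvec */
};

/*
 * Model of merging at add time: if the new range starts exactly where
 * the last bvec ends, extend the last bvec instead of using a new slot.
 * Returns 1 on success, 0 if the table is full.
 */
static int mp_add_page(struct mp_bvec *tbl, size_t *cnt, size_t max,
		       unsigned long pfn, unsigned int len,
		       unsigned int offset)
{
	if (*cnt > 0) {
		struct mp_bvec *last = &tbl[*cnt - 1];
		unsigned long long end =
			(unsigned long long)last->pfn * PAGE_SIZE +
			last->offset + last->len;
		unsigned long long start =
			(unsigned long long)pfn * PAGE_SIZE + offset;

		if (end == start) {
			last->len += len;
			return 1;
		}
	}
	if (*cnt >= max)
		return 0;
	tbl[*cnt] = (struct mp_bvec){ .pfn = pfn, .len = len,
				      .offset = offset };
	(*cnt)++;
	return 1;
}

/*
 * Model of a singlepage view over multipage bvecs: fill *out with the
 * next piece that fits inside one page. Returns 1, or 0 at the end.
 */
static int sp_next(const struct mp_bvec *tbl, size_t cnt,
		   struct sp_iter *it, struct mp_bvec *out)
{
	/* Skip bvecs that are fully consumed (or empty). */
	while (it->idx < cnt && it->done >= tbl[it->idx].len) {
		it->idx++;
		it->done = 0;
	}
	if (it->idx >= cnt)
		return 0;

	const struct mp_bvec *bv = &tbl[it->idx];
	unsigned int pos = bv->offset + it->done; /* from first page start */
	unsigned int in_page = pos % PAGE_SIZE;
	unsigned int left = bv->len - it->done;
	unsigned int room = PAGE_SIZE - in_page;

	out->pfn = bv->pfn + pos / PAGE_SIZE;
	out->offset = in_page;
	out->len = left < room ? left : room;
	it->done += out->len;
	return 1;
}

/* Count the singlepage pieces a multipage table expands to. */
static size_t count_sp_segments(const struct mp_bvec *tbl, size_t cnt)
{
	struct sp_iter it = { 0, 0 };
	struct mp_bvec sp;
	size_t n = 0;

	while (sp_next(tbl, cnt, &it, &sp)) {
		assert(sp.offset + sp.len <= PAGE_SIZE);
		n++;
	}
	return n;
}
```

In this model the table stores only merged segments, which is what a
bio_for_each_mp_segment() style helper would hand to drivers, while
sp_next() produces the temporary singlepage view that existing
bio_for_each_segment() users would keep seeing.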