Re: [LSF/MM ATTEND] block: multipage bvecs

On Sat, Feb 27, 2016 at 12:33 AM, Ming Lei <tom.leiming@xxxxxxxxx> wrote:
> Hi,
>
> I'd like to participate in LSF/MM and discuss multipage bvecs.
>
> Kent posted the idea[1] before, but it was never pushed upstream.
> I have been studying multipage bvecs for a while, and I think
> it is a good way to improve the block subsystem.
>
> Multipage bvecs means that one 'struct bio_vec' can hold
> multiple physically contiguous pages, instead of the single
> page it holds in the current kernel.
>
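> For reference, here is a rough sketch of what that could look like;
> the fields are the existing ones, and only the interpretation of
> bv_len changes (illustrative, not final code):
>
>     struct bio_vec {
>             struct page     *bv_page;   /* first page of the segment */
>             unsigned int    bv_len;     /* may now span several pages */
>             unsigned int    bv_offset;  /* offset into the first page */
>     };
>
>     /* e.g. a physically contiguous 64KB buffer becomes one bvec
>      * with bv_len = 65536 instead of sixteen 4KB entries */
>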
> IMO, there are several advantages by supporting multipage bvecs:
>
> - currently one bio from bio_alloc() can hold at most 256
> vectors, which means one bio can transfer at most
> 1MB (256 * 4KB). With multipage bvecs, a filesystem can submit
> a bigger chunk via a single bio, because large physically
> contiguous segments are very common.
>
> - the CPU time spent iterating over the bvec table should decrease
>
> - block merging gets simplified a lot: segments can be merged
> right inside bio_add_page() (see the sketch below), so singlepage
> bvecs need not be stored in the bvec table, and segments can then
> be split out to the driver at the proper size. blk_bio_map_sg()
> gets simplified too. Lately, block merging has become rather
> complicated, and we have seen quite a few bug reports/fixes in
> that code.
>
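> As a rough sketch of the merge-in-bio_add_page() idea (the logic is
> only an illustration of the approach, not a real patch; page_to_phys()
> is the existing kernel helper, and __bio_add_page_new_vec() is a
> hypothetical fallback for the non-contiguous case):
>
>     int bio_add_page(struct bio *bio, struct page *page,
>                      unsigned int len, unsigned int offset)
>     {
>             if (bio->bi_vcnt > 0) {
>                     struct bio_vec *bv =
>                             &bio->bi_io_vec[bio->bi_vcnt - 1];
>
>                     /* physically contiguous with the last bvec? */
>                     if (page_to_phys(bv->bv_page) + bv->bv_offset +
>                         bv->bv_len == page_to_phys(page) + offset) {
>                             bv->bv_len += len;      /* just extend it */
>                             bio->bi_iter.bi_size += len;
>                             return len;
>                     }
>             }
>             /* otherwise add a new bvec entry as today */
>             return __bio_add_page_new_vec(bio, page, len, offset);
>     }
>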
> I'd like to hear opinions from fs people about multipage-bvec-based
> bios, because this will bring some changes to the bio interface
> (one bio will represent a bigger I/O than before).
>
> I also hope to discuss the implementation with people working on
> fs, dm, md, bcache, ... because this feature will bring changes
> to these subsystems. So far, I have the following ideas:
>
> 1) change on bio_for_each_segment()
>
> The bvec returned from this iterator helper needs to stay a
> singlepage vector as before, so most users of the bio iterator
> won't need any change.
>
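> A sketch of how the singlepage view could be synthesized from a
> multipage entry ('done' is the byte count already consumed from the
> entry; the helper name is made up, while nth_page(), min_t() and
> PAGE_SIZE are existing kernel facilities):
>
>     static inline struct bio_vec mp_to_sp_bvec(struct bio_vec mp,
>                                                unsigned int done)
>     {
>             struct bio_vec sp;
>             unsigned int off = mp.bv_offset + done;
>
>             sp.bv_page   = nth_page(mp.bv_page, off / PAGE_SIZE);
>             sp.bv_offset = off % PAGE_SIZE;
>             sp.bv_len    = min_t(unsigned int, mp.bv_len - done,
>                                  PAGE_SIZE - sp.bv_offset);
>             return sp;
>     }
>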
> 2) change on bio_for_each_segment_all()
>
> bio_for_each_segment_all() has to change because callers may
> modify the bvec and currently assume it is always singlepage.
>
> I think bio_for_each_segment_all() needs to be split into
> bio_for_each_segment_all_rd() and bio_for_each_segment_all_wt().
>
> Both new helpers return a pointer to a bio_vec, like before.
>
> *_rd() iterates over each vector for reading, and the caller must
> not write through the returned bvec. This helper can still return
> singlepage bvecs as before, so one extra local/temporary 'bio_vec'
> variable has to be added for the conversion from a multipage bvec
> to a singlepage bvec.
>
> *_wt() iterates over each vector for modifying the bvec, and is
> only allowed on bios with singlepage bvecs; there are just a few
> such cases, such as bio bounce, bio_alloc_pages(), raid1 and
> raid10.
>
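> To make the *_rd() contract concrete, a caller might look like this
> (the helper and its extra 'tmp' argument are hypothetical, per the
> proposal above; flush_dcache_page() is just a typical read-only use):
>
>     struct bio_vec *bvl;
>     struct bio_vec tmp;     /* holds the singlepage conversion */
>     int i;
>
>     bio_for_each_segment_all_rd(bvl, bio, i, tmp) {
>             /* bvl points at tmp, a singlepage view of the current
>              * multipage entry; writes through it never reach the
>              * bvec table */
>             flush_dcache_page(bvl->bv_page);
>     }
>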
> 3) change bvecs of cloned bio
> In cases such as bio bounce and raid1, one bio is cloned from the
> incoming bio, and each bvec of the cloned bio may be updated. We
> have to introduce a singlepage version of bio_clone() so that the
> cloned bio only includes singlepage bvecs; then the bvecs can be
> updated as before.
>
> One problem is that the cloned bio may not be able to hold all the
> singlepage bvecs converted from the multipage bvecs in the source
> bio; one simple solution is to split the source bio and make sure
> its size can't be bigger than 1MB (256 singlepage vectors), as
> sketched below.
>
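> A sketch of that solution (bio_clone_sp() is a hypothetical name for
> the singlepage clone; bio_split() and bio_sectors() exist today):
>
>     /* cap the source at 1MB = 2048 sectors so its singlepage
>      * expansion fits into one 256-entry bvec table; the split-off
>      * remainder is resubmitted as usual */
>     if (bio_sectors(bio) > 2048)
>             bio = bio_split(bio, 2048, GFP_NOIO, bs);
>
>     clone = bio_clone_sp(bio, GFP_NOIO, bs);
>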
> 4) introduce bio_for_each_mp_segment()
>
> The bvec returned from this iterator helper will be a multipage
> bvec, i.e. the actual/real segment, so drivers may switch to this
> helper if they can handle multipage segments directly, which
> should be the common case.
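>
> For example, a driver that can DMA a whole contiguous segment could
> do something like the following (bio_for_each_mp_segment() is the
> proposed helper; setup_dma_descriptor() stands in for driver code):
>
>     struct bio_vec bv;
>     struct bvec_iter iter;
>
>     bio_for_each_mp_segment(bv, bio, iter) {
>             phys_addr_t addr = page_to_phys(bv.bv_page) + bv.bv_offset;
>
>             /* one descriptor per real segment, however many pages */
>             setup_dma_descriptor(addr, bv.bv_len);
>     }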

5) remove most of the direct accesses to bio->bi_io_vec & bio->bi_vcnt
in other subsystems

Most of these usages are in btrfs, raid1, raid10 and bcache. Direct
access to .bi_io_vec and .bi_vcnt may cause a mess once multipage
bvecs are introduced (see the example after this list), except in
the following cases:

- direct access before calling bio_add_page()
- adding single page vector
- adding page directly instead of using bio_add_page()
- REQ_PC bio case
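
An example of the pattern that breaks (the loop is illustrative, not
quoted from any of those subsystems; do_something_per_page() is a
placeholder):

    struct bio_vec bv;
    struct bvec_iter iter;
    int i;

    /* WRONG with multipage bvecs: assumes one page per table entry */
    for (i = 0; i < bio->bi_vcnt; i++)
            do_something_per_page(bio->bi_io_vec[i].bv_page);

    /* safe: the iterator hands out singlepage bvecs regardless of
     * how the table is laid out */
    bio_for_each_segment(bv, bio, iter)
            do_something_per_page(bv.bv_page);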

Thanks,
Ming Lei