[LSF/MM ATTEND] block: multipage bvecs

Hi,

I'd like to participate in LSF/MM and discuss multipage bvecs.

Kent posted the idea[1] before, but it was never pushed out.
I have studied multipage bvecs for a while, and I think
they are a good way to improve the block subsystem.

Multipage bvecs means that one 'struct bio_vec' can hold
multiple physically contiguous pages, instead of the
single page it holds in the current kernel.

IMO, supporting multipage bvecs has several advantages:

- currently one bio from bio_alloc() can hold at most 256
vectors, which means one bio can transfer at most 1 Mbyte
(256 * 4K, with 4K pages). With multipage bvecs the fs can
submit bigger chunks via a single bio, because big physically
contiguous segments are very common.

- CPU time spent iterating over the bvec table should decrease,
because the table holds fewer vectors

- block merge gets simplified a lot: segments can be merged
inside bio_add_page(), so singlepage bvecs need not be stored
in the bvec table, and the segment can finally be split for the
driver at a proper size. blk_bio_map_sg() gets simplified too.
Recently block merge has become a bit complicated, and we have
seen quite a few bug reports/fixes in block merge.
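
The merge inside bio_add_page() can be sketched in userspace as
below. This is only a model of the idea, not kernel code: the
sketch_* names, the pfn-based page identity and the fixed 4K page
size are all assumptions made for illustration.

```c
#include <assert.h>

#define SKETCH_PAGE_SIZE 4096u
#define SKETCH_MAX_VECS  256u

/* Hypothetical stand-in for a multipage bvec: one physically
 * contiguous byte range that may span several pages. */
struct sketch_bvec {
    unsigned long pfn;    /* page frame number of the first page */
    unsigned int offset;  /* byte offset into the first page */
    unsigned int len;     /* total length in bytes */
};

/* Hypothetical stand-in for a bio: just the vector table. */
struct sketch_bio {
    struct sketch_bvec vecs[SKETCH_MAX_VECS];
    unsigned int cnt;
};

static unsigned long long sketch_phys(unsigned long pfn, unsigned int offset)
{
    return (unsigned long long)pfn * SKETCH_PAGE_SIZE + offset;
}

/* Add one (part of a) page. If it starts exactly where the last
 * vector ends physically, grow that vector instead of using a new
 * slot. Returns len on success, 0 if the table is full. */
unsigned int sketch_bio_add_page(struct sketch_bio *bio, unsigned long pfn,
                                 unsigned int offset, unsigned int len)
{
    assert(len > 0 && offset < SKETCH_PAGE_SIZE &&
           len <= SKETCH_PAGE_SIZE - offset);

    if (bio->cnt > 0) {
        struct sketch_bvec *last = &bio->vecs[bio->cnt - 1];

        if (sketch_phys(last->pfn, last->offset) + last->len ==
            sketch_phys(pfn, offset)) {
            last->len += len;   /* merged: no new slot needed */
            return len;
        }
    }

    if (bio->cnt == SKETCH_MAX_VECS)
        return 0;

    bio->vecs[bio->cnt].pfn = pfn;
    bio->vecs[bio->cnt].offset = offset;
    bio->vecs[bio->cnt].len = len;
    bio->cnt++;
    return len;
}
```

With this model, adding pfn 100 and then pfn 101 (both full pages)
leaves a single 8K vector in the table, while a page that is not
adjacent starts a new vector.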

I'd like to hear opinions from fs developers about bios based on
multipage bvecs, because this should bring some change to the bio
interface (one bio will represent bigger I/O than before).

I also hope to discuss the implementation with people working on
fs, dm, md, bcache..., because this feature will bring changes to
these subsystems. So far, I have the following ideas:

1) change on bio_for_each_segment()

The bvec returned from this iterator helper needs to stay a
singlepage vector as before, so most users of the bio iterator
don't need to change.
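
A rough userspace sketch of how an iterator could keep handing out
singlepage vectors from a multipage one. The sketch_* names and the
fixed 4K page size are assumptions for illustration, and pages are
modelled by their page frame number:

```c
#include <assert.h>

#define SKETCH_PAGE_SIZE 4096u

/* Hypothetical stand-in for a bvec (single- or multipage). */
struct sketch_bvec {
    unsigned long pfn;    /* page frame number of the first page */
    unsigned int offset;  /* byte offset into the first page */
    unsigned int len;     /* total length in bytes */
};

/* Fill *sp with the singlepage piece that starts 'done' bytes into
 * the multipage vector *mp. Returns the piece's length, or 0 once
 * the whole vector has been consumed. */
unsigned int sketch_next_single_page(const struct sketch_bvec *mp,
                                     unsigned int done,
                                     struct sketch_bvec *sp)
{
    unsigned int pos, room, left;

    assert(mp->offset < SKETCH_PAGE_SIZE);
    if (done >= mp->len)
        return 0;

    pos = mp->offset + done;            /* bytes from start of first page */
    sp->pfn = mp->pfn + pos / SKETCH_PAGE_SIZE;
    sp->offset = pos % SKETCH_PAGE_SIZE;

    room = SKETCH_PAGE_SIZE - sp->offset;   /* bytes left in this page */
    left = mp->len - done;                  /* bytes left in the vector */
    sp->len = left < room ? left : room;
    return sp->len;
}
```

A caller would start with done = 0 and add each returned length to
done until the function returns 0, so existing per-page loops keep
their shape.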

2) change on bio_for_each_segment_all()

bio_for_each_segment_all() has to be changed because callers may
change the bvec, and they currently assume it is always singlepage.

I think bio_for_each_segment_all() needs to be split into
bio_for_each_segment_all_rd() and bio_for_each_segment_all_wt().

Both new helpers return a pointer to a bio_vec like before.

*_rd() is used to iterate over each vector in order to read the
pointed-to bvec; the caller must not write to this vector. This
helper can still return a singlepage bvec like before, so one extra
local/temp 'bio_vec' variable has to be added to hold the conversion
from multipage bvec to singlepage bvec.

*_wt() is used to iterate over each vector in order to change the
bvec, and is only allowed on bios with singlepage bvecs. There are
just a few such cases, such as bio bounce, bio_alloc_pages(),
raid1 and raid10.

3) change bvecs of cloned bio

In cases such as bio bounce and raid1, one bio is cloned from the
incoming bio, and each bvec of the cloned bio may be updated. We
have to introduce a singlepage version of bio_clone() so that the
cloned bio only includes singlepage bvecs; then the bvecs can be
updated like before.

One problem is that the cloned bio may not be able to hold all the
singlepage bvecs converted from the multipage bvecs in the source
bio. One simple solution is to split the source bio so that its
size is no bigger than 1 Mbyte (256 singlepage vectors).
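
To make the split point concrete, here is a small userspace sketch
that computes how many leading bytes of a multipage vector table can
be expanded into at most max_vecs singlepage vectors. The sketch_*
names and the 4K page size are assumptions for illustration, not
existing kernel helpers:

```c
#include <assert.h>

#define SKETCH_PAGE_SIZE 4096u

/* Hypothetical stand-in for a multipage bvec. */
struct sketch_bvec {
    unsigned long pfn;    /* page frame number of the first page */
    unsigned int offset;  /* byte offset into the first page */
    unsigned int len;     /* total length in bytes */
};

/* Number of singlepage vectors needed to cover one multipage vector. */
unsigned int sketch_pages_spanned(const struct sketch_bvec *bv)
{
    unsigned long long end = (unsigned long long)bv->offset + bv->len;

    return (unsigned int)((end + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE);
}

/* Leading bytes of vecs[0..cnt) whose singlepage expansion fits
 * into max_vecs vectors; the source would be split at this byte. */
unsigned long long sketch_split_bytes(const struct sketch_bvec *vecs,
                                      unsigned int cnt,
                                      unsigned int max_vecs)
{
    unsigned long long bytes = 0;
    unsigned int used = 0;
    unsigned int i;

    for (i = 0; i < cnt; i++) {
        unsigned int need = sketch_pages_spanned(&vecs[i]);
        unsigned int room = max_vecs - used;

        assert(vecs[i].offset < SKETCH_PAGE_SIZE);
        if (need <= room) {
            used += need;          /* the whole vector fits */
            bytes += vecs[i].len;
            continue;
        }
        /* Only the first 'room' pages of this vector fit. */
        if (room > 0)
            bytes += (unsigned long long)room * SKETCH_PAGE_SIZE -
                     vecs[i].offset;
        break;
    }
    return bytes;
}
```

For a single page-aligned 2 Mbyte vector and max_vecs = 256, this
gives a split at 1 Mbyte, matching the limit mentioned above.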

4) introduce bio_for_each_mp_segment()

The bvec returned from this iterator helper will be a multipage
bvec, which should be the actual/real segment, so drivers may switch
to this helper if they can handle multipage segments directly, which
should be the common case.
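
As a rough illustration of a driver consuming multipage segments
directly, the userspace sketch below cuts one multipage vector into
segments no larger than a driver limit. The sketch_* names, the 4K
page size and the single max_seg limit are assumptions; real queue
limits such as segment boundary masks are ignored here.

```c
#include <assert.h>

#define SKETCH_PAGE_SIZE 4096u

/* Hypothetical stand-in for a multipage bvec. */
struct sketch_bvec {
    unsigned long pfn;    /* page frame number of the first page */
    unsigned int offset;  /* byte offset into the first page */
    unsigned int len;     /* total length in bytes */
};

/* Hypothetical scatterlist-like entry handed to the driver. */
struct sketch_seg {
    unsigned long long phys;  /* physical start address */
    unsigned int len;         /* length in bytes */
};

/* Cut *mp into pieces of at most max_seg bytes. Writes up to out_cap
 * entries to out[] and returns the number of pieces needed. */
unsigned int sketch_map_segments(const struct sketch_bvec *mp,
                                 unsigned int max_seg,
                                 struct sketch_seg *out,
                                 unsigned int out_cap)
{
    unsigned long long phys =
        (unsigned long long)mp->pfn * SKETCH_PAGE_SIZE + mp->offset;
    unsigned int left = mp->len;
    unsigned int count = 0;

    assert(max_seg > 0);
    while (left > 0) {
        unsigned int n = left < max_seg ? left : max_seg;

        if (count < out_cap) {
            out[count].phys = phys;
            out[count].len = n;
        }
        phys += n;
        left -= n;
        count++;
    }
    return count;
}
```

A 10K vector with an 8K limit, for example, becomes one 8K segment
followed by one 2K segment, without ever being broken into pages.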


[1] http://marc.info/?l=linux-kernel&m=141680246629547&w=2

Thanks,
Ming Lei