Re: [PATCH v1 00/54] block: support multipage bvec

Hi Guys,

On Tue, Dec 27, 2016 at 11:55 PM, Ming Lei <tom.leiming@xxxxxxxxx> wrote:
> Hi,
>
> This patchset brings multipage bvec into the block layer. Basic
> xfstests(-a auto) over virtio-blk/virtio-scsi have been run
> and no regression was found, so it should be good enough
> to show the approach now, and any comments are welcome!
>
> 1) what is multipage bvec?
>
> Multipage bvec means that one 'struct bio_vec' can hold
> multiple physically contiguous pages, instead of the
> single page it has held in the Linux kernel for a long time.
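
To make the definition concrete, here is a minimal user-space sketch. It is
not kernel code and is not taken from the patchset: model_bvec and
model_add_page() are names made up for illustration, and a page is modelled
by its page frame number (pfn) instead of 'struct page *'. It only shows how
physically contiguous pages could collapse into one table entry.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096u	/* assumed page size for this model */

/* Simplified stand-in for a bvec: a page is identified by its pfn. */
struct model_bvec {
	unsigned long pfn;	/* first page of the segment */
	unsigned int len;	/* length in bytes, may exceed one page */
	unsigned int offset;	/* byte offset into the first page */
};

/*
 * Append one full page to the table. If the page directly follows the
 * end of the last entry in physical memory, grow that entry instead of
 * using a new slot. Returns the new number of entries.
 */
static size_t model_add_page(struct model_bvec *tbl, size_t cnt, size_t cap,
			     unsigned long pfn)
{
	if (cnt > 0) {
		struct model_bvec *last = &tbl[cnt - 1];
		unsigned long end = last->offset + last->len;

		/* mergeable only if the last entry ends on a page boundary */
		if (end % MODEL_PAGE_SIZE == 0 &&
		    last->pfn + end / MODEL_PAGE_SIZE == pfn) {
			last->len += MODEL_PAGE_SIZE;
			return cnt;
		}
	}

	assert(cnt < cap);	/* the caller must provide a free slot */
	tbl[cnt] = (struct model_bvec){
		.pfn = pfn, .len = MODEL_PAGE_SIZE, .offset = 0,
	};
	return cnt + 1;
}
```

In this model, adding pfns 100, 101, 102 and then 200 needs two entries
where a single-page table would need four.
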
>
> 2) why is multipage bvec introduced?
>
> Kent proposed the idea[1] first.
>
> As systems' RAM becomes much bigger than before, and
> at the same time huge page, transparent huge page and
> memory compaction are widely used, it is now fairly common
> to see physically contiguous pages from fs in I/O.
> On the other hand, from the block layer's view, it isn't
> necessary to store intermediate pages into bvec, and
> it is enough to just store the physically contiguous
> 'segment'.
>
> Also, huge pages are being brought to filesystems[2], so we
> can do IO a huge page at a time[3], which requires that one bio can
> transfer at least one huge page at a time. It turns out that simply
> changing BIO_MAX_PAGES isn't flexible enough[3]. Multipage bvec
> fits this case very well.
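
A rough back-of-the-envelope model of the huge page case, as a sketch only.
The constants are assumptions for illustration (4 KiB pages, a 2 MiB huge
page, and a per-bio table limit of 256 entries), and model_bvecs_needed() is
a hypothetical helper, not a kernel function.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed values for this model, not read from any kernel header. */
#define MODEL_PAGE_SIZE		4096u
#define MODEL_HUGE_PAGE_SIZE	(2u * 1024u * 1024u)
#define MODEL_BIO_MAX_PAGES	256u

/*
 * Number of bvec entries needed to describe 'bytes' of physically
 * contiguous, page-aligned memory.
 */
static size_t model_bvecs_needed(size_t bytes, int multipage)
{
	assert(bytes > 0);

	if (multipage)
		return 1;	/* one entry covers the whole segment */

	/* single-page bvecs: one entry per page, rounded up */
	return (bytes + MODEL_PAGE_SIZE - 1) / MODEL_PAGE_SIZE;
}
```

Under these assumptions a huge page takes 512 single-page entries, which is
more than the assumed table limit, but only one multipage entry.
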
>
> With multipage bvec:
>
> - bio size can be increased, which should in theory improve some
> high-bandwidth IO cases[4].
>
> - Inside block layer, both bio splitting and sg map can
> become more efficient than before by just traversing the
> physically contiguous 'segment' instead of each page.
>
> - it may be possible in the future to reduce the memory footprint
> of bvec usage.
>
> 3) how is multipage bvec implemented in this patchset?
>
> The first 9 patches comment on some special cases. As we saw,
> most cases turn out to be safe for multipage bvec;
> only fs/buffer, MD and btrfs need to be dealt with. Both fs/buffer
> and btrfs are handled in the following patches, based on some
> new block APIs for multipage bvec.
>
> Given that a little more work is involved to clean up MD, this patchset
> introduces QUEUE_FLAG_NO_MP for it, so that this component can still
> see/use singlepage bvec. In the future, once the cleanup is done, the
> flag can be killed.
>
> The 2nd part(23 ~ 54) implements multipage bvec in block:
>
> - put all tricks into bvec/bio/rq iterators, and as long as
> drivers and fs use these standard iterators, they are happy
> with multipage bvec
>
> - bio_for_each_segment_all() changes
> this helper passes a pointer to each bvec directly to the user, so
> it has to be changed. Two new helpers(bio_for_each_segment_all_sp()
> and bio_for_each_segment_all_mp()) are introduced.
>
> Also convert current users of bio_for_each_segment_all() to the
> above two.
>
> - bio_clone() changes
> By default, bio_clone() still clones the new bio in the multipage bvec
> way. A single-page version of bio_clone() is also introduced
> for special cases in which only single-page bvecs may be used
> in the cloned bio(bio bounce, ...)
>
> - btrfs cleanup
> just three patches for avoiding direct access to bvec table.
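
The single-page ("sp") view of a multipage bvec can be modelled in user
space roughly as below. This is a sketch under stated assumptions, not the
patchset's implementation: model_bvec and model_split_sp() are hypothetical
names, pages are represented by pfn, and the helpers in the patchset are
for-each style iterators rather than a function that fills an array.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096u	/* assumed page size for this model */

/* Simplified stand-in for a bvec: a page is identified by its pfn. */
struct model_bvec {
	unsigned long pfn;	/* first page of the segment */
	unsigned int len;	/* length in bytes, may exceed one page */
	unsigned int offset;	/* byte offset from the start of 'pfn' */
};

/*
 * Write the single-page pieces of the multipage bvec 'mp' into 'out'.
 * Each piece stays within one page. Returns the number of pieces.
 */
static size_t model_split_sp(const struct model_bvec *mp,
			     struct model_bvec *out, size_t cap)
{
	unsigned long pfn = mp->pfn + mp->offset / MODEL_PAGE_SIZE;
	unsigned int off = mp->offset % MODEL_PAGE_SIZE;
	unsigned int left = mp->len;
	size_t n = 0;

	while (left > 0) {
		/* bytes available up to the end of the current page */
		unsigned int chunk = MODEL_PAGE_SIZE - off;

		if (chunk > left)
			chunk = left;

		assert(n < cap);	/* the caller must size 'out' */
		out[n++] = (struct model_bvec){
			.pfn = pfn, .len = chunk, .offset = off,
		};

		left -= chunk;
		pfn++;
		off = 0;	/* later pieces start at a page boundary */
	}
	return n;
}
```

Code that needs to touch individual pages would walk pieces like these,
while sg mapping and segment counting can work on the multipage entry
directly.
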
>
> These patches can be found in the following git tree:
>
>         https://github.com/ming1/linux/commits/mp-bvec-0.6-v4.10-rc
>
> Thanks to Christoph for looking at the early version and providing
> very good suggestions, such as: introduce bio_init_with_vec_table(),
> remove other unnecessary helpers for cleanup and so on.
>
> TODO:
>         - cleanup direct access to bvec table for MD
>
> V1:
>         - against v4.10-rc1; some cleanups from V0 are in -linus already
>         - handle queue_virt_boundary() in mp bvec change and make NVMe happy
>         - further BTRFS cleanup
>         - remove QUEUE_FLAG_SPLIT_MP
>         - rename for two new helpers of bio_for_each_segment_all()
>         - fix bounce conversion
>         - address comments in V0

Any comments on this version?

BTW, with one fix in the following link:

https://github.com/ming1/linux/commit/e52897a21b4b4c1500cc3686b8392757ebc5bd19

xfstests(ext4, xfs and btrfs) were run and no regression is observed.

Also one new patch is introduced to cover dio over block device:

https://github.com/ming1/linux/commit/58a0f7a7f6afa74cc29d453f9b5d79304c90aa09

Thanks,
Ming

>
> [1], http://marc.info/?l=linux-kernel&m=141680246629547&w=2
> [2], https://patchwork.kernel.org/patch/9451523/
> [3], http://marc.info/?t=147735447100001&r=1&w=2
> [4], http://marc.info/?l=linux-mm&m=147745525801433&w=2
>
>
> Ming Lei (54):
>   block: drbd: comment on direct access bvec table
>   block: loop: comment on direct access to bvec table
>   kernel/power/swap.c: comment on direct access to bvec table
>   mm: page_io.c: comment on direct access to bvec table
>   fs/buffer: comment on direct access to bvec table
>   f2fs: f2fs_read_end_io: comment on direct access to bvec table
>   bcache: comment on direct access to bvec table
>   block: comment on bio_alloc_pages()
>   block: comment on bio_iov_iter_get_pages()
>   block: introduce flag QUEUE_FLAG_NO_MP
>   md: set NO_MP for request queue of md
>   dm: limit the max bio size as BIO_MAX_PAGES * PAGE_SIZE
>   block: comments on bio_for_each_segment[_all]
>   block: introduce multipage/single page bvec helpers
>   block: implement sp version of bvec iterator helpers
>   block: introduce bio_for_each_segment_mp()
>   block: introduce bio_clone_sp()
>   bvec_iter: introduce BVEC_ITER_ALL_INIT
>   block: bounce: avoid direct access to bvec table
>   block: bounce: don't access bio->bi_io_vec in copy_to_high_bio_irq
>   block: introduce bio_can_convert_to_sp()
>   block: bounce: convert multipage bvecs into singlepage
>   bcache: handle bio_clone() & bvec updating for multipage bvecs
>   blk-merge: compute bio->bi_seg_front_size efficiently
>   block: blk-merge: try to make front segments in full size
>   block: blk-merge: remove unnecessary check
>   block: use bio_for_each_segment_mp() to compute segments count
>   block: use bio_for_each_segment_mp() to map sg
>   block: introduce bvec_for_each_sp_bvec()
>   block: bio: introduce single/multi page version of
>     bio_for_each_segment_all()
>   block: introduce bio_segments_all()
>   block: introduce bvec_get_last_sp()
>   block: deal with dirtying pages for multipage bvec
>   block: convert to single/multi page version of
>     bio_for_each_segment_all()
>   bcache: convert to bio_for_each_segment_all_sp()
>   dm-crypt: don't clear bvec->bv_page in crypt_free_buffer_pages()
>   dm-crypt: convert to bio_for_each_segment_all_sp()
>   md/raid1.c: convert to bio_for_each_segment_all_sp()
>   fs/mpage: convert to bio_for_each_segment_all_sp()
>   fs/direct-io: convert to bio_for_each_segment_all_sp()
>   ext4: convert to bio_for_each_segment_all_sp()
>   xfs: convert to bio_for_each_segment_all_sp()
>   gfs2: convert to bio_for_each_segment_all_sp()
>   f2fs: convert to bio_for_each_segment_all_sp()
>   exofs: convert to bio_for_each_segment_all_sp()
>   fs: crypto: convert to bio_for_each_segment_all_sp()
>   fs/btrfs: convert to bio_for_each_segment_all_sp()
>   fs/block_dev.c: convert to bio_for_each_segment_all_sp()
>   fs/iomap.c: convert to bio_for_each_segment_all_sp()
>   fs/buffer.c: use bvec iterator to truncate the bio
>   btrfs: avoid access to .bi_vcnt directly
>   btrfs: use bvec_get_last_sp to get the last singlepage bvec
>   btrfs: comment on direct access bvec table
>   block: enable multipage bvecs
>
>  block/bio.c                      | 110 +++++++++++++++----
>  block/blk-merge.c                | 227 +++++++++++++++++++++++++++++++--------
>  block/blk-zoned.c                |   5 +-
>  block/bounce.c                   |  75 +++++++++----
>  drivers/block/drbd/drbd_bitmap.c |   1 +
>  drivers/block/loop.c             |   5 +
>  drivers/md/bcache/btree.c        |   4 +-
>  drivers/md/bcache/debug.c        |  30 +++++-
>  drivers/md/bcache/super.c        |   6 ++
>  drivers/md/bcache/util.c         |   7 ++
>  drivers/md/dm-crypt.c            |   4 +-
>  drivers/md/dm.c                  |  11 +-
>  drivers/md/md.c                  |  12 +++
>  drivers/md/raid1.c               |   3 +-
>  fs/block_dev.c                   |   6 +-
>  fs/btrfs/check-integrity.c       |  12 ++-
>  fs/btrfs/compression.c           |  12 ++-
>  fs/btrfs/disk-io.c               |   3 +-
>  fs/btrfs/extent_io.c             |  26 +++--
>  fs/btrfs/extent_io.h             |   1 +
>  fs/btrfs/file-item.c             |   6 +-
>  fs/btrfs/inode.c                 |  34 ++++--
>  fs/btrfs/raid56.c                |   6 +-
>  fs/buffer.c                      |  24 +++--
>  fs/crypto/crypto.c               |   3 +-
>  fs/direct-io.c                   |   4 +-
>  fs/exofs/ore.c                   |   3 +-
>  fs/exofs/ore_raid.c              |   3 +-
>  fs/ext4/page-io.c                |   3 +-
>  fs/ext4/readpage.c               |   3 +-
>  fs/f2fs/data.c                   |  13 ++-
>  fs/gfs2/lops.c                   |   3 +-
>  fs/gfs2/meta_io.c                |   3 +-
>  fs/iomap.c                       |   3 +-
>  fs/mpage.c                       |   3 +-
>  fs/xfs/xfs_aops.c                |   3 +-
>  include/linux/bio.h              | 164 ++++++++++++++++++++++++++--
>  include/linux/blk_types.h        |   6 ++
>  include/linux/blkdev.h           |   2 +
>  include/linux/bvec.h             | 138 ++++++++++++++++++++++--
>  kernel/power/swap.c              |   2 +
>  mm/page_io.c                     |   2 +
>  42 files changed, 829 insertions(+), 162 deletions(-)
>
> --
> 2.7.4
>



-- 
Ming Lei