Re: [LSF/MM ATTEND] block: multipage bvecs

On 02/26/2016 06:33 PM, Ming Lei wrote:
> Hi,
> 
> I'd like to participate in LSF/MM and discuss multipage bvecs.
> 
> Kent posted the idea[1] before, but it was never pushed out.
> I have studied multipage bvecs for a while, and I think
> they are a good way to improve the block subsystem.
> 
> Multipage bvecs means that one 'struct bio_vec' can hold
> multiple physically contiguous pages instead of the
> single page used in the current kernel.
> 

Hi Ming Lei

This is an interesting talk for me.

I don't know if you ever tried it, but I did. If I take a regular
SSD or a PCIe flash card that I have in my machine, point a single
bvec at a page, set bv_len = PAGE_SIZE * 8 and call submit_bio(),
I get 8 pages' worth of IO with a single bvec and it all just works.
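
To make concrete what a single bvec covering 8 pages looks like, here
is a small userspace model, not kernel code. The names (model_bvec,
model_bvec_pages, model_bvec_is_single_page) are made up for
illustration, the page is represented by a page frame number, and 4K
pages are assumed.

```c
#include <stdbool.h>

#define MODEL_PAGE_SIZE 4096u /* assumption: 4K pages */

/* Hypothetical userspace stand-in for the kernel's struct bio_vec. */
struct model_bvec {
    unsigned long pfn;   /* first page frame, stands in for bv_page */
    unsigned int len;    /* stands in for bv_len */
    unsigned int offset; /* stands in for bv_offset, assumed < page size */
};

/* Number of pages touched by one bvec. */
static unsigned int model_bvec_pages(const struct model_bvec *bv)
{
    if (bv->len == 0)
        return 0;
    return (bv->offset + bv->len + MODEL_PAGE_SIZE - 1) / MODEL_PAGE_SIZE;
}

/* True if the bvec stays inside one page, as single-page code assumes. */
static bool model_bvec_is_single_page(const struct model_bvec *bv)
{
    return bv->offset + bv->len <= MODEL_PAGE_SIZE;
}
```

A bvec with len = 8 * MODEL_PAGE_SIZE and offset = 0 touches 8 pages
and fails the single-page check; any code that silently relies on that
check is a place an audit would have to look at.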

Yes, yes, I know it would break a bunch of other places, and the
single bvec case probably works better. I just want to say that the
current code is not that picky about assuming a single page per bvec.

I would like to see an audit and test cases done in this regard,
but with the current API kept and the change made transparent. I
think that all the places you mention below can be made transparent
to a "big bvec" if coded carefully, and there need not be a separate
API for multi-page / single-page bvecs. It should all just work.
I might be wrong, I have not looked at this deeply, but my gut
feeling is that it is possible.

Thanks for bringing up the issue
Boaz

> IMO, there are several advantages to supporting multipage bvecs:
> 
> - currently one bio from bio_alloc() can hold at most 256
> vectors, which means one bio can transfer at most
> 1 MB (256 * 4K). With multipage bvecs, a fs can submit a bigger
> chunk via a single bio, because big physically contiguous
> segments are very common.
> 
> - CPU time spent iterating the bvec table should decrease
> 
> - block merge gets simplified a lot: segments can be merged
> right inside bio_add_page(), so singlepage bvecs need not be stored
> in the bvec table, and finally the segment can be split to the
> proper size for the driver. blk_bio_map_sg() gets simplified too.
> Recently block merge has become a bit complicated, and we have seen
> quite a few bug reports/fixes in block merge.
> 
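
The 1 MB limit and the merge-inside-bio_add_page() idea from the list
above can be sketched in userspace. This is only a model under stated
assumptions: 4K pages, a 256-entry table, pages identified by page
frame number, and hypothetical names (model_bio, model_bio_add_page,
model_bio_add_pages). It is not the kernel's bio_add_page().

```c
#define MODEL_PAGE_SIZE 4096ull /* assumption: 4K pages */
#define MODEL_MAX_VECS  256u    /* table size quoted in the mail */

struct model_bvec {
    unsigned long long pfn; /* first page frame of the segment */
    unsigned int len;
    unsigned int offset;
};

struct model_bio {
    struct model_bvec vecs[MODEL_MAX_VECS];
    unsigned int cnt;        /* used table entries */
    unsigned long long size; /* total bytes in the bio */
};

/* Add one chunk; merge into the last bvec when physically contiguous.
 * Returns 1 on success, 0 when a new entry is needed but the table is full. */
static int model_bio_add_page(struct model_bio *bio, unsigned long long pfn,
                              unsigned int len, unsigned int offset)
{
    if (bio->cnt > 0) {
        struct model_bvec *last = &bio->vecs[bio->cnt - 1];
        unsigned long long last_end =
            last->pfn * MODEL_PAGE_SIZE + last->offset + last->len;

        if (last_end == pfn * MODEL_PAGE_SIZE + offset) {
            last->len += len; /* grow the multipage bvec */
            bio->size += len;
            return 1;
        }
    }
    if (bio->cnt == MODEL_MAX_VECS)
        return 0;
    bio->vecs[bio->cnt].pfn = pfn;
    bio->vecs[bio->cnt].len = len;
    bio->vecs[bio->cnt].offset = offset;
    bio->cnt++;
    bio->size += len;
    return 1;
}

/* Add nr whole pages starting at first_pfn, stride page frames apart.
 * Returns how many pages were accepted. */
static unsigned int model_bio_add_pages(struct model_bio *bio,
                                        unsigned long long first_pfn,
                                        unsigned int stride, unsigned int nr)
{
    unsigned int i;

    for (i = 0; i < nr; i++)
        if (!model_bio_add_page(bio, first_pfn + (unsigned long long)i * stride,
                                (unsigned int)MODEL_PAGE_SIZE, 0))
            break;
    return i;
}
```

In this model, 300 contiguous pages (stride 1) end up in a single table
entry, while 300 scattered pages (stride 2) stop at 256 entries, which
is the 1 MB ceiling mentioned above.
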
> I'd like to hear opinions from fs guys about multipage-bvec-based
> bios, because this will bring some changes to the bio interface (one
> bio will represent a bigger I/O than before).
> 
> Also I hope to discuss the implementation with guys in fs, dm, md,
> bcache..., because this feature will bring changes to
> these subsystems. So far, I have the following ideas:
> 
> 1) change to bio_for_each_segment()
> 
> The bvec returned from this iterator helper needs to stay a singlepage
> vector as before, so most users of the bio iterator don't need to change.
> 
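
One way to picture point 1) is an iterator that walks a table of
multipage bvecs but hands out one singlepage view at a time. The sketch
below is a userspace model with hypothetical names (model_sp_bvec_at,
model_for_each_segment) and assumes 4K pages; the real helper would
work on a bvec_iter, which is not modeled here.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096u /* assumption: 4K pages */

struct model_bvec {
    unsigned long pfn;   /* first page frame of the segment */
    unsigned int len;
    unsigned int offset; /* offset into the first page */
};

/* Singlepage view of multipage bvec mp, starting done bytes into it. */
static struct model_bvec model_sp_bvec_at(const struct model_bvec *mp,
                                          unsigned int done)
{
    unsigned int abs_off = mp->offset + done; /* from start of first page */
    unsigned int left_in_vec = mp->len - done;
    struct model_bvec sp;

    assert(done < mp->len);
    sp.pfn = mp->pfn + abs_off / MODEL_PAGE_SIZE;
    sp.offset = abs_off % MODEL_PAGE_SIZE;
    sp.len = MODEL_PAGE_SIZE - sp.offset; /* up to the end of this page */
    if (sp.len > left_in_vec)
        sp.len = left_in_vec;             /* or the end of the bvec */
    return sp;
}

/* Call fn (if not NULL) on every singlepage view; return how many there were. */
static unsigned int model_for_each_segment(const struct model_bvec *vecs,
                                           unsigned int cnt,
                                           void (*fn)(const struct model_bvec *sp,
                                                      void *ctx),
                                           void *ctx)
{
    unsigned int i, n = 0;

    for (i = 0; i < cnt; i++) {
        unsigned int done = 0;

        while (done < vecs[i].len) {
            struct model_bvec sp = model_sp_bvec_at(&vecs[i], done);

            if (fn != NULL)
                fn(&sp, ctx);
            done += sp.len;
            n++;
        }
    }
    return n;
}
```

Callers only ever see views that fit inside one page, so code written
against singlepage bvecs keeps working in this model.
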
> 2) change to bio_for_each_segment_all()
> 
> bio_for_each_segment_all() has to be changed because callers may
> modify the bvec, and they currently assume it is always singlepage.
> 
> I think bio_for_each_segment_all() needs to be split into
> bio_for_each_segment_all_rd() and bio_for_each_segment_all_wt().
> 
> Both new helpers return a pointer to a bio_vec like before.
> 
> *_rd() is used to iterate over each vector for reading the pointed-to
> bvec, and the caller cannot write to this vector. This helper can still
> return a singlepage bvec like before, so one extra local/temp 'bio_vec'
> variable has to be added for the conversion from multipage bvec to
> singlepage bvec.
> 
> *_wt() is used to iterate over each vector for changing the bvec, and
> is only allowed for iterating a bio with singlepage bvecs. There are
> just several such cases, such as bio bounce, bio_alloc_pages(),
> raid1 and raid10.
> 
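
The difference between the two proposed helpers can be modeled as
follows: the read variant hands out a pointer to a temporary singlepage
copy, the write variant hands out a pointer into the table itself and
refuses tables that contain multipage bvecs. Everything here is a
userspace sketch with hypothetical model_* names and 4K pages assumed;
the two callbacks exist only to exercise the iterators.

```c
#include <stdbool.h>

#define MODEL_PAGE_SIZE 4096u /* assumption: 4K pages */

struct model_bvec {
    unsigned long pfn;
    unsigned int len;
    unsigned int offset;
};

/* Read-only walk: fn sees a temporary singlepage bvec, never the table. */
static void model_for_each_segment_all_rd(const struct model_bvec *vecs,
                                          unsigned int cnt,
                                          void (*fn)(const struct model_bvec *sp,
                                                     void *ctx),
                                          void *ctx)
{
    unsigned int i;

    for (i = 0; i < cnt; i++) {
        unsigned int done = 0;

        while (done < vecs[i].len) {
            unsigned int abs_off = vecs[i].offset + done;
            struct model_bvec tmp; /* the extra local/temp bvec */

            tmp.pfn = vecs[i].pfn + abs_off / MODEL_PAGE_SIZE;
            tmp.offset = abs_off % MODEL_PAGE_SIZE;
            tmp.len = MODEL_PAGE_SIZE - tmp.offset;
            if (tmp.len > vecs[i].len - done)
                tmp.len = vecs[i].len - done;
            fn(&tmp, ctx);
            done += tmp.len;
        }
    }
}

/* Writable walk: only valid when every bvec is singlepage.
 * Returns false, without calling fn, if any bvec crosses a page. */
static bool model_for_each_segment_all_wt(struct model_bvec *vecs,
                                          unsigned int cnt,
                                          void (*fn)(struct model_bvec *bv,
                                                     void *ctx),
                                          void *ctx)
{
    unsigned int i;

    for (i = 0; i < cnt; i++)
        if (vecs[i].offset + vecs[i].len > MODEL_PAGE_SIZE)
            return false;
    for (i = 0; i < cnt; i++)
        fn(&vecs[i], ctx);
    return true;
}

/* Example reader: sum the lengths into *(unsigned long *)ctx. */
static void model_sum_len(const struct model_bvec *sp, void *ctx)
{
    *(unsigned long *)ctx += sp->len;
}

/* Example writer: move each bvec to another page, loosely like a bounce. */
static void model_move_page(struct model_bvec *bv, void *ctx)
{
    bv->pfn += *(const unsigned long *)ctx;
}
```

The const-qualified callback argument in the read variant is what
expresses "caller can not write to this vector" in this model.
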
> 3) change bvecs of cloned bio
> 
> In cases such as bio bounce and raid1, one bio is cloned from the
> incoming bio, and each bvec of the cloned bio may be updated. We have
> to introduce a singlepage version of bio_clone() so that the cloned
> bio only includes singlepage bvecs; then the bvecs can be updated like
> before.
> 
> One problem is that the cloned bio may not be able to hold all the
> singlepage bvecs converted from the multipage bvecs in the source bio.
> One simple solution is to split the source bio and make sure its size
> can't be bigger than 1 MB (256 singlepage vectors).
> 
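
The split point for the singlepage clone in point 3) can be worked out
by counting how many bytes fit into a given number of singlepage
vectors. The function below is a userspace sketch with a hypothetical
name (model_sp_clone_limit), assuming 4K pages; it only computes the
byte count and does not clone or split anything.

```c
#define MODEL_PAGE_SIZE 4096u /* assumption: 4K pages */

struct model_bvec {
    unsigned long pfn;
    unsigned int len;
    unsigned int offset; /* offset into the first page */
};

/* Bytes from the start of the bvec table that fit into max_sp singlepage
 * vectors. If the result is smaller than the bio size, the source bio
 * would have to be split at that byte offset before cloning. */
static unsigned long long model_sp_clone_limit(const struct model_bvec *vecs,
                                               unsigned int cnt,
                                               unsigned int max_sp)
{
    unsigned long long bytes = 0;
    unsigned int used = 0;
    unsigned int i;

    for (i = 0; i < cnt; i++) {
        unsigned int done = 0;

        while (done < vecs[i].len) {
            unsigned int in_page =
                MODEL_PAGE_SIZE - (vecs[i].offset + done) % MODEL_PAGE_SIZE;
            unsigned int left = vecs[i].len - done;
            unsigned int step = in_page < left ? in_page : left;

            if (used == max_sp)
                return bytes; /* the clone's table is full */
            used++;
            done += step;
            bytes += step;
        }
    }
    return bytes;
}
```

For a single 300-page bvec with offset 0 and max_sp = 256 this gives
1048576 bytes, so the first clone covers exactly 1 MB. With a nonzero
offset the first singlepage vector is partial, and the limit drops
below 1 MB.
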
> 4) introduce bio_for_each_mp_segment()
> 
> The bvec returned from this iterator helper will be a multipage bvec,
> which should be the actual/real segment, so drivers may switch to
> this helper if they can handle multipage segments directly, which
> should be the common case.
> 
> 
> [1] http://marc.info/?l=linux-kernel&m=141680246629547&w=2
> 
> Thanks,
> Ming Lei
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
