On Mon, Feb 29, 2016 at 1:09 AM, James Bottomley
<James.Bottomley@xxxxxxxxxxxxxxxxxxxxx> wrote:
> On Mon, 2016-02-29 at 00:59 +0800, Ming Lei wrote:
>> On Mon, Feb 29, 2016 at 12:45 AM, James Bottomley
>> <James.Bottomley@xxxxxxxxxxxxxxxxxxxxx> wrote:
>> > On Sun, 2016-02-28 at 08:29 -0800, Christoph Hellwig wrote:
>> > > On Sun, Feb 28, 2016 at 08:26:46AM -0800, James Bottomley wrote:
>> > > > You mean in bio_add_page() the code which currently aggregates
>> > > > chunks within a page could build a bio vec entry up to the max
>> > > > segment size? I think that is reasonable, especially now the
>> > > > bio splitting code can actually split inside a bio vec entry.
>> > >
>> > > Yes. Kent has an old prototype that did this at:
>> > >
>> > > https://evilpiepirate.org/git/linux-bcache.git/log/?h=block_stuff
>> > >
>> > > I don't think any of that is reusable as-is, but the basic idea
>> > > is sound and very useful.
>> >
>> > The basic idea, yes, but the actual code in that tree would still
>> > have built up bv entries that are too big. We have to thread
>> > bio_add_page() with knowledge of the queue limits, which is
>> > somewhat hard since they're deliberately queue agnostic. Perhaps
>> > some global minimum queue segment size would work?
>>
>> IMO, we can simply build each physically contiguous chunk into one
>> vector, because bio_add_page() is in the hot path, and then compute
>> segments during bio splitting in the submit_bio() path by applying
>> all of the queue limits, just like the current way.
>
> We can debate this, but I'm dubious about the effectiveness.

When bio_add_page() is called, the fs is still preparing data, so it
is reasonable to figure out segments & split only after the bio is
filled up and ready (in the submit_bio() path).

If the fs knows its pages are physically contiguous, it may be better
to introduce a bio_add_pages() that adds all of these pages in a
batch, which looks more efficient.

> the reason we have biovecs and don't use one bio per page is
> efficiency.
> On large memory machines, most large IO transfers tend to be
> physically contiguous because the allocators make it so. The
> splitting code splits into bios not biovecs, so we'll likely end up
> with one bio per segment. Is that better than one page per large
> biovec? Not sure,

Firstly, multipage bvecs don't mean a bio ends up with a single
vector: most current users keep calling bio_add_page() while the bio
isn't full, and the fs often has more data to transfer.

Secondly, today's blk_bio_segment_split() only needs to split a bio at
bvec boundaries, because each bvec holds a single page. Once multipage
bvecs are introduced, one multipage bvec may need to be split into
several segments because of the queue's limits, but all of those
segments may still be handled in one bio.

> someone will have to do careful benchmarking.

Yes, we need to do that carefully :-)

In theory, multipage bvecs may improve performance in both of the
above cases, and block merging can be simplified a lot.

Thanks,
Ming Lei
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html