Re: bufferlist allocation optimization ideas

We explored a number of these ideas.  We have a few branches that might be picked over.

Having said that, our feeling was that the generality to span shared and non-shared cases transparently has a cost in the unmarked case.  Other aspects of the buffer indirection are essential (e.g., Accelio-originated buffers, etc.).  We see a large contribution from ptr::release in perf.  One of the main aspirations we had was to identify code paths which would never share buffers, and not pay for sharing in those paths.

To the degree that bufferlist is frequently used as a kind of flexible string class, while other code uses it as a smart tailq of iovec or struct uio, there is client code with disjoint assumptions.  As mentioned, shared vs. non-shared code paths are similarly disjoint.  I'm not certain what the consequent here is.  Ceph code gets a lot of simplification from this idiom, but it is not minimalist.

We found ways, as Piotr suggested, to avoid allocations of groups of objects related to a message, and this had a lot of impact.  We're trying to merge some of that soon.

Matt

----- Original Message -----
From: "Sage Weil" <sweil@xxxxxxxxxx>
To: "Piotr Dałek" <Piotr.Dalek@xxxxxxxxxxxxxx>
Cc: ceph-devel@xxxxxxxxxxxxxxx
Sent: Monday, August 10, 2015 3:39:56 PM
Subject: RE: bufferlist allocation optimization ideas

On Mon, 10 Aug 2015, Dałek, Piotr wrote:
> This is a pretty low-level approach. What I was actually wondering is
> whether we can reduce the number of memory (de)allocations at a higher level,
> for example by improving the message lifecycle logic (from receiving a message
> to performing the actual operation and finishing it) so that it doesn't involve
> so many allocations and deallocations. Reducing memory allocation at the low
> level will help, no doubt about it, but we can probably improve things at a
> higher level without risking breaking more than we need to.

Yes, definitely!  I think we should pursue both...

sage


> 
> 
> With best regards / Pozdrawiam
> Piotr Dałek
> 
> 
> > -----Original Message-----
> > From: ceph-devel-owner@xxxxxxxxxxxxxxx [mailto:ceph-devel-
> > owner@xxxxxxxxxxxxxxx] On Behalf Of Sage Weil
> > Sent: Monday, August 10, 2015 9:20 PM
> > To: ceph-devel@xxxxxxxxxxxxxxx
> > Subject: bufferlist allocation optimization ideas
> > 
> > Currently putting something in a bufferlist involves 3 allocations:
> > 
> >  1. raw buffer (posix_memalign, or new char[])
> >  2. buffer::raw (this holds the refcount.  lifecycle matches the
> >     raw buffer exactly)
> >  3. bufferlist's STL list<> node, which embeds buffer::ptr
> > 
> > --- combine buffer and buffer::raw ---
> > 
> > This should be a pretty simple patch, and turns 2 allocations into one.  Most
> > buffers are constructed/allocated via buffer::create_*() methods.  Those
> > each look something like
> > 
> >   buffer::raw* buffer::create(unsigned len) {
> >     return new raw_char(len);
> >   }
> > 
> > where raw_char::raw_char() allocates the actual buffer.  Instead, allocate
> > sizeof(raw_char_combined) + len, and use the right magic C++ syntax to call
> > the constructor on that memory.  Something like
> > 
> >   raw_char_combined *foo = new (ptr) raw_char_combined(ptr);
> > 
> > where the raw_char_combined constructor is smart enough to figure out
> > that data goes at ptr + sizeof(*this).
> > 
> > That takes us from 3 -> 2 allocations.
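
A minimal sketch of the combined allocation described above, assuming a hypothetical raw_char_combined that is not Ceph's actual class.  One operator new call provides storage for both the header (length and refcount) and the payload, and placement new constructs the header at the front of that storage.  The create()/destroy() helpers are illustrative names only, and alignment beyond what operator new guarantees (e.g. page-aligned buffers) is not handled here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <new>

// Hypothetical stand-in for a buffer::raw subclass; not Ceph's real type.
struct raw_char_combined {
  unsigned len;  // payload length in bytes
  int nref;      // reference count lives next to the data it guards

  explicit raw_char_combined(unsigned l) : len(l), nref(0) {}

  // The payload starts immediately after the header object.
  char* data() { return reinterpret_cast<char*>(this) + sizeof(*this); }

  // One allocation covers header + payload.
  static raw_char_combined* create(unsigned len) {
    void* mem = ::operator new(sizeof(raw_char_combined) + len);
    return new (mem) raw_char_combined(len);  // placement new
  }

  // Storage came from ::operator new, so destroy and release it by hand
  // rather than with a plain delete expression.
  static void destroy(raw_char_combined* r) {
    r->~raw_char_combined();
    ::operator delete(static_cast<void*>(r));
  }
};
```

A caller would use raw_char_combined::create(len) where it previously used new raw_char(len), and must release the object through destroy() so the header and payload are freed together.
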
> > 
> > An open question is whether this is always a good idea, or whether there are
> > cases where 2 allocations are better, e.g. when len is exactly one page, and
> > we're better off with a mempool allocation for raw and page separately.  Or
> > maybe for very large buffers?  I'm really not sure what would be better...
> > 
> > 
> > --- make bufferlist use boost::intrusive::list ---
> > 
> > Most buffers exist in only one list, so the indirection through the ptr is mostly
> > wasted.
> > 
> > 1. embed a boost::intrusive::list node into buffer::ptr.  (Note that doing just
> > this buys us nothing... we are just allocating ptr's and using the intrusive node
> > instead of the list<> node with an embedded ptr.)
> > 
> > 2. embed a ptr in buffer::raw (or raw_char_combined)
> > 
> > When adding a buffer to the bufferlist, we use the raw_char_combined's
> > embedded ptr if it is available.  Otherwise, we allocate one as before.
> > 
> > This would need some careful adjustment of the common append() paths,
> > since they currently are all ptr-based.  One way to make this work well might
> > be to embed N ptr's in raw_char_combined, on the assumption that the
> > refcount for a buffer is never more than 2 or 3.  Only in extreme cases will we
> > need to explicitly allocate ptr's.
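
A rough sketch of the embedded-ptr idea, under loud assumptions: the list hooks are hand-rolled prev/next pointers standing in for boost::intrusive::list, the types ptr, raw and bufferlist_sketch are hypothetical, only one ptr is embedded per raw (N = 1), and the caller owns the raw objects and keeps them alive longer than the list.  It shows only the append() decision: use the ptr embedded in the raw when it is free, otherwise fall back to a heap allocation.

```cpp
#include <cassert>
#include <cstddef>

struct raw;

// Hypothetical ptr: a view onto a raw buffer that also carries its own
// list hooks, so no separate list node has to be allocated.
struct ptr {
  raw* r = nullptr;
  unsigned off = 0, len = 0;
  ptr* prev = nullptr;
  ptr* next = nullptr;
  bool embedded = false;  // true when this ptr lives inside its raw
};

// Hypothetical raw buffer header with a single embedded ptr.
struct raw {
  unsigned len;
  int nref = 0;
  ptr embedded_ptr;
  bool embedded_in_use = false;
  explicit raw(unsigned l) : len(l) {}
};

struct bufferlist_sketch {
  ptr* head = nullptr;
  ptr* tail = nullptr;
  std::size_t heap_ptrs = 0;  // number of ptrs that needed an allocation

  bufferlist_sketch() = default;
  bufferlist_sketch(const bufferlist_sketch&) = delete;
  bufferlist_sketch& operator=(const bufferlist_sketch&) = delete;
  ~bufferlist_sketch() { clear(); }

  void append(raw* r) {
    ptr* p;
    if (!r->embedded_in_use) {
      p = &r->embedded_ptr;  // common case: no allocation at all
      r->embedded_in_use = true;
      p->embedded = true;
    } else {
      p = new ptr;           // buffer already referenced elsewhere
      ++heap_ptrs;
    }
    p->r = r;
    p->off = 0;
    p->len = r->len;
    p->prev = tail;
    p->next = nullptr;
    ++r->nref;
    if (tail) tail->next = p; else head = p;
    tail = p;
  }

  void clear() {
    for (ptr* p = head; p != nullptr;) {
      ptr* n = p->next;
      raw* r = p->r;
      --r->nref;
      if (p->embedded) {
        // Hand the embedded slot back to the raw for reuse.
        p->embedded = false;
        p->prev = p->next = nullptr;
        r->embedded_in_use = false;
      } else {
        delete p;
      }
      p = n;
    }
    head = tail = nullptr;
  }
};
```

In this sketch, appending the same raw twice is the only case that allocates; embedding N ptr's as suggested above would push that fallback further out.
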
> > 
> > 
> > Thoughts?
> > sage
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
>