On 08/12/2016 09:27 AM, Sage Weil wrote:
A ton of time in the encoding/marshalling path is spent doing bufferlist appends. This is partly because the buffer code is doing lots of sanity range checks, and partly because there are multiple layers that get range checks and length updates (bufferlist _len changes, and bufferlist::append_buffer (a ptr) gets its length updated, at the very least).

To simplify and speed this up, I propose an 'appender' concept/type that is used for doing appends in a more efficient way. It would be used like so:

  bufferlist bl;
  {
    bufferlist::safe_appender a = bl.get_safe_appender();
    ::encode(foo, a);
  }

or

  {
    bufferlist::unsafe_appender a = bl.get_unsafe_appender(1024);
    ::encode(foo, a);
  }

The appender keeps its own bufferptr that it copies data into. The bufferptr isn't given to the bufferlist until the appender is destroyed (or flush() is called explicitly). This means that appends are generally just a memcpy and a position pointer addition. In the safe_appender case, we also do a range check and allocate a new buffer if necessary. In the unsafe_appender case, it is the caller's responsibility to say how big a buffer to preallocate.

I have a simple prototype here:

  https://github.com/ceph/ceph/pull/10700

It appears to be almost 10x faster when encoding a uint64_t in a loop!
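Roughly, the idea looks like this (a self-contained sketch using std::vector in place of bufferptr; the class, member, and method names here are mine for illustration, not the code in the PR):

  #include <algorithm>
  #include <cstring>
  #include <vector>

  // Toy stand-in for bufferlist: just collects the flushed chunks.
  struct toy_bufferlist {
    std::vector<std::vector<char>> chunks;
    void append_chunk(std::vector<char>&& c) { chunks.push_back(std::move(c)); }
  };

  // Sketch of the "safe" appender: it stages writes in a private buffer, so
  // the hot path is one bounds check, a memcpy, and a position bump.  The
  // data only reaches the destination list on flush() (or destruction).
  class safe_appender_sketch {
    toy_bufferlist& bl;
    std::vector<char> buf;
    size_t pos = 0;
  public:
    explicit safe_appender_sketch(toy_bufferlist& l, size_t hint = 4096)
      : bl(l), buf(hint) {}
    ~safe_appender_sketch() { flush(); }

    void append(const char* p, size_t len) {
      if (len > buf.size() - pos) {        // the "safe" part: range check + grow
        flush();
        buf.resize(std::max(len, size_t(4096)));
      }
      memcpy(buf.data() + pos, p, len);    // the common case: memcpy + bump
      pos += len;
    }

    void flush() {
      if (!pos) return;
      buf.resize(pos);                     // trim to what was actually written
      bl.append_chunk(std::move(buf));
      buf.clear();
      pos = 0;
    }
  };

The unsafe variant would simply drop the range check and trust the caller's preallocation hint.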
Yay! This is huge. For posterity, here's where the original behavior was really hurting us in bluestore:
https://drive.google.com/file/d/0B2gTBZrkrnpZeC04eklmM2I4Wkk/view
[ RUN      ] BufferList.appender_bench
appending 1073741824 bytes
buffer::list::append                      20.285963
buffer::list encode                       19.719120
buffer::list::safe_appender::append        2.588926
buffer::list::safe_appender::append_v      2.837026
buffer::list::safe_appender encode         3.000614
buffer::list::unsafe_appender::append      2.452116
buffer::list::unsafe_appender::append_v    2.553745
buffer::list::unsafe_appender encode       2.200110
[       OK ] BufferList.appender_bench (55637 ms)

Interesting, unsafe isn't much faster than safe. I suspect the CPU's branch prediction is just working really well there?

Anyway, thoughts on this? Any suggestions for further improvement?
I'm surprised too that unsafe isn't much faster than safe. Still, this is enough of an improvement that I think we should just run with it for now and get some performance tests done as we convert stuff over.
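FWIW, the shape of the loop being timed is basically this (a standalone toy, not the actual ceph_test_bufferlist bench), which also shows why the safe-path check is nearly free when the branch predicts well:

  #include <chrono>
  #include <cstdint>
  #include <cstdio>
  #include <cstring>
  #include <vector>

  // Append n uint64_t values into a preallocated chunk, "flushing" (here just
  // resetting pos) whenever it fills.  checked=true mimics the safe appender;
  // checked=false mimics the unsafe one, where the caller preallocates enough.
  static double time_appends(size_t n, size_t chunk, bool checked) {
    std::vector<char> buf(chunk);
    size_t pos = 0;
    auto t0 = std::chrono::steady_clock::now();
    for (uint64_t v = 0; v < n; ++v) {
      if (checked && pos + sizeof(v) > buf.size())
        pos = 0;                               // rarely taken, so it predicts well
      memcpy(buf.data() + pos, &v, sizeof(v)); // hot path: memcpy + position bump
      pos += sizeof(v);
    }
    std::chrono::duration<double> dt = std::chrono::steady_clock::now() - t0;
    return dt.count();  // a real bench would also consume buf so nothing gets optimized away
  }

  int main() {
    const size_t total = size_t(1) << 30;      // 1 GiB, as in the bench above
    const size_t n = total / sizeof(uint64_t);
    printf("checked   %f s\n", time_appends(n, 1 << 20, true));   // 1 MiB chunks
    printf("unchecked %f s\n", time_appends(n, total, false));    // fully preallocated
    return 0;
  }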
I think the next step is to figure out how to make our WRITE_CLASS_ENCODER macros and encode functions work with both bufferlists and appenders so that it's easy to convert stuff over (and still have things work with a mix of bufferlist-based encoders and appender-based encoders).

sage
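One possible direction for that (purely a sketch with toy types and a made-up TOY_WRITE_CLASS_ENCODER macro, not the real encoding macros): have the macro emit a templated encode() so the same body compiles against anything offering append(const char*, size_t), whether that's a bufferlist or an appender.

  #include <cstdint>
  #include <cstring>

  // Toy stand-ins: the real bufferlist and appender would both expose some
  // form of append(const char*, size_t).
  struct toy_bufferlist { void append(const char*, size_t) { /* collect */ } };
  struct toy_appender   { void append(const char*, size_t) { /* stage   */ } };

  // Primitive encoder written once against the generic append target
  // (endianness handling omitted for brevity).
  template <typename Target>
  void encode_raw(uint64_t v, Target& t) {
    t.append(reinterpret_cast<const char*>(&v), sizeof(v));
  }

  // A WRITE_CLASS_ENCODER-style macro that emits a templated free function,
  // so callers can pass either target without duplicating the encoder.
  #define TOY_WRITE_CLASS_ENCODER(cl)                              \
    template <typename Target>                                     \
    inline void encode(const cl& o, Target& t) { o.encode(t); }

  struct foo_t {
    uint64_t a = 0, b = 0;
    template <typename Target>
    void encode(Target& t) const {
      encode_raw(a, t);
      encode_raw(b, t);
    }
  };
  TOY_WRITE_CLASS_ENCODER(foo_t)

With that shape, encode(foo, bl) and encode(foo, appender) resolve to the same template, and classes that haven't been converted yet can keep a bufferlist-only overload alongside it.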