I think that seems like a good way to go!

-Sam

On Tue, Jul 12, 2016 at 6:50 PM, Sage Weil <sweil@xxxxxxxxxx> wrote:
> On Tue, 12 Jul 2016, Allen Samuels wrote:
>> Good analysis.
>>
>> My original comments about putting the oNode on a diet included the
>> idea of a "custom" encode/decode path for certain high-usage cases. At
>> the time, Sage resisted going down that path, hoping that a more
>> optimized generic case would get the job done. Your analysis shows
>> that while we've achieved significant space reduction, it has come at
>> the expense of CPU time -- which dominates small-object performance.
>> (I suspect we'd eventually discover that the variable-length decode
>> path is responsible for a substantial read performance degradation as
>> well -- which may or may not be part of the read performance drop-off
>> you're seeing.) This isn't a surprising result, though it is
>> unfortunate.
>>
>> I believe we need to revisit the idea of custom encode/decode paths
>> for high-usage cases, only now the gains need to be focused on CPU
>> utilization as well as space efficiency.
>
> I still think we can get most or all of the way there in a generic way
> by revising the way we interact with bufferlist for encode and decode.
> We haven't actually tried to optimize this yet, and the current code is
> pretty horribly inefficient (asserts all over the place, and many
> layers of pointer indirection to do a simple append). I think we need
> to do a few things:
>
> 1) Decode path: optimize the iterator class so that it has a const char
> *current and a const char *current_end that point into the current
> buffer::ptr. That way any decode needs only a single pointer add and
> comparison to ensure there is enough data to copy before falling into
> the slow path (partial buffer, move to next buffer, etc.).
>
> 2) Having that comparison is still not ideal, but we should consider
> ways to get around that too.
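A minimal sketch of the fast-path decode check proposed in (1), under stated assumptions: the class, its slow-path stub, and the `copy` signature are hypothetical illustrations, not the actual Ceph `bufferlist::iterator` API; only the `current`/`current_end` names come from the proposal above.

```cpp
#include <cassert>
#include <cstring>
#include <stdexcept>

// Hypothetical sketch of the proposed fast-path decode iterator.
// 'current' and 'current_end' point into the current buffer::ptr, so
// most copies need only one pointer add + compare before the memcpy.
struct iterator_sketch {
  const char *current;      // next byte to decode
  const char *current_end;  // one past the last byte of this buffer

  // Copy 'len' bytes into 'dest', taking the fast path when the
  // current buffer holds enough contiguous data.
  void copy(size_t len, char *dest) {
    if (current + len <= current_end) {   // fast path: single compare
      memcpy(dest, current, len);
      current += len;
    } else {
      copy_slow(len, dest);               // partial buffer, next ptr, etc.
    }
  }

  void copy_slow(size_t, char *) {
    // The real iterator would advance to the next buffer::ptr here;
    // the sketch just signals that the fast path was exhausted.
    throw std::out_of_range("end of buffer");
  }
};
```

The point of the sketch is that the hot loop compiles down to one comparison and one memcpy per field, with all the multi-buffer bookkeeping pushed out of line.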
> For example, if we know that we are going to decode N M-byte things, we
> could do an iterator 'reserve' or 'check' that ensures we have a valid
> pointer for that much and then proceed without checks. The interface
> here would be tricky, though, since in the slow case we'll span buffers
> and need to either magically fall back to a different decode path (hard
> to maintain) or do a temporary copy (probably faster, but we need to
> ensure the iterator owns it and frees it later). I'd say this is step 2
> and optional; step 1 will have the most benefit.
>
> 3) Encode path: currently all encode methods take a bufferlist& and use
> the bufferlist itself as an append buffer. I think this is flawed and
> limiting. Instead, we should make a new class called
> buffer::list::appender (or similar) and templatize the encode methods
> so they can take a safe_appender (which does bounds checking) or an
> unsafe_appender (which does not). For the latter, the user takes
> responsibility for making sure there is enough space by doing a
> reserve()-type call which returns an unsafe_appender, and it's their
> job to make sure they don't shove too much data into it. That should
> make the encode path a memcpy + ptr increment (for savvy/optimized
> callers).
>
> I suggest we use bluestore as a test case to make the interfaces work
> and be fast. If we succeed, we can take advantage of it across the rest
> of the code base as well.
>
> That's my thinking, at least. I haven't had time to prototype it yet,
> but I think our goal should be to make the encode/decode paths capable
> of being a memcpy + ptr addition in the fast path, and let that guide
> the interface...
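The reserve()-then-unsafe-append shape proposed in (3) can be sketched roughly as follows. This is a hedged illustration only: the class names (`unsafe_appender_sketch`, `buffer_sketch`) and the use of a plain `std::vector` for backing storage are invented for the sketch and are not the real `buffer::list::appender` interface.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical sketch of the proposed unsafe_appender: the caller
// reserves space up front, after which each append is just a
// memcpy + pointer increment with no bounds checks.
class unsafe_appender_sketch {
  char *pos;
public:
  explicit unsafe_appender_sketch(char *start) : pos(start) {}

  void append(const void *src, size_t len) {
    memcpy(pos, src, len);  // no bounds check: caller already reserved
    pos += len;
  }

  char *position() const { return pos; }
};

// A toy buffer that hands out an unsafe appender after a reserve()-type
// call; the real thing would append into a bufferlist's raw buffer.
struct buffer_sketch {
  std::vector<char> data;

  unsafe_appender_sketch reserve(size_t len) {
    data.resize(len);       // caller promises to append at most 'len'
    return unsafe_appender_sketch(data.data());
  }
};
```

A safe_appender would wrap the same append with a bounds check against the reserved end, so templatized encode methods could be instantiated either way without duplicating the encode logic.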
>
> sage
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel"
> in the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at http://vger.kernel.org/majordomo-info.html