On Mon, 30 Sep 2013, Josh Durgin wrote: > On 09/30/2013 08:38 AM, Sage Weil wrote: > > On Mon, 30 Sep 2013, Rutger ter Borg wrote: > > > > > > Dear all, > > > > > > please find attached a patch that enables a user to pass user-owned > > > buffers > > > into librados' aio_read. The patch (against dumpling) removes the buf and > > > pbl > > > data members in AioCompletionImpl. > > > > > > * The 'buf' argument to read() used to be passed into AioCompletionImpl, > > > and > > > the results would be copied back after reading. This is replaced with the > > > creation of a static buffer of that buf. > > > > > > * The pbl argument in AioCompletionImpl is removed. > > > > > > The patch is tested against an application using librados. I've assumed > > > that > > > 'pbl' in > > > > > > aio_read( ...., pbl, ) > > > > > > is allocated by the user. It may even speed things up: a buffer copy is > > > prevented. > > > > I am a little worried that one path of aio_read uses c->bl and the other > > doesn't, but that probably is no big deal provided it is noted in the > > structure definition. > > It does clean up the existing usage, where the destination may be > c->buf or c->pbl though. > > > My larger concern is that we're about to do some major changes in the > > messenger and other code to use splice/tee/vmsplice to avoid copies > > to/from userspace when possible. That will involve removing some of the > > currently 'use the existing buffer' code. I'm hoping it will work out > > that in the librados case we just carry the kernel pages around a bit > > longer and delay the final copy into userspace, but it's hard to say until > > the code gets written. Josh plans to start working on it this week. > > > > Josh, do you think we should apply this now or wait until we see where > > things end up? > > I'm fine applying this now (with one fix). It's a nice cleanup > even if things change more soon. > > For the C interface, the return value stored in the AioCompletionImpl > needs to be the length read, so the caller can tell if a short read > occurred (this is only possible when trying to read past the end of an > object). This was being set in C_aio_Ack::finish(), but was removed by > this patch. > > One thing I'm not sure about is whether the bufferlist is guaranteed > not to be split anywhere in the lower levels. rados_read() > accounts for this case: > > if (bl.c_str() != buf) > bl.copy(0, bl.length(), buf); > > Sage, is that actually necessary? I think it's worthwhile as a safety check. Even if is not currently necessary now the implementation could change such that it is later. sage -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html