Re: [PATCH] fix librados aio read buffer handling

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, 30 Sep 2013, Josh Durgin wrote:
> On 09/30/2013 08:38 AM, Sage Weil wrote:
> > On Mon, 30 Sep 2013, Rutger ter Borg wrote:
> > > 
> > > Dear all,
> > > 
> > > please find attached a patch that enables a user to pass user-owned
> > > buffers
> > > into librados' aio_read. The patch (against dumpling) removes the buf and
> > > pbl
> > > data members in AioCompletionImpl.
> > > 
> > > * The 'buf' argument to read() used to be passed into AioCompletionImpl,
> > > and
> > > the results would be copied back after reading. This is replaced with the
> > > creation of a static buffer of that buf.
> > > 
> > > * The pbl argument in AioCompletionImpl is removed.
> > > 
> > > The patch is tested against an application using librados. I've assumed
> > > that
> > > 'pbl' in
> > > 
> > > aio_read( ...., pbl, )
> > > 
> > > is allocated by the user. It may even speed things up: a buffer copy is
> > > prevented.
> > 
> > I am a little worried that one path of aio_read uses c->bl and the other
> > doesn't, but that probably is no big deal provided it is noted in the
> > structure definition.
> 
> It does clean up the existing usage, where the destination may be
> c->buf or c->pbl though.
> 
> > My larger concern is that we're about to do some major changes in the
> > messenger and other code to use splice/tee/vmsplice to avoid copies
> > to/from userspace when possible.  That will involve removing some of the
> > currently 'use the existing buffer' code.  I'm hoping it will work out
> > that in the librados case we just carry the kernel pages around a bit
> > longer and delay the final copy into userspace, but it's hard to say until
> > the code gets written.  Josh plans to start working on it this week.
> > 
> > Josh, do you think we should apply this now or wait until we see where
> > things end up?
> 
> I'm fine applying this now (with one fix). It's a nice cleanup
> even if things change more soon.
> 
> For the C interface, the return value stored in the AioCompletionImpl
> needs to be the length read, so the caller can tell if a short read
> occurred (this is only possible when trying to read past the end of an
> object). This was being set in C_aio_Ack::finish(), but was removed by
> this patch.
> 
> One thing I'm not sure about is whether the bufferlist is guaranteed
> not to be split anywhere in the lower levels. rados_read()
> accounts for this case:
> 
>     if (bl.c_str() != buf)
>       bl.copy(0, bl.length(), buf);
> 
> Sage, is that actually necessary?

I think it's worthwhile as a safety check.  Even if is not currently 
necessary now the implementation could change such that it is later.

sage
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [CEPH Users]     [Ceph Large]     [Information on CEPH]     [Linux BTRFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]
  Powered by Linux