Re: Rados and user-provided buffers

On Wed, 18 Sep 2013, Rutger ter Borg wrote:
> On 2013-09-18 22:01, Sage Weil wrote:
> > 
> > The read-into-existing-buffer is only wired up properly for the C
> > interface.  For the C++ API it isn't generally necessary: we allocate and read
> > the data off the network, and pass the reference directly back to the user
> > without making another copy.  The 2010 thread is about similarly avoiding
> > such a copy for the C API.  We didn't contemplate the situation where you
> > specifically want the bytes to go to a particular address via C++.  If
> > that's what you need, the C++ API needs to be extended, or you can just
> > use the C call for that case.
> > 
> > sage
> > 
> 
> Hey Sage,
> 
> my particular use case is a pager that uses Rados as a backend. Striping of
> pages works identically to the striping mechanism of Ceph. Reads and writes of
> multiple pages may be combined into one aio_ call with one bufferlist. Pages
> are allocated by the pager.
> 
> AFAICT, the C call provides reading into a contiguous buffer, whereas I would
> like to read into a bufferlist. What would need to be done to add support for
> this in rados?
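For reference, the striping arithmetic mentioned above can be sketched as follows. This is a minimal, self-contained model assuming Ceph's file-layout parameters (stripe unit, stripe count, object size, with the object size a multiple of the stripe unit); the names `Layout`, `Extent` and `map_offset` are hypothetical and not part of librados. It maps a logical byte offset (for a pager, something like `page_index * page_size` when the stripe unit equals the page size) to an object number and an offset inside that object.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical layout description, modeled on Ceph's file-layout fields.
struct Layout {
  uint64_t stripe_unit;   // bytes per block written to one object
  uint64_t stripe_count;  // number of objects in an object set
  uint64_t object_size;   // bytes per object; must be a multiple of stripe_unit
};

// Hypothetical result: which object, and where inside it.
struct Extent {
  uint64_t objectno;  // index of the backing object
  uint64_t offset;    // byte offset within that object
};

// Map a logical byte offset to (object number, offset within object).
Extent map_offset(const Layout &l, uint64_t off) {
  assert(l.stripe_unit > 0 && l.stripe_count > 0);
  assert(l.object_size >= l.stripe_unit && l.object_size % l.stripe_unit == 0);
  const uint64_t stripes_per_object = l.object_size / l.stripe_unit;
  const uint64_t blockno = off / l.stripe_unit;         // which stripe-unit block
  const uint64_t stripeno = blockno / l.stripe_count;   // which horizontal stripe
  const uint64_t stripepos = blockno % l.stripe_count;  // which object within the set
  const uint64_t objectsetno = stripeno / stripes_per_object;
  const uint64_t objectno = objectsetno * l.stripe_count + stripepos;
  // Start of this stripe's block inside the object, plus the offset within the block.
  const uint64_t block_start = (stripeno % stripes_per_object) * l.stripe_unit;
  return Extent{objectno, block_start + off % l.stripe_unit};
}
```

With 4 KiB stripe units, 4 objects per set and 16 KiB objects, offset 16384 lands at offset 4096 of object 0, since the first 16 KiB fill block 0 of objects 0 through 3. Splitting a multi-page request that crosses object boundaries into per-object extents is not shown; each such extent would become its own read against the corresponding object.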

Hmm, looking at the code, I'm surprised that this isn't working.  The C 
aio_read call is just doing

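  bufferlist bl;
  // wrap the caller's buffer in a bufferptr without copying it
  bufferptr bp = buffer::create_static(len, buf);
  bl.push_back(bp);

  ret = ctx->read(oid, bl, len, off);
  if (ret >= 0) {
    if (bl.length() > len)
      return -ERANGE;
    // if the data did not land in buf itself, fall back to a copy
    if (bl.c_str() != buf)
      bl.copy(0, bl.length(), buf);
    ...
  }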
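The effect of that final check can be modeled outside Ceph. In the sketch below (all names are hypothetical, and plain heap buffers stand in for `bufferlist`/`bufferptr`), `receive` either writes into the caller's buffer, as the rx-buffer path should, or allocates a fresh one; `read_into` mirrors the shape of the wrapper quoted above, returning -ERANGE for an oversized reply and copying only when the data pointer differs from `buf`.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <cstring>
#include <memory>
#include <utility>

// Hypothetical stand-in for a single contiguous buffer segment: it either
// points at caller-owned memory (owned is empty) or owns a heap allocation.
struct Segment {
  char *data;
  std::size_t len;
  std::unique_ptr<char[]> owned;
};

// Hypothetical receive path: if the posted buffer is honored and large
// enough, the payload is written straight into it; otherwise a fresh
// buffer is allocated and the posted one is ignored.
Segment receive(Segment posted, const char *payload, std::size_t n, bool rx_buffer_used) {
  assert(payload != nullptr || n == 0);
  if (rx_buffer_used && n <= posted.len) {
    std::memcpy(posted.data, payload, n);  // stands in for the socket read
    posted.len = n;
    return posted;
  }
  Segment fresh{nullptr, n, std::unique_ptr<char[]>(new char[n])};
  fresh.data = fresh.owned.get();
  std::memcpy(fresh.data, payload, n);
  return fresh;
}

// Mirrors the C wrapper: returns bytes read or -ERANGE; `copied` reports
// whether the fallback copy into buf had to run.
long read_into(char *buf, std::size_t len, const char *payload, std::size_t n,
               bool rx_hit, bool &copied) {
  Segment got = receive(Segment{buf, len, nullptr}, payload, n, rx_hit);
  copied = false;
  if (got.len > len)
    return -ERANGE;
  if (got.data != buf) {
    std::memcpy(buf, got.data, got.len);
    copied = true;
  }
  return static_cast<long>(got.len);
}
```

If the rx-buffer path never takes effect, every call ends up in the `copied == true` branch: the result is still correct, but each read pays for an extra copy.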


My guess is that the rx_buffers machinery is broken and we are hitting that 
bl.copy() every time.  In principle, this is what is supposed to happen:

- the outbl is passed into Objecter and associated with the request.

- in Objecter::send_op(), we do

  if (op->outbl && op->outbl->length()) {
    ldout(cct, 20) << " posting rx buffer for " << op->tid << " on " << op->session->con << dendl;
    op->con = op->session->con;
    op->con->post_rx_buffer(op->tid, *op->outbl);
  }

- in msg/Pipe.cc, when we are reading a message, we look up that bufferlist 
and read into it directly instead of allocating a new one.

      connection_state->lock.Lock();
      map<tid_t,pair<bufferlist,int> >::iterator p = connection_state->rx_buffers.find(header.tid);
      if (p != connection_state->rx_buffers.end()) {
	if (rxbuf.length() == 0 || p->second.second != rxbuf_version) {
	  ldout(msgr->cct,10) << "reader seleting rx buffer v " << p->second.second
		   << " at offset " << offset
		   << " len " << p->second.first.length() << dendl;
...
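The posting and lookup handshake in the steps above can be modeled with a per-connection map keyed by transaction id. This is a simplified sketch, not the Pipe.cc implementation: `Connection`, `RxBuffer` and `deliver` are hypothetical, `memcpy` stands in for reading the payload off the socket, and the version counter (mirroring the `int` in `rx_buffers`) is recorded but not otherwise used. A reply whose tid has no posted buffer, or whose payload would not fit, falls back to a fresh allocation, which is what later forces the copy in the C wrapper.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <map>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical caller-owned receive buffer (pointer + capacity).
struct RxBuffer {
  char *data;
  std::size_t cap;
};

// Hypothetical, simplified connection holding posted rx buffers by tid.
class Connection {
 public:
  // Objecter side: register where the reply for `tid` should land.
  void post_rx_buffer(uint64_t tid, RxBuffer b) {
    assert(b.data != nullptr);
    std::lock_guard<std::mutex> l(lock_);
    rx_buffers_[tid] = std::make_pair(b, ++version_);  // version bumped on every post
  }

  // Objecter side: forget a posted buffer (e.g. when the op completes).
  void revoke_rx_buffer(uint64_t tid) {
    std::lock_guard<std::mutex> l(lock_);
    rx_buffers_.erase(tid);
  }

  // Reader side: write the payload for `tid` into the posted buffer if one
  // exists and is large enough, otherwise into `fallback` (a fresh buffer).
  // Returns true when the posted buffer was used.
  bool deliver(uint64_t tid, const char *payload, std::size_t n,
               std::vector<char> &fallback) {
    std::lock_guard<std::mutex> l(lock_);
    auto p = rx_buffers_.find(tid);
    if (p != rx_buffers_.end() && n <= p->second.first.cap) {
      std::memcpy(p->second.first.data, payload, n);
      return true;
    }
    fallback.assign(payload, payload + n);
    return false;
  }

 private:
  std::mutex lock_;
  std::map<uint64_t, std::pair<RxBuffer, int>> rx_buffers_;
  int version_ = 0;
};
```

In this model, a debug line at the lookup point would show immediately whether a given tid ever hits the posted buffer, which is the same question the debug settings below are meant to answer.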

As a first step, I would set 'debug objecter = 20' and 'debug ms = 20' and 
check whether those debug messages show up for a single read request.

sage
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html