Re: read/write on RADOS using external buffer

Sage Weil <sage@xxxxxxxxxxxx> · Wed, 20 Oct 2010 08:34:25 -0700 (PDT)

On Tue, 19 Oct 2010, Colin McCabe wrote:
> On Tue, Oct 19, 2010 at 9:26 PM, Sage Weil <sage@xxxxxxxxxxxx> wrote:
> >  - SimpleMessenger::read_message() just needs to call
> > ms_deliver_get_data_buffer() if data_len > 0.  If it returns true, use the
> > provided bufferptr.
> >
> >  - The existing Objecter::read() method returns the tid, which librados
> > can then store in a private map of tid -> bufferptrs.  When it's
> > ms_get_data_buffer() is called it can provide the bufferptr; no need to
> > modify the Objecter at all!  Just make sure to verify that the entry
> > exists in the map before using the operator[] (use find()).
> 
> If we want thread-local data, we should at least use
> pthread_setspecific / pthread_getspecific. No need for tids and STL
> maps. gettid() is a syscall and it will be slow.

There's no pthread specific data or gettid() syscalls here.  The 'tid' is 
a transaction id generated by the Objecter that matchers up replies to 
requests.

> > The main corner case is this: if there is some osd cluster change, it's
> > theoretically possible we could have two threads reading read replies to
> > the same request off the wire at the same time.  Even if we carefully
> > define which reply "wins", we need to make sure there isn't a stray thread
> > read/accessing the buffer after the read() call returns.
> 
> I just don't see why we have to care about threads at all.

The corner case is when we are reading two replies for the same request 
off of two different sockets from two different OSDs.  If they are 
both reading directly into the user supplied buffer, we need to make sure 
they both stop before returning success to the user; after that point 
we're not longer allows to touch that memory.  That will probably require 
some locking and non-blocking IO, which can be tricky since the pthread 
stuff doesn't usually interact well with select/poll.

> I think I can propose a patch which solves this problem without rados
> client-specific data, threads, tids, maps, thread-local data, or
> changes to SimpleMessenger.

SimpleMessenger will change no matter what, because the core problem that 
we're trying to solve is that it allocates a new buffer when a message is 
being read off the wire.  See SimpleMessenger::read_message().  Somehow 
that needs to discover and use the user provided buffer instead.  That 
will invariably involve a callback of some sort to find the user buffer.  
And some means of yanking the buffer away to deal with the corner issue 
above.

> It will just involve modifying bufferlist
> so that it can use a buffer that's passed to it, rather than
> allocating its own.  Then we need something like a freeze operation
> that will stop it from trying to reallocate or resize the buffer. This
> solution will have much more generality and probably will be useful
> elsewhere too.

That sounds useful, but not in this particular case... :/

sage