Hi :-)

On Tue, Feb 23, 2016 at 11:45:41AM +0200, Tanu Kaskinen wrote:
> On Tue, 2016-02-23 at 11:19 +0200, Tanu Kaskinen wrote:
> > On Tue, 2016-02-23 at 07:55 +0200, Ahmed S. Danish wrote:
> > > Hello everyone,
> > >
> > > As you probably know by now, the second patch in the memfd series
> > > was created to transform the global, shared by all clients,
> > > srbchannel mempool to a "per-client" one. [1]
> > >
> > > The reason for this transition is to avoid data leaks between
> > > clients. In a follow-up patch series, it is even hoped that the
> > > global core mempool will also be transformed to the per-client
> > > model. [2]
> > >
> > >
> > > ==> Current problem:
> > > --------------------
> > >
> > > By injecting faults at the system-call level, it was discovered
> > > that the current design of per-client mempools leads to a
> > > reproducible memory fault on certain error-recovery paths.
> > >
> > > The fault is caused by a design issue, rather than a small
> > > coding error.
> > >
> > >
> > > ==> Why are you dedicating a full mail to this?
> > > -----------------------------------------------
> > >
> > > The issue discovered implies that a bigger design change is
> > > required; your input is very much needed :-)
> > >
> > >
> > > ==> What was the current design?
> > > --------------------------------
> > >
> > > To do the per-client mempool transformation, a simple decision
> > > was taken: let 'pa_native_connection' own the per-client mempool
> > > in question.
> > >
> > > Why? Because when a client connects, a 'native connection'
> > > object is created for it by default. Inside this object are the
> > > pstream pipe and other per-client resources. So this seemed like
> > > the best place for a per-client mempool:
> > >
> > >  /* protocol-native.c - Called for each client connection */
> > >  void pa_native_protocol_connect(...) {
> > >       pa_native_connection *c;
> > >
> > >       ...
> > >       c = pa_msgobject_new(pa_native_connection);
> > >       c->pstream = pa_pstream_new(...);
> > >       c->mempool = pa_mempool_new(...);    // per-client mempool
> > >       ...
> > >  }
> > >
> > > And 'naturally', the pool should be deallocated when the
> > > connection closes:
> > >
> > >  /* protocol-native.c */
> > >  native_connection_free(pa_object *o) {
> > >       ...
> > >       pa_pdispatch_unref(c->pdispatch);
> > >       pa_pstream_unref(c->pstream);
> > >
> > >       // Srbchannels allocate from this per-client pool,
> > >       // so free it only after the pstream is freed
> > >       pa_mempool_free(c->mempool);
> > >       ...
> > >  }
> > >
> > > All looked fine and dandy ..
> > >
> > >
> > > ==> And where is the problem exactly?
> > > -------------------------------------
> > >
> > > By injecting failures in sendmsg(2), thus reaching up to the
> > > iochannel and pstream layers, the following leak is shown:
> > >
> > >  E: Memory pool destroyed but not all mem blocks freed! 1 remain
> > >
> > > and a segfault is then produced due to a use-after-free operation
> > > on the leaked memblock.
> > >
> > > The sequence of failure, bottom-up, is as follows:
> > >
> > >  -> sendmsg(2) fails with -EACCES
> > >     -> pa_iochannel_write_with_creds() returns failure
> > >       -> pstream do_write() fails
> > >         -> do_pstream_read_write() fails, jumps to error recovery
> > >
> > > And the error recovery goes as follows:
> > >
> > >  -> pstream die callback (p->callback) is called
> > >  -> That's protocol-native.c pstream_die_callback()
> > >     -> native_connection_unlink    // Stops srbchannel, etc.
> > >       -> pa_pstream_unlink         // Resets all pstream callbacks
> > >       -> native_connection_free    // Per-client mempool _freed!_
> > >  -> pa_pstream_unlink             // Second time, does nothing..
> > >  -> pa_pstream_free
> > >     -> pa_queue_free(p->send_queue, item_free)
> > >       -> 1. PACKET item found .. safely removed
> > >       -> 2. MEMBLOCK item found .. from freed srbchannel mempool
> > >             BOOM!!! segfault
> > >
> > > SUMMARY: As seen above, a per-client mempool's lifetime must be
> > > a superset of the pstream's lifetime. Putting the per-client
> > > pool in the native_connection object provided only an _illusion_
> > > of such a superset. [*]
> > >
> > > During error recovery, stale memblocks remain in the pstream's
> > > send queue long after __all the per-client objects__ have been
> > > removed.
> > >
> > > [*] Check native_connection_free() under "What was the current
> > >     design?" for why this illusion was maintained.
> > >
> > >
> > > ==> Suggested solutions?
> > > ------------------------
> > >
> > > The situation is a bit tricky.
> > >
> > > Where can we put the per-client pool while ensuring a correct
> > > lifecycle -- especially during error recovery?
> > >
> > > The safest option, it seems, is to let the per-client pool be
> > > owned by the pstream object, so the pool can be freed at the
> > > very end of pstream_free() itself. Is such a solution acceptable?
> >
> > My first reaction is: why is the pstream object reference counted?

It seems this was just a convention rather than a conscious design
decision. This is evidenced by there being only __two__
pa_pstream_ref() calls in the entire tree: at pstream.c
do_pstream_read_write(), and in the same file at srb_callback(). In
both places they're just a ref/unref pair done in a local context.

Then there's the original reference taken implicitly when a client
connects:

  void pa_native_protocol_connect(pa_native_protocol *p, ..) {
      pa_native_connection *c;
      ..
      c = pa_msgobject_new(pa_native_connection);
      c->pstream = pa_pstream_new(..);
      ..
  }

And the unref when a client disconnects or the connection fails:

  static void native_connection_free(pa_object *o) {
      pa_native_connection *c = PA_NATIVE_CONNECTION(o);
      ..
      pa_pstream_unref(c->pstream);
      ..
      pa_xfree(c);
  }

So from the above, it seems there's no obvious reason for making
pstreams reference counted .. just a convention.

> > Why can't pa_native_connection be fully in charge of when to free
> > the pstream object? Freeing pstream in native_connection_free
> > should fix the crash.

Indeed. It seems that if we make the pstream owned by
native_connection, then we have a guarantee that the pstream, and
all memblocks it references in its send_queue, are freed before the
rest of the per-client objects - including the per-client mempool..

> > I don't have big complaints about making pstream the owner of the
> > mempool either, though.

Yes, although now I prefer your above proposal more ;-)
native_connection owns all per-client objects now; if we add the
pstream (a per-client object too) to the mix, the layering is
maintained.. A pstream owning a mempool feels like a layering
violation: pstreams are solely responsible for the lower-level pipe
stuff.

> > However, preventing this particular crash may not be sufficient.

It seems so :-(

> > Is it somehow guaranteed that the client mempool blocks are
> > referenced only by client-specific objects? When a client sends
> > memblocks to the server, where do those memblocks end up? They go
> > to the sink input's render_memblockq, from which I think they
> > usually get copied to the sink buffer, not directly referenced by
> > the sink. Hmm...

When a client sends playback audio memblocks to the server, it sends
these blocks from its own mempool - a mempool which is created very
early on, before even connecting to the server.

When the server receives these SHM block references, it attaches
them (without any data copy, as PA_MEMBLOCK_IMPORTED) to the core
global mempool. The core global mempool is also created very early
on - a few steps after main() and parsing the daemon options.

So AFAIK, right now, this scenario does not involve the per-client
mempools. But if we transform the global mempool to a per-client
model in a follow-up patch series, that scenario can indeed be very,
very material.

> > However, it seems that source outputs that monitor that specific
> > sink input do get a reference to the memblock, and if the sink
> > input client dies, memblocks may remain in the monitoring source
> > output. Or maybe not... I now noticed that pa_sink_input_unlink()
> > kills all connected direct monitoring source outputs.

Yup. As stated in your follow-up email, "at least module combine
sink puts the rendered memblock in a buffer to wait for further
processing. So, the client mempool can't always be instantly freed"
.. So, indeed, when transforming the global mempool to a per-client
model in the future, that would be a serious memory corruption..

> > So, maybe with the current code base it is sufficient to only fix
> > the crash you found, but the design doesn't give very strong
> > guarantees that the per-client memblocks don't wander off to some
> > buffer somewhere that isn't controlled by the client.

Yes, it seems so.

> > Would it be a good idea (in some later patch set) to make
> > mempools reference counted so that they are freed only once there
> > are no references to any blocks in the mempool?

Until yesterday, that was the plan: let the per-client mempool work
be in its own follow-up patch series. But then it was discovered
that if the srbchannel mempool is left global as is -- created very
early, before any client connects -- then we can't negotiate its SHM
type (posix or memfd). This would make the whole topic of the memfd
patch series quite moot..

So now, making the mempools reference counted to correctly support
per-client ownership and SHM type negotiation, at least for the
srbchannel pool, is a necessity before submitting memfd v3..

Thanks!

-- 
Darwish
http://darwish.chasingpointers.com