On 2016-03-01 11:25, Ahmed S. Darwish wrote:
> Hi :-)
>
> On Tue, Feb 23, 2016 at 11:45:41AM +0200, Tanu Kaskinen wrote:
>> On Tue, 2016-02-23 at 11:19 +0200, Tanu Kaskinen wrote:
>>> On Tue, 2016-02-23 at 07:55 +0200, Ahmed S. Darwish wrote:
>>>> Hello everyone,
>>>>
>>>> As you probably know by now, the second patch in the memfd series
>>>> was created to transform the global srbchannel mempool, shared by
>>>> all clients, into a "per-client" one. [1]
>>>>
>>>> The reason for this transition is to avoid data leaks between
>>>> clients. In a follow-up patch series, it is even hoped that the
>>>> global core mempool will also be transformed to the per-client
>>>> model. [2]
>>>>
>>>>
>>>> ==> Current problem:
>>>> --------------------
>>>>
>>>> By injecting faults at the system-call level, it was discovered
>>>> that the current design of per-client mempools leads to a
>>>> reproducible memory fault on certain error-recovery paths.
>>>>
>>>> The fault is caused by a design issue, rather than a small coding
>>>> error.
>>>>
>>>>
>>>> ==> Why are you dedicating a full mail to this?
>>>> -----------------------------------------------
>>>>
>>>> The issue discovered implies that a bigger design change is
>>>> required; your input is very much needed :-)
>>>>
>>>>
>>>> ==> What was the current design?
>>>> --------------------------------
>>>>
>>>> To do the per-client mempool transformation, a simple decision
>>>> was taken: let 'pa_native_connection' own the per-client mempool
>>>> in question.
>>>>
>>>> Why? Because when a client connects, a 'native connection' object
>>>> is created for it by default. Inside this object are the pstream
>>>> pipe and other per-client resources. So this seemed like the best
>>>> place for a per-client mempool:
>>>>
>>>>     /* protocol-native.c - Called for each client connection */
>>>>     void pa_native_protocol_connect(...) {
>>>>         pa_native_connection *c;
>>>>
>>>>         ...
>>>>         c = pa_msgobject_new(pa_native_connection);
>>>>         c->pstream = pa_pstream_new(...);
>>>>         c->mempool = pa_mempool_new(...);   // per-client mempool
>>>>         ...
>>>>     }
>>>>
>>>> And 'naturally', the pool should be deallocated when the
>>>> connection closes:
>>>>
>>>>     /* protocol-native.c */
>>>>     native_connection_free(pa_object *o) {
>>>>         ...
>>>>         pa_pdispatch_unref(c->pdispatch);
>>>>         pa_pstream_unref(c->pstream);
>>>>
>>>>         // Srbchannels allocate from this per-client pool,
>>>>         // so free it only after the pstream is freed
>>>>         pa_mempool_free(c->mempool);
>>>>         ...
>>>>     }
>>>>
>>>> All looked fine and dandy ..
>>>>
>>>>
>>>> ==> And where is the problem exactly?
>>>> -------------------------------------
>>>>
>>>> By injecting failures in sendmsg(2), thus reaching up to the
>>>> iochannel and pstream layers, the following leak is reported:
>>>>
>>>>   E: Memory pool destroyed but not all mem blocks freed! 1 remain
>>>>
>>>> and a segfault is then produced due to a use-after-free operation
>>>> on the leaked memblock.
>>>>
>>>> The sequence of failure, bottom-up, is as follows:
>>>>
>>>>   -> sendmsg(2) fails with -EACCES
>>>>   -> pa_iochannel_write_with_creds() returns failure
>>>>   -> pstream do_write() fails
>>>>   -> do_pstream_read_write() fails, jumps to error recovery
>>>>
>>>> And the error recovery goes like this:
>>>>
>>>>   -> pstream die callback (p->callback) is called
>>>>   -> That's protocol-native.c pstream_die_callback()
>>>>   -> native_connection_unlink   // Stops srbchannel, etc.
>>>>   -> pa_pstream_unlink          // Reset all pstream callbacks
>>>>   -> native_connection_free     // Per-client mempool _freed!_
>>>>   -> pa_pstream_unlink          // Second time, does nothing..
>>>>   -> pa_pstream_free
>>>>   -> pa_queue_free(p->send_queue, item_free)
>>>>      -> 1. PACKET item found .. safely removed
>>>>      -> 2. MEMBLOCK item found .. from freed srbchannel mempool
>>>>            BOOM!!! segfault
>>>>
>>>> SUMMARY: As seen above, a per-client mempool's lifetime must be
>>>> a superset of the pstream's lifetime. Putting the per-client pool
>>>> in the native_connection object provided only an _illusion_ of
>>>> such a superset. [*]
>>>>
>>>> During error recovery, stale memblocks remain in the pstream's
>>>> send queue long after __all the per-client objects__ have been
>>>> removed.
>>>>
>>>> [*] Check native_connection_free() under "What was the current
>>>> design?" for why this illusion was maintained.
>>>>
>>>>
>>>> ==> Suggested solutions?
>>>> ------------------------
>>>>
>>>> The situation is a bit tricky.
>>>>
>>>> Where can we put the per-client pool while ensuring a correct
>>>> lifecycle, especially during error recovery?
>>>>
>>>> The safest option, it seems, is to let the per-client pool be
>>>> owned by the pstream object, so that the pool can be freed at the
>>>> very end of pstream_free() itself. Is such a solution acceptable?
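
Commenting inline here just to make sure I parse the proposal
correctly: I assume the change would look roughly like the sketch
below. This is only an illustration of the intended ordering; the
argument lists are abbreviated and the 'mempool' field on the pstream
side is a guessed name, not the actual patch.

    /* protocol-native.c - the pool is created as before, but handed
     * over to the pstream, which becomes its owner */
    c->mempool = pa_mempool_new(...);               // per-client mempool
    c->pstream = pa_pstream_new(..., c->mempool);   // pstream owns it now

    /* pstream.c - teardown; the pool outlives everything that can
     * still hold memblocks allocated from it */
    static void pstream_free(pa_pstream *p) {
        pa_pstream_unlink(p);

        pa_queue_free(p->send_queue, item_free);  // may drop MEMBLOCK
                                                  // items from the pool
        ...
        pa_mempool_free(p->mempool);              // freed last of all
        pa_xfree(p);
    }

The point would be that pa_mempool_free() runs strictly after the send
queue has been drained, which is exactly the ordering the error path
above ends up violating today.
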
>>>
>>> My first reaction is: why is the pstream object reference
>>> counted?
>>>
>
> Seems this was just a regular convention rather than a conscious
> design decision. This is evidenced by the fact that there are only
> __two__ pa_pstream_ref() calls in the entire tree: at pstream.c
> do_pstream_read_write(), and in the same file at srb_callback().
> In both places it's just a ref/unref pair done in a local context.

I recently added the one in srb_callback for a good reason:

commit f277f2c5094fb32c5d879923960eb807b3b1c535
Author: David Henningsson <david.henningsson at canonical.com>
Date:   Fri Oct 16 22:12:32 2015 +0200

    pstream: Fix use-after-free in srb_callback

...please make sure this bug does not reappear if you change things
around :-)

--
David Henningsson, Canonical Ltd.
https://launchpad.net/~diwic
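
PS: For anyone who doesn't have that commit in front of them, the
guard it adds has roughly this shape (a simplified sketch from memory,
not the verbatim pstream.c code):

    /* pstream.c */
    static bool srb_callback(pa_srbchannel *srb, void *userdata) {
        bool ret;
        pa_pstream *p = userdata;

        pa_assert(p);

        /* Hold a temporary reference: do_pstream_read_write() may end
         * up in the die callback, and without this extra ref the last
         * reference to p could be dropped while we're still using it. */
        pa_pstream_ref(p);

        ret = do_pstream_read_write(p);

        pa_pstream_unref(p);
        return ret;
    }

If the per-client pool moves into the pstream, please keep the same
reasoning in mind for the pool itself: nothing reachable from a
still-referenced pstream may point into an already freed pool.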