Hello everyone,

As you probably know by now, the second patch in the memfd series
transforms the srbchannel mempool, currently global and shared by all
clients, into a "per-client" one. [1] The reason for this transition is
to avoid data leaks between clients. In a follow-up patch series, it is
hoped that the global core mempool will also be moved to the per-client
model. [2]

==> Current problem:
--------------------

By injecting faults at the system-call level, it was discovered that
the current design of per-client mempools leads to a reproducible
memory fault on certain error-recovery paths. The fault is caused by a
design issue rather than a small coding error.

==> Why are you dedicating a full mail to this?
-----------------------------------------------

The issue discovered implies that a bigger design change is required;
your input is very much needed :-)

==> What was the current design?
--------------------------------

To do the per-client mempool transformation, a simple decision was
taken: let 'pa_native_connection' own the per-client mempool in
question. Why? Because when a client connects, a 'native connection'
object is created for it by default. Inside this object live the
pstream pipe and other per-client resources. So this seemed like the
best place for a per-client mempool:

    /* protocol-native.c - Called for each client connection */
    void pa_native_protocol_connect(...) {
        pa_native_connection *c;
        ...
        c = pa_msgobject_new(pa_native_connection);
        c->pstream = pa_pstream_new(...);
        c->mempool = pa_mempool_new(...);    // per-client mempool
        ...
    }

And 'naturally', the pool should be deallocated when the connection
closes:

    /* protocol-native.c */
    static void native_connection_free(pa_object *o) {
        ...
        pa_pdispatch_unref(c->pdispatch);
        pa_pstream_unref(c->pstream);

        // Srbchannels allocate from this per-client pool,
        // so free it only after the pstream is freed
        pa_mempool_free(c->mempool);
        ...
    }

All looked fine and dandy ..

==> And where is the problem exactly?
-------------------------------------

By injecting failures into sendmsg(2), and thus exercising the error
paths of the iochannel and pstream layers above it, the following leak
is reported:

    E: Memory pool destroyed but not all mem blocks freed! 1 remain

and a segfault is then produced by a use-after-free operation on the
leaked memblock. The sequence of failure, bottom-up, is as follows:

-> sendmsg(2) fails with -EACCES
-> pa_iochannel_write_with_creds() returns failure
-> pstream do_write() fails
-> do_pstream_read_write() fails, jumps to error recovery

And the error recovery goes like this:

-> pstream die callback (p->callback) is called
-> That's protocol-native.c pstream_die_callback()
   -> native_connection_unlink      // Stops srbchannel, etc.
      -> pa_pstream_unlink          // Resets all pstream callbacks
   -> native_connection_free        // Per-client mempool _freed!_
-> pa_pstream_unlink                // Second time, does nothing..
-> pa_pstream_free
   -> pa_queue_free(p->send_queue, item_free)
      -> 1. PACKET item found .. safely removed
      -> 2. MEMBLOCK item found .. from the freed srbchannel mempool
            BOOM!!! segfault

SUMMARY: As seen above, a per-client mempool's lifetime must be a
superset of the pstream's lifetime. Putting the per-client pool in the
native_connection object provided only an _illusion_ of such a
superset. [*] During error recovery, stale memblocks remain in the
pstream's send queue long after __all the per-client objects__ have
been removed.

[*] Check native_connection_free() under "What was the current design?"
    for why this illusion was maintained.
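To make the ordering hazard concrete, here is a tiny stand-alone sketch
of the same pattern. It deliberately uses made-up mini types (pool,
memblock, queue_item) rather than the real pulsecore API; it only
distills the order of operations above: the pool backing a still-queued
memblock is freed before the queue is drained.

    /* Stand-alone sketch of the ordering bug (hypothetical mini types,
     * NOT the pulsecore API). Build with: cc -o uaf uaf.c */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    struct pool       { char *slab; };                     /* ~ pa_mempool  */
    struct memblock   { struct pool *owner; char *data; }; /* ~ pa_memblock */
    struct queue_item { struct memblock *block; };         /* ~ send_queue item */

    static struct pool *pool_new(void) {
        struct pool *p = malloc(sizeof(*p));
        p->slab = malloc(64);
        return p;
    }

    static void pool_free(struct pool *p) {                /* ~ pa_mempool_free */
        free(p->slab);
        free(p);
    }

    static struct memblock *memblock_new(struct pool *p, const char *msg) {
        struct memblock *b = malloc(sizeof(*b));
        b->owner = p;
        b->data  = p->slab;             /* block data lives inside the pool */
        strcpy(b->data, msg);
        return b;
    }

    int main(void) {
        /* per-client pool, owned by the "connection" */
        struct pool *client_pool = pool_new();

        /* a write sits in the send queue; sendmsg() failed, never flushed */
        struct queue_item pending = { memblock_new(client_pool, "queued") };

        /* error recovery, step 1: the connection frees its pool first ... */
        pool_free(client_pool);

        /* ... step 2: the queue is drained afterwards and touches block
         * data that now points into freed pool memory */
        printf("%s\n", pending.block->data);    /* use-after-free */

        free(pending.block);
        return 0;
    }

Under valgrind the final printf() should show up as an invalid read;
whether it crashes outright depends on the allocator, just as the BOOM
above depends on what has happened to the pool's memory in the meantime.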
==> Suggested solutions?
------------------------

The situation is a bit tricky. Where can we put the per-client pool
while ensuring a correct lifecycle, especially during error recovery?

The safest option, it seems, is to let the per-client pool be owned by
the pstream object, so that the pool can be freed at the very end of
pstream_free() itself. (A rough sketch of what I mean is appended after
my signature.)

Is such a solution acceptable?

Thanks!

[1] http://article.gmane.org/gmane.comp.audio.pulseaudio.general/25133
[2] http://article.gmane.org/gmane.comp.audio.pulseaudio.general/25163

--
Darwish
http://darwish.chasingpointers.com
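P.S. To make the suggestion a bit more concrete, here is a rough sketch
(not a patch) of the ownership change, in the same abbreviated style as
the snippets above. The struct layout and signatures are written from
memory and may not match current pstream.c exactly, so please treat
them as assumptions:

    /* pstream.c -- sketch only, fields and signatures abbreviated */
    struct pa_pstream {
        ...
        pa_queue *send_queue;
        pa_mempool *mempool;    // per-client pool, now owned by the pstream
    };

    pa_pstream *pa_pstream_new(..., pa_mempool *pool) {
        pa_pstream *p;
        ...
        p->mempool = pool;      // take ownership from protocol-native.c
        ...
        return p;
    }

    static void pstream_free(pa_pstream *p) {
        pa_pstream_unlink(p);

        // Drain the queues first: MEMBLOCK items may still
        // reference memory from the per-client pool
        pa_queue_free(p->send_queue, item_free);
        ...

        // Only now is it safe to drop the pool
        pa_mempool_free(p->mempool);
        pa_xfree(p);
    }

native_connection_free() would then drop its own pa_mempool_free() call
and simply unref the pstream; the pool goes away together with the last
pstream reference, no matter which error path led there.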