On Tue, 2014-09-16 at 16:41 +0200, David Henningsson wrote: > > On 2014-09-16 13:22, Tanu Kaskinen wrote: > > On Tue, 2014-09-16 at 08:34 +0200, David Henningsson wrote: > >> > >> On 2014-09-14 21:06, Tanu Kaskinen wrote: > >>> I was able to trigger a crash with some unfinished code that caused > >>> a protocol error. The error was handled while > >>> pa_pstream_set_srbchannel() had not yet finished, so > >>> pa_native_connection.srbpending and pa_pstream.srbpending were both > >>> non-NULL. During the error handling, native_connection_unlink() freed > >>> the srbchannel, and then pa_pstream_unlink() tried to free it for the > >>> second time, causing an assertion error in mainloop_io_free(). > >> > >> Ok, thanks for finding! > >> > >>> --- > >>> src/pulsecore/protocol-native.c | 14 +++++++++++++- > >>> 1 file changed, 13 insertions(+), 1 deletion(-) > >>> > >>> diff --git a/src/pulsecore/protocol-native.c b/src/pulsecore/protocol-native.c > >>> index 6ec65d6..012c41a 100644 > >>> --- a/src/pulsecore/protocol-native.c > >>> +++ b/src/pulsecore/protocol-native.c > >>> @@ -2639,13 +2639,25 @@ static void setup_srbchannel(pa_native_connection *c) { > >>> > >>> static void command_enable_srbchannel(pa_pdispatch *pd, uint32_t command, uint32_t tag, pa_tagstruct *t, void *userdata) { > >>> pa_native_connection *c = PA_NATIVE_CONNECTION(userdata); > >>> + pa_srbchannel *tmp; > >>> > >>> if (tag != (uint32_t) (size_t) c->srbpending) > >>> protocol_error(c); > >>> > >>> pa_log_debug("Client enabled srbchannel."); > >>> - pa_pstream_set_srbchannel(c->pstream, c->srbpending); > >>> + > >>> + /* We must set c->sbpending to NULL before calling > >>> + * pa_pstream_set_srbchannel(), because otherwise there is a time window > >>> + * when both pa_native_connection and pa_pstream think they own the > >>> + * srbchannel. pa_pstream_set_srbchannel() may read and dispatch arbitrary > >>> + * commands from the client, and if a protocol error occurs, the connection > >>> + * is torn down, and if c->sbpending has not yet been set to NULL here, the > >>> + * srbchannel will be double-freed. XXX: Can we somehow avoid reading and > >>> + * dispatching commands in pa_pstream_set_srbchannel()? Processing commands > >>> + * recursively seems prone to bugs. */ > >>> + tmp = c->srbpending; > >>> c->srbpending = NULL; > >>> + pa_pstream_set_srbchannel(c->pstream, c->srbpending); > >> > >> You probably mean "tmp" here instead of "c->srbpending"... > > > > *facepalm* > > > >> Anyhow, I think it would be better to try to resolve your concern by > >> implementing a deferred event. Because, as you say, processing commands > >> recursive seems prone to bugs. I e, pa_srbchannel_set_callback would not > >> call srbchannel_rwloop, but instead set up a deferred event which would > >> just call srbchannel_rwloop. > >> > >> Does that make sense? > > > > Hard to say, because it's not clear to me why srbchannel_rwloop() is > > called in the first place. If it's moved to a defer event, there will be > > "stuff" happening between setting the callback and dispatching the defer > > event. It's not clear to me what that "stuff" can include. Can it > > include something that should not be done before srbchannel_rwloop() has > > been called? > > srbchannel_rwloop is called to handle this potential scenario: > > 1) peer A sends srbchannel switch command to peer B > 2) peer A writes something to the srbchannel > 3) peer B receives srbchannel command and sets up the srbchannel > 4) If srbchannel_rwloop is not called by peer B at this point, the > message written in 2) will not be picked up by peer B in a timely fashion. Why is it not picked up in a timely fashion? You have an fd that will normally signal POLLIN when there is data to be read. Is there some reason that doesn't work if there is data to be read already at initialization time? -- Tanu