On 2014-09-18 11:40, Tanu Kaskinen wrote: > On Thu, 2014-09-18 at 09:21 +0200, David Henningsson wrote: >> >> On 2014-09-18 09:08, Tanu Kaskinen wrote: >>> On Tue, 2014-09-16 at 16:41 +0200, David Henningsson wrote: >>>> >>>> On 2014-09-16 13:22, Tanu Kaskinen wrote: >>>>> On Tue, 2014-09-16 at 08:34 +0200, David Henningsson wrote: >>>>>> >>>>>> On 2014-09-14 21:06, Tanu Kaskinen wrote: >>>>>>> I was able to trigger a crash with some unfinished code that caused >>>>>>> a protocol error. The error was handled while >>>>>>> pa_pstream_set_srbchannel() had not yet finished, so >>>>>>> pa_native_connection.srbpending and pa_pstream.srbpending were both >>>>>>> non-NULL. During the error handling, native_connection_unlink() freed >>>>>>> the srbchannel, and then pa_pstream_unlink() tried to free it for the >>>>>>> second time, causing an assertion error in mainloop_io_free(). >>>>>> >>>>>> Ok, thanks for finding! >>>>>> >>>>>>> --- >>>>>>> src/pulsecore/protocol-native.c | 14 +++++++++++++- >>>>>>> 1 file changed, 13 insertions(+), 1 deletion(-) >>>>>>> >>>>>>> diff --git a/src/pulsecore/protocol-native.c b/src/pulsecore/protocol-native.c >>>>>>> index 6ec65d6..012c41a 100644 >>>>>>> --- a/src/pulsecore/protocol-native.c >>>>>>> +++ b/src/pulsecore/protocol-native.c >>>>>>> @@ -2639,13 +2639,25 @@ static void setup_srbchannel(pa_native_connection *c) { >>>>>>> >>>>>>> static void command_enable_srbchannel(pa_pdispatch *pd, uint32_t command, uint32_t tag, pa_tagstruct *t, void *userdata) { >>>>>>> pa_native_connection *c = PA_NATIVE_CONNECTION(userdata); >>>>>>> + pa_srbchannel *tmp; >>>>>>> >>>>>>> if (tag != (uint32_t) (size_t) c->srbpending) >>>>>>> protocol_error(c); >>>>>>> >>>>>>> pa_log_debug("Client enabled srbchannel."); >>>>>>> - pa_pstream_set_srbchannel(c->pstream, c->srbpending); >>>>>>> + >>>>>>> + /* We must set c->sbpending to NULL before calling >>>>>>> + * pa_pstream_set_srbchannel(), because otherwise there is a time window >>>>>>> + * when both pa_native_connection and pa_pstream think they own the >>>>>>> + * srbchannel. pa_pstream_set_srbchannel() may read and dispatch arbitrary >>>>>>> + * commands from the client, and if a protocol error occurs, the connection >>>>>>> + * is torn down, and if c->sbpending has not yet been set to NULL here, the >>>>>>> + * srbchannel will be double-freed. XXX: Can we somehow avoid reading and >>>>>>> + * dispatching commands in pa_pstream_set_srbchannel()? Processing commands >>>>>>> + * recursively seems prone to bugs. */ >>>>>>> + tmp = c->srbpending; >>>>>>> c->srbpending = NULL; >>>>>>> + pa_pstream_set_srbchannel(c->pstream, c->srbpending); >>>>>> >>>>>> You probably mean "tmp" here instead of "c->srbpending"... >>>>> >>>>> *facepalm* >>>>> >>>>>> Anyhow, I think it would be better to try to resolve your concern by >>>>>> implementing a deferred event. Because, as you say, processing commands >>>>>> recursive seems prone to bugs. I e, pa_srbchannel_set_callback would not >>>>>> call srbchannel_rwloop, but instead set up a deferred event which would >>>>>> just call srbchannel_rwloop. >>>>>> >>>>>> Does that make sense? >>>>> >>>>> Hard to say, because it's not clear to me why srbchannel_rwloop() is >>>>> called in the first place. If it's moved to a defer event, there will be >>>>> "stuff" happening between setting the callback and dispatching the defer >>>>> event. It's not clear to me what that "stuff" can include. Can it >>>>> include something that should not be done before srbchannel_rwloop() has >>>>> been called? >>>> >>>> srbchannel_rwloop is called to handle this potential scenario: >>>> >>>> 1) peer A sends srbchannel switch command to peer B >>>> 2) peer A writes something to the srbchannel >>>> 3) peer B receives srbchannel command and sets up the srbchannel >>>> 4) If srbchannel_rwloop is not called by peer B at this point, the >>>> message written in 2) will not be picked up by peer B in a timely fashion. >>> >>> Why is it not picked up in a timely fashion? You have an fd that will >>> normally signal POLLIN when there is data to be read. Is there some >>> reason that doesn't work if there is data to be read already at >>> initialization time? >> >> As an optimisation, the pa_fdsem layer makes sure that nothing is >> written to the fd, if nothing is waiting on the other side. And the io >> event waits for things to be written to the fd. > > Ok. I think the patch is good. It would be nice to have a comment > explaining what you explained above, though. Thanks, pushed now (after adding a comment). -- David Henningsson, Canonical Ltd. https://launchpad.net/~diwic