On Thu, 2014-09-18 at 09:21 +0200, David Henningsson wrote: > > On 2014-09-18 09:08, Tanu Kaskinen wrote: > > On Tue, 2014-09-16 at 16:41 +0200, David Henningsson wrote: > >> > >> On 2014-09-16 13:22, Tanu Kaskinen wrote: > >>> On Tue, 2014-09-16 at 08:34 +0200, David Henningsson wrote: > >>>> > >>>> On 2014-09-14 21:06, Tanu Kaskinen wrote: > >>>>> I was able to trigger a crash with some unfinished code that caused > >>>>> a protocol error. The error was handled while > >>>>> pa_pstream_set_srbchannel() had not yet finished, so > >>>>> pa_native_connection.srbpending and pa_pstream.srbpending were both > >>>>> non-NULL. During the error handling, native_connection_unlink() freed > >>>>> the srbchannel, and then pa_pstream_unlink() tried to free it for the > >>>>> second time, causing an assertion error in mainloop_io_free(). > >>>> > >>>> Ok, thanks for finding! > >>>> > >>>>> --- > >>>>> src/pulsecore/protocol-native.c | 14 +++++++++++++- > >>>>> 1 file changed, 13 insertions(+), 1 deletion(-) > >>>>> > >>>>> diff --git a/src/pulsecore/protocol-native.c b/src/pulsecore/protocol-native.c > >>>>> index 6ec65d6..012c41a 100644 > >>>>> --- a/src/pulsecore/protocol-native.c > >>>>> +++ b/src/pulsecore/protocol-native.c > >>>>> @@ -2639,13 +2639,25 @@ static void setup_srbchannel(pa_native_connection *c) { > >>>>> > >>>>> static void command_enable_srbchannel(pa_pdispatch *pd, uint32_t command, uint32_t tag, pa_tagstruct *t, void *userdata) { > >>>>> pa_native_connection *c = PA_NATIVE_CONNECTION(userdata); > >>>>> + pa_srbchannel *tmp; > >>>>> > >>>>> if (tag != (uint32_t) (size_t) c->srbpending) > >>>>> protocol_error(c); > >>>>> > >>>>> pa_log_debug("Client enabled srbchannel."); > >>>>> - pa_pstream_set_srbchannel(c->pstream, c->srbpending); > >>>>> + > >>>>> + /* We must set c->sbpending to NULL before calling > >>>>> + * pa_pstream_set_srbchannel(), because otherwise there is a time window > >>>>> + * when both pa_native_connection and pa_pstream think they own the > >>>>> + * srbchannel. pa_pstream_set_srbchannel() may read and dispatch arbitrary > >>>>> + * commands from the client, and if a protocol error occurs, the connection > >>>>> + * is torn down, and if c->sbpending has not yet been set to NULL here, the > >>>>> + * srbchannel will be double-freed. XXX: Can we somehow avoid reading and > >>>>> + * dispatching commands in pa_pstream_set_srbchannel()? Processing commands > >>>>> + * recursively seems prone to bugs. */ > >>>>> + tmp = c->srbpending; > >>>>> c->srbpending = NULL; > >>>>> + pa_pstream_set_srbchannel(c->pstream, c->srbpending); > >>>> > >>>> You probably mean "tmp" here instead of "c->srbpending"... > >>> > >>> *facepalm* > >>> > >>>> Anyhow, I think it would be better to try to resolve your concern by > >>>> implementing a deferred event. Because, as you say, processing commands > >>>> recursive seems prone to bugs. I e, pa_srbchannel_set_callback would not > >>>> call srbchannel_rwloop, but instead set up a deferred event which would > >>>> just call srbchannel_rwloop. > >>>> > >>>> Does that make sense? > >>> > >>> Hard to say, because it's not clear to me why srbchannel_rwloop() is > >>> called in the first place. If it's moved to a defer event, there will be > >>> "stuff" happening between setting the callback and dispatching the defer > >>> event. It's not clear to me what that "stuff" can include. Can it > >>> include something that should not be done before srbchannel_rwloop() has > >>> been called? > >> > >> srbchannel_rwloop is called to handle this potential scenario: > >> > >> 1) peer A sends srbchannel switch command to peer B > >> 2) peer A writes something to the srbchannel > >> 3) peer B receives srbchannel command and sets up the srbchannel > >> 4) If srbchannel_rwloop is not called by peer B at this point, the > >> message written in 2) will not be picked up by peer B in a timely fashion. > > > > Why is it not picked up in a timely fashion? You have an fd that will > > normally signal POLLIN when there is data to be read. Is there some > > reason that doesn't work if there is data to be read already at > > initialization time? > > As an optimisation, the pa_fdsem layer makes sure that nothing is > written to the fd, if nothing is waiting on the other side. And the io > event waits for things to be written to the fd. Ok. I think the patch is good. It would be nice to have a comment explaining what you explained above, though. -- Tanu