On Thu, 2015-12-03 at 16:27 +0000, Frediano Ziglio wrote: > From: Marc-André Lureau <marcandre.lureau@xxxxxxxxx> > > This fixes the following scary racy corruption after glib loop switch: > Note that this valgrind trace is not valid for the current tree. Probably commits got re-ordered and code got shifted around since this patch was made. It's still probably useful to have the trace in the log, but we should at least note that it's out-of-date. Perhaps we should also test whether this is still reproducible with the current code. > ==28173== > ==28173== Debugger has detached. Valgrind regains control. We > continue. > ==28173== Invalid read of size 8 > ==28173== at 0x4C7871E: reds_stream_read (reds.c:4521) > ==28173== by 0x4C2F9D7: red_peer_receive (red_channel.c:209) > ==28173== by 0x4C2FB59: red_peer_handle_incoming (red_channel.c:255) > ==28173== by 0x4C2FF36: > red_channel_client_receive (red_channel.c:329) > ==28173== by 0x4C33D6D: red_channel_client_event (red_channel.c:1577) > ==28173== by 0x4C65098: watch_func (red_worker.c:10292) > ==28173== by 0x504DDAB: g_io_unix_dispatch (giounix.c:167) > ==28173== by 0x4FFACB6: g_main_dispatch (gmain.c:3065) > ==28173== by 0x4FFBA0D: g_main_context_dispatch (gmain.c:3641) > ==28173== by 0x4FFBBFF: g_main_context_iterate (gmain.c:3712) > ==28173== by 0x4FFC028: g_main_loop_run (gmain.c:3906) > ==28173== by 0x4C6ABF2: red_worker_main (red_worker.c:12180) > ==28173== Address 0x7d688e0 is 32 bytes inside a block of size 160 > free'd > ==28173== at 0x4A074C4: free (in > /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so) > ==28173== by 0x4C78A84: reds_stream_free (reds.c:4594) > ==28173== by 0x4C34E2E: > red_channel_client_disconnect (red_channel.c:1865) > ==28173== by 0x4C30363: > red_channel_client_default_peer_on_error (red_channel.c:417) > ==28173== by 0x4C3011B: red_peer_handle_outgoing (red_channel.c:372) > ==28173== by 0x4C3305F: red_channel_client_send (red_channel.c:1298) > ==28173== by 0x4C33FB6: > red_channel_client_begin_send_message (red_channel.c:1616) > ==28173== by 0x4C5F4B4: > display_begin_send_message (red_worker.c:8360) > ==28173== by 0x4C61DE8: display_channel_send_item (red_worker.c:9164) > ==28173== by 0x4C30C69: > red_channel_client_send_item (red_channel.c:599) > ==28173== by 0x4C332BB: red_channel_client_push (red_channel.c:1351) > ==28173== by 0x4C33C3A: > red_channel_client_handle_message (red_channel.c:1545) > --- > server/red-channel.c | 39 +++++++++++++++++++++------------------ > server/red-worker.c | 2 ++ > 2 files changed, 23 insertions(+), 18 deletions(-) > > diff --git a/server/red-channel.c b/server/red-channel.c > index 2a64bc8..ae16b9c 100644 > --- a/server/red-channel.c > +++ b/server/red-channel.c > @@ -1238,22 +1238,25 @@ static void red_channel_client_ref(RedChannelClient > *rcc) > > static void red_channel_client_unref(RedChannelClient *rcc) > { > - if (!--rcc->refs) { > - spice_debug("destroy rcc=%p", rcc); > - if (rcc->send_data.main.marshaller) { > - spice_marshaller_destroy(rcc->send_data.main.marshaller); > - } > + spice_debug("rcc=%p", rcc); > > - if (rcc->send_data.urgent.marshaller) { > - spice_marshaller_destroy(rcc->send_data.urgent.marshaller); > - } > + if (--rcc->refs != 0) > + return; > > - red_channel_client_destroy_remote_caps(rcc); > - if (rcc->channel) { > - red_channel_unref(rcc->channel); > - } > - free(rcc); > - } > + reds_stream_free(rcc->stream); > + rcc->stream = NULL; > + > + if (rcc->send_data.main.marshaller) > + spice_marshaller_destroy(rcc->send_data.main.marshaller); > + > + if (rcc->send_data.urgent.marshaller) > + spice_marshaller_destroy(rcc->send_data.urgent.marshaller); > + > + red_channel_client_destroy_remote_caps(rcc); > + if (rcc->channel) > + red_channel_unref(rcc->channel); > + > + free(rcc); > } > > void red_channel_client_destroy(RedChannelClient *rcc) > @@ -1758,7 +1761,7 @@ void red_channel_pipes_add_empty_msg(RedChannel > *channel, int msg_type) > int red_channel_client_is_connected(RedChannelClient *rcc) > { > if (!rcc->dummy) { > - return rcc->stream != NULL; > + return (rcc->stream != NULL) && ring_item_is_linked(&rcc > ->channel_link); This change seems harmless, but it doesn't seem strictly related to the bug being fixed. > } else { > return rcc->dummy_connected; > } > @@ -1813,8 +1816,10 @@ static void red_channel_remove_client(RedChannelClient > *rcc) > rcc->channel->type, rcc->channel->id, > rcc->channel->thread_id, pthread_self()); > } > + spice_return_if_fail(ring_item_is_linked(&rcc->channel_link)); > + > ring_remove(&rcc->channel_link); > - spice_assert(rcc->channel->clients_num > 0); > + spice_return_if_fail(rcc->channel->clients_num > 0); Same with these. > rcc->channel->clients_num--; > // TODO: should we set rcc->channel to NULL??? > } > @@ -1854,8 +1859,6 @@ void red_channel_client_disconnect(RedChannelClient > *rcc) > rcc->channel->core->watch_remove(rcc->stream->watch); > rcc->stream->watch = NULL; > } > - reds_stream_free(rcc->stream); > - rcc->stream = NULL; > if (rcc->latency_monitor.timer) { > rcc->channel->core->timer_remove(rcc->latency_monitor.timer); > rcc->latency_monitor.timer = NULL; So, it seems that we were freeing the stream in _disconnect() before removing the watch, and we might still receive one more 'watch' event after freeing the stream. To solve the problem, the stream is not freed until later (in _unref()). This means that we're likely still getting an 'extra' client event after disconnecting the client. We won't crash anymore since the stream is still valid at this point, but it's not clear whether it's a good idea to process that last event after disconnection or not. By the way, it seems that this comment (red-channel.c:244) may be related to the same issue that this patch attempts to solve: /* XXX: This needs further investigation as to the underlying cause, it happened * after spicec disconnect (but not with spice-gtk) repeatedly. */ if (!stream) { return; } > diff --git a/server/red-worker.c b/server/red-worker.c > index 9ec90ce..ba3454e 100644 > --- a/server/red-worker.c > +++ b/server/red-worker.c > @@ -606,6 +606,8 @@ static gboolean watch_func(GIOChannel *source, > GIOCondition condition, > SpiceWatch *watch = data; > int fd = g_io_channel_unix_get_fd(source); > > + spice_return_val_if_fail(!g_source_is_destroyed(watch->source), FALSE); > + I'm curious about whether this was added out of excessive caution, or whether it was ever triggered. I guess it probably wasn't triggered since spice_return_val_if_fail() actually aborts at the moment... > watch->func(fd, giocondition_to_spice_event(condition), watch->rcc); > > return TRUE; _______________________________________________ Spice-devel mailing list Spice-devel@xxxxxxxxxxxxxxxxxxxxx http://lists.freedesktop.org/mailman/listinfo/spice-devel