Re: [PATCH spice-server 5/5] red-client: Prevent some nasty threading problems disconnecting channels

Frediano Ziglio <fziglio@xxxxxxxxxx> · Wed, 30 Aug 2017 04:19:47 -0400 (EDT)

> 
> [spice-devel was dropped when you replied, I don't know if this was
> intentional]
> 

no, added again.

> Hey,
> 
> On Fri, Aug 25, 2017 at 12:48:51PM -0400, Frediano Ziglio wrote:
> > Don't want to fool nobody, these issues are really bound together
> > and from the small lock issue I manage to spend lot of time trying
> > to understand it
> 
> The issues are indeed related, but even after taking a closer look, I
> think at least the refcounting and the list iteration code could be
> separated...
> 

I'll try to separate.

> 
> > (OT: I started writing a document, where should
> > I put it? A markdown in docs??).
> 
> Yes, we can have new files in docs/
> One thing which could be useful is some annotations in
> RedChannelClient/RedChannel/RedClient regarding which functions are safe
> to call from any thread, and which functions should be called from the
> thread the GObject was created in. I would not be surprised if we had
> more issues like the ones you are fixing here :( Dunno if qemu is
> runnable in helgrind.
> 
> > 
> > So, let me try to recap. No lock on the list which is accessed by
> > multiple threads. Added the lock... deadlock adding. So decide
> > to scan the list in a more safe way but with less lock.
> 
> Hmm, my preference would be to require that RedClient::channels is
> always manipulated from the main thread, have you considered that
> option? (either asynchronously call
> red_client_add_channel/red_client_remove_channel from RedChannelClient,
> or blocking call which calls into the main thread).
> 

I have considered but creates different issues too.
For instance adding would be more of an issue, currently the creation
of a RedChannelClient is done in the channel thread and you add
to RedClient. If you want to split you'll have to create the
RedChannelClient but not "activate" it (registering watches and timers),
then call the main thread to add to RedClient, than RedClient with another
call to channel thread will activate the RedClient.
During removal you'll have to do a call to the main thread to remove the
RedChannelClient from RedClient.
But this will also have some potential problems. For instance during
creation you'd have a "connected" (by definition RCCs are connected
if inside RedChannel lists) RCC not bound to RedClient while now
this condition is impossible.

> 
> > Than I realized some function call was referring to objects
> > (like RedChannelClient but also RedChannel) potentially running
> > in different thread. So had to check them if the data they accesses
> > are safe to be accessed that way.
> > Than as the RedClient function is supposed to destroy objects I
> > started printing when stuff got released and looking at the paths.
> > Then some objects were not released (the RedChannelClient) so
> > goes and check the various reference counting. Then after fixing
> > some some usage after free started to happen.
> > 
> > If you look at red-channel-client.c and search for g_object_ref
> > or unref you will note lot of cases where the object you are working
> > on are incremented at some point and decremented at the end.
> > But so the object pointer we receive has no ownership? The problem
> > is that when you receive it it has ownership but during the
> > execution it can loose it. For instance you scan the list of
> > ChannelClients from Channel and call a function and in the meantime
> > the object get removed from the initial list. Note: the lists are
> > all scanned in a "safe" way, we store the next element before
> > handling the current so if the current is removed the next SHOULD
> > be safe (I think this is one reason why MainChannelClient
> > disconnection is done asynchronously in the main thread even if
> > MainChannelClient run in the main thread).
> 
> After looking at that code, ownership of RedChannelClient is indeed
> odd.. After your patch, it looks like RedClient::channels is considered
> as owning the newly created RedChannelClient?
> 

Even before actually but was less clear. This as in red_client_destroy
the RCCs were unreferenced.

> If that's your goal, I would make it very explicit, both in the commit
> log and in the code. red_channel_client_initable_init() adds the
> RedChannelClient to both RedChannel and RedClient, but when the object
> has been created, its refcount is only 1, so I'd also make it very
> explicit that the 2nd list does not own any reference to the
> RedChannelClient.
> Once this is cleared up, hopefully reviewing your
> g_object_ref/g_object_unref changes becomes very easy.
> 
> > > > --- a/server/red-client.c
> > > > +++ b/server/red-client.c
> > > > @@ -192,9 +192,6 @@ void red_client_migrate(RedClient *client)
> > > >  
> > > >  void red_client_destroy(RedClient *client)
> > > >  {
> > > > -    RedChannelClient *rcc;
> > > > -
> > > > -    spice_printerr("destroy client %p with #channels=%d", client,
> > > > g_list_length(client->channels));
> > > >      if (!pthread_equal(pthread_self(), client->thread_id)) {
> > > >          spice_warning("client->thread_id (0x%lx) != pthread_self
> > > >          (0x%lx)."
> > > >                        "If one of the threads is != io-thread && !=
> > > >                        vcpu-thread,"
> > > > @@ -202,22 +199,46 @@ void red_client_destroy(RedClient *client)
> > > >                        client->thread_id,
> > > >                        pthread_self());
> > > >      }
> > > > -    FOREACH_CHANNEL_CLIENT(client, rcc) {
> > > > +
> > > > +    pthread_mutex_lock(&client->lock);
> > > > +    spice_printerr("destroy client %p with #channels=%d", client,
> > > > g_list_length(client->channels));
> > > > +    // this make sure that possible RedChannelClient setting up
> > > > +    // to be added won't be added to the list
> > > > +    client->disconnecting = TRUE;
> > > > +    while (client->channels) {
> > > >          RedChannel *channel;
> > > > +        RedChannelClient *rcc = client->channels->data;
> > > > +
> > > > +        // Remove the list to the RedChannelClient we are processing
> > > > +        // note that we own the object so is safe to do some
> > > > operations on
> > > > it.
> > > > +        // This manual scan of the list is done to have a thread safe
> > > > +        // iteration of the list
> > > > +        client->channels = g_list_delete_link(client->channels,
> > > > client->channels);
> > > > +
> > > >          // some channels may be in other threads, so disconnection
> > > >          // is not synchronous.
> > > >          channel = red_channel_client_get_channel(rcc);
> > > >          red_channel_client_set_destroying(rcc);
> > > > +
> > > > +        // prevent dead lock disconnecting rcc (which can happen
> > > > +        // in the same thread or synchronously on another one)
> > > > +        pthread_mutex_unlock(&client->lock);
> > > > +
> > 
> > One of the reasons I had to change the reference counting is to
> > avoid a double free not knowing if we have the ownership or not.
> > 
> > > >          // some channels may be in other threads. However we currently
> > > >          // assume disconnect is synchronous (we changed the dispatcher
> > > >          // to wait for disconnection)
> > > >          // TODO: should we go back to async. For this we need to use
> > > >          // ref count for channel clients.
> > > >          red_channel_disconnect_client(channel, rcc);
> > > > +
> > > >          spice_assert(red_channel_client_pipe_is_empty(rcc));
> > > >          spice_assert(red_channel_client_no_item_being_sent(rcc));
> > > > +
> > > >          red_channel_client_destroy(rcc);
> > 
> > Actually I think that now maybe this call is not necessary...
> > but better safe than sorry.
> 
> Yes, it seems very redundant as this call is doing disconnect +
> set_destroying. This could be a nice preparatory commit, makes things
> simpler :)
> 

Do not understand the suggestion here...

> > 
> > Basically the real disconnection can happen either in
> > red_channel_disconnect_client (which potentially run in a
> > separate thread) or in red_channel_client_destroy, they
> > both call red_channel_client_disconnect.
> > 
> > > > +        g_object_unref(rcc);
> > > > +        pthread_mutex_lock(&client->lock);
> > > >      }
> > > > +    pthread_mutex_unlock(&client->lock);
> > > >      g_object_unref(client);
> > > >  }
> > > >  
> > > > @@ -254,6 +275,16 @@ gboolean red_client_add_channel(RedClient *client,
> > > > RedChannelClient *rcc, GError
> > > >      pthread_mutex_lock(&client->lock);
> > > >  
> > > >      g_object_get(channel, "channel-type", &type, "id", &id, NULL);
> > > > +    if (client->disconnecting) {
> > > > +        g_set_error(error,
> > > > +                    SPICE_SERVER_ERROR,
> > > > +                    SPICE_SERVER_ERROR_FAILED,
> > > > +                    "Client %p got disconnected while connecting
> > > > channel
> > > > type %d id %d",
> > > > +                    client, type, id);
> > > > +        result = FALSE;
> > > > +        goto cleanup;
> > > > +    }
> 
> The client->disconnecting stuff in this commit could also go in a
> separate commit. It seems like the goal is to prevent new
> RedChannelClient creation when the RedClient is being destroyed?
> 

Yes, the question suggest that the message 
"Client %p got disconnected while connecting channel type %d id %d"
is not clear. Suggestion on how to rewrite it?

This happens as the connection is asynchronous so (MT main thread,
CT channel thread):
- MT you get a new connection;
- MT a connection is sent to CT
- MT you get a disconnection of main channel
- MT red_client_destroy is called
- CT you attempt to add the RCC to RedClient

> Christophe
> 

Frediano
_______________________________________________
Spice-devel mailing list
Spice-devel@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/spice-devel