Re: [PATCH] migration: Don't assert() if MIGRATE_DATA comes before attaching the agent

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hey,

On Mon, Feb 24, 2014 at 08:26:34PM +0100, Marc-André Lureau wrote:
> On Mon, Feb 24, 2014 at 6:44 PM, Christophe Fergeau <cfergeau@xxxxxxxxxx> wrote:
> > During seamless migration, after switching host, if a client was connected
> > during the migration, it will have data to send back to the new
> > qemu/spice-server instance. This is handled through MIGRATE_DATA messages.
> > SPICE char devices use such MIGRATE_DATA messages to restore their state.
> >
> > However, the MIGRATE_DATA message can arrive any time after the new qemu
> > instance has started, this can happen before or after the SPICE char
> > devices have been created. In order to handle this, if the migrate data
> > arrives early, it's stored in reds->agent_state.mig_data, and
> > attach_to_red_agent() will restore the agent state as appropriate.
> >
> > Unfortunately this does not work as expected as expected. If
> > attach_to_red_agent() is called before the MIGRATE_DATA message reaches the
> > server, all goes well, but if MIGRATE_DATA reaches the server before
> > attach_to_red_agent() gets called, then some assert() get triggered in
> > spice_char_device_state_restore():
> >
> > ((null):32507): Spice-ERROR **: char_device.c:937:spice_char_device_state_restore: assertion `dev->num_clients == 1 && dev->wait_for_migrate_data' failed
> > Thread 3 (Thread 0x7f406b543700 (LWP 32543)):
> > Thread 2 (Thread 0x7f40697ff700 (LWP 32586)):
> > Thread 1 (Thread 0x7f4079b45a40 (LWP 32507)):
> >
> > What happens is that dev->wait_for_migrate_data is unconditionally cleared when
> > completing handling of the MIGRATE_DATA message, so it's FALSE when
> 
> Isn't it going to be still FALSE after this patch?
> 
> > spice_char_device_state_restore() is called. Moreover, dev->num_clients is
> > also 0 as this is only increased by spice_char_device_start() which
> > spice_server_char_device_add_interface() calls after
> > attach_to_red_agent()/spice_char_device_state_restore() have been called.
> 
> I don't see how this could have changed either.
> 
> > This commit changes the logic in spice_server_char_device_add_interface(),
> > and when there is migrate data pending in reds->agent_state.mig_data, we
> > only attempt to restore it after successfully initializing/starting the
> > needed char device.
> 
> Sorry, I must be tired reading the server code, but I don't see how
> this could change the race you described above. I am surely missing
> something :)

I'll have to improve the commit log as I had to reproduce the bug and poke
at the code some more in order to answer your comments :(
See below for answers to your comments above


> 
> >
> > This fixes https://bugzilla.redhat.com/show_bug.cgi?id=1035184
> > ---
> >  server/reds.c | 26 +++++++++++++++-----------
> >  1 file changed, 15 insertions(+), 11 deletions(-)
> >
> > diff --git a/server/reds.c b/server/reds.c
> > index 1f02553..217c74e 100644
> > --- a/server/reds.c
> > +++ b/server/reds.c
> > @@ -2856,16 +2856,7 @@ static SpiceCharDeviceState *attach_to_red_agent(SpiceCharDeviceInstance *sin)
> >      state->read_filter.discard_all = FALSE;
> >      reds->agent_state.plug_generation++;
> >
> > -    if (reds->agent_state.mig_data) {
> > -        spice_assert(reds->agent_state.plug_generation == 1);
> > -        reds_agent_state_restore(reds->agent_state.mig_data);
> > -        free(reds->agent_state.mig_data);
> > -        reds->agent_state.mig_data = NULL;
> > -    } else if (!red_channel_waits_for_migrate_data(&reds->main_channel->base)) {
> > -        /* we will assoicate the client with the char device, upon reds_on_main_agent_start,
> > -         * in response to MSGC_AGENT_START */
> > -        main_channel_push_agent_connected(reds->main_channel);
> > -    } else {
> > +    if (red_channel_waits_for_migrate_data(&reds->main_channel->base) || reds->agent_state.mig_data) {

Note the added  || reds->agent_state.mig_data here. agent_state.mig_data
will be non-NULL iff we got a MIGRATE_DATA message before calling
attach_to_red_agent(). The whole block is:

    if (red_channel_waits_for_migrate_data(&reds->main_channel->base) || reds->agent_state.mig_data) {
       spice_debug("waiting for migration data");
        if (!spice_char_device_client_exists(reds->agent_state.base, reds_get_client())) {
            int client_added;

            client_added = spice_char_device_client_add(reds->agent_state.base,

This call to spice_char_device_client_add() will cause dev->num_clients to
increase

                                                        reds_get_client(),
                                                        TRUE, /* flow control */
                                                        REDS_VDI_PORT_NUM_RECEIVE_BUFFS,
                                                        REDS_AGENT_WINDOW_SIZE,
                                                        ~0,
                                                        TRUE);

and this last argument set to TRUE is 'int wait_for_migrate_data' and its
value is assigned to dev->wait_for_migrate_data.

So thanks to the added condition, the char device is in the state
spice_char_device_state_restore() expects.

Christophe

Attachment: pgpFOShL4cIMf.pgp
Description: PGP signature

_______________________________________________
Spice-devel mailing list
Spice-devel@xxxxxxxxxxxxxxxxxxxxx
http://lists.freedesktop.org/mailman/listinfo/spice-devel

[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Security]     [Bugtraq]     [Linux]     [Linux OMAP]     [Linux MIPS]     [ECOS]     [Asterisk Internet PBX]     [Linux API]     [Monitors]