Re: [PATCH] migration: Don't assert() if MIGRATE_DATA comes before attaching the agent

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Apr 30, 2014 at 08:11:38PM +0300, Uri Lublin wrote:
> Hi Christophe,
> 
> I don't understand what exactly goes wrong and how this
> patch fixes the race.
> 
> 
> On 04/01/2014 04:58 PM, Christophe Fergeau wrote:
> >During seamless migration, after switching host, if a client was connected
> >during the migration, it will have data to send back to the new
> >qemu/spice-server instance. This is handled through MIGRATE_DATA messages.
> >SPICE char devices use such MIGRATE_DATA messages to restore their state.
> >
> >However, the MIGRATE_DATA message can arrive any time after the new qemu
> >instance has started, this can happen before or after the SPICE char
> >devices have been created. In order to handle this, if the migrate data
> >arrives early, it's stored in reds->agent_state.mig_data, and
> >attach_to_red_agent() will restore the agent state as appropriate.
> 
> Adding more information -- I think the race here is between qemu-kvm
> migrating the virtio-device state (with main/migration thread) vs spice
> migrating
> "agent" channel state.
> 
> The virtio-device state load function, calls (if agent service is running in
> the guest)
>   virtio_serial_load ->
>    qemu_char_fe_open ->
>      spice_chr_guest_open ->

With recent QEMUs, you have to insert fetch_active_ports_list here,
which schedules a call to virtio_serial_post_load_timer_cb through a
timer. Once that callback is invoked, vmc_register_interface will get
called. And between fetch_active_ports_list returning and
virtio_serial_post_load_timer_cb being triggered, it's possible to
receive and process a MIGRATE_DATA message. When this happens, things go
bad.

>        vmc_register_interface ->
>          qemu_spice_add_interface ->
>            spice_server_add_interface ->
>              spice_server_char_device_add_interface ->
>                attach_to_red_agent
> 
> >Unfortunately this does not work as expected as expected. If
> >attach_to_red_agent() is called before the MIGRATE_DATA message reaches the
> >server, all goes well, but if MIGRATE_DATA reaches the server before
> >attach_to_red_agent() gets called, then some assert() get triggered in
> >spice_char_device_state_restore():
> >
> >((null):32507): Spice-ERROR **: char_device.c:937:spice_char_device_state_restore: assertion `dev->num_clients == 1 && dev->wait_for_migrate_data' failed
> >Thread 3 (Thread 0x7f406b543700 (LWP 32543)):
> >Thread 2 (Thread 0x7f40697ff700 (LWP 32586)):
> >Thread 1 (Thread 0x7f4079b45a40 (LWP 32507)):
> 
> If we split this into 2 asserts, one for num_clients and the other to
> wait_for_migrate_data, we'll know which one's at fault.

Both conditions are false.

> 
> >What happens is that dev->wait_for_migrate_data is unconditionally cleared when
> >completing handling of the MIGRATE_DATA message, so it's FALSE when
> >spice_char_device_state_restore() is called.
> 
> dev->wait_for_migrate_data is cleared upon
> client_remove/device_reset/state_restore and in state_restore it's
> only after it passed the assert mentioned above.

It's possible to receive a MIGRATE_DATA message before going through that assert (scenario described at the beginning of this email). When this happens, wait_for_migrate_data gets cleared after processing the MIGRATE_DATA message.

> 
> >  Moreover, dev->num_clients is
> >also 0 as this is only increased by spice_char_device_start() which
> >spice_server_char_device_add_interface() calls after
> >attach_to_red_agent()/spice_char_device_state_restore() have been called.
> 
> dev->num_clients is set to 1 in spice_char_device_client_add() called
> from  spicevmc_connect. So as long as the client did not disconnect
> it should be good.

spicevmc_connect is used for vmc devices with a dedicated channel (usb
and port), it's not called for the agent which is 'muxed' with the main
channel. spicevmc_connect will get called when spicevmc_device_connect
has been called, which only happens for usbredir and port channels, see
spice_server_char_device_add_interface()

> With seamless migration that happens even before (qemu-kvm)
> migration starts.
> 
> >This commit changes the logic in spice_server_char_device_add_interface(),
> >and when there is migrate data pending in reds->agent_state.mig_data, we
> >only attempt to restore it after successfully initializing/starting the
> >needed char device (which will ensure that dev->num_clients is not 0).
> 
> I think that if the client is connected we should have dev->num_clients set
> to 1.

It's always 0 here, even when migration succeeds.

> 
> >
> >It also changes attach_to_red_agent() to handle the case when
> >reds->agent_state.mig_data is non-NULL to be the same as the case when
> >red_channel_waits_for_migrate_data() is TRUE. This ensures that
> >spice_char_device_client_add() gets called and that 'wait_for_data' is set
> >in the added device.
> 
> I think each handles a different scenario in that race:
> reds->agent_state.mig_data -- migration data has already arrived.
> spice_char_waits_for_migarte_data -- we are waiting for migration data
> to arrive.

Yup, and the current code does not handle correctly the 'migration data
has already arrived' case, it's only able to handle the 'waiting for
migration data to arrive' case. This patch attempts to fix the
non-working case.

> >@@ -2987,6 +2982,13 @@ static int spice_server_char_device_add_interface(SpiceServer *s,
> >          } else {
> >              dev_state = spicevmc_device_connect(char_device, SPICE_CHANNEL_PORT);
> >          }
> >+    } else if (strcmp(char_device->subtype, SUBTYPE_VDAGENT) == 0) {
> >+        if (reds->agent_state.mig_data) {
> >+               spice_assert(reds->agent_state.plug_generation == 1);
> >+               reds_agent_state_restore(reds->agent_state.mig_data);
> >+               free(reds->agent_state.mig_data);
> >+               reds->agent_state.mig_data = NULL;
> >+        }
> >      }
> 
> Since this if statement started with
>    if (strcmp(char_device->subtype, SUBTYPE_VDAGENT) == 0)
> and was followed by else if statements the lines added here will not be
> reached.

Oops, thanks, I'll change it!

Christophe

Attachment: pgpQxllf_JOJo.pgp
Description: PGP signature

_______________________________________________
Spice-devel mailing list
Spice-devel@xxxxxxxxxxxxxxxxxxxxx
http://lists.freedesktop.org/mailman/listinfo/spice-devel

[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Security]     [Bugtraq]     [Linux]     [Linux OMAP]     [Linux MIPS]     [ECOS]     [Asterisk Internet PBX]     [Linux API]     [Monitors]