Re: [PATCH] migration: Don't assert() if MIGRATE_DATA comes before attaching the agent

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi,

On Wed, Apr 30, 2014 at 08:11:38PM +0300, Uri Lublin wrote:
> Hi Christophe,
> 
> I don't understand what exactly goes wrong and how this
> patch fixes the race.
> 
> 
> On 04/01/2014 04:58 PM, Christophe Fergeau wrote:
> >During seamless migration, after switching host, if a client was connected
> >during the migration, it will have data to send back to the new
> >qemu/spice-server instance. This is handled through MIGRATE_DATA messages.
> >SPICE char devices use such MIGRATE_DATA messages to restore their state.
> >
> >However, the MIGRATE_DATA message can arrive any time after the new qemu
> >instance has started, this can happen before or after the SPICE char
> >devices have been created. In order to handle this, if the migrate data
> >arrives early, it's stored in reds->agent_state.mig_data, and
> >attach_to_red_agent() will restore the agent state as appropriate.
> 
> Adding more information -- I think the race here is between qemu-kvm
> migrating the virtio-device state (with main/migration thread) vs spice
> migrating
> "agent" channel state.

What do you call 'spice migrating "agent"" channel state" ? Is it the
"MIGRATE DATA" spice message I I mention or something else?

> 
> The virtio-device state load function, calls (if agent service is running in
> the guest)
>   virtio_serial_load ->
>    qemu_char_fe_open ->
>      spice_chr_guest_open ->
>        vmc_register_interface ->
>          qemu_spice_add_interface ->
>            spice_server_add_interface ->
>              spice_server_char_device_add_interface ->
>                attach_to_red_agent
> 
> >Unfortunately this does not work as expected as expected. If
> >attach_to_red_agent() is called before the MIGRATE_DATA message reaches the
> >server, all goes well, but if MIGRATE_DATA reaches the server before
> >attach_to_red_agent() gets called, then some assert() get triggered in
> >spice_char_device_state_restore():
> >
> >((null):32507): Spice-ERROR **: char_device.c:937:spice_char_device_state_restore: assertion `dev->num_clients == 1 && dev->wait_for_migrate_data' failed
> >Thread 3 (Thread 0x7f406b543700 (LWP 32543)):
> >Thread 2 (Thread 0x7f40697ff700 (LWP 32586)):
> >Thread 1 (Thread 0x7f4079b45a40 (LWP 32507)):
> 
> If we split this into 2 asserts, one for num_clients and the other to
> wait_for_migrate_data, we'll know which one's at fault.

Both were at fault. (dev->num_client is 0 and dev->wait_for_migrate_data
is FALSE).

> 
> >What happens is that dev->wait_for_migrate_data is unconditionally cleared when
> >completing handling of the MIGRATE_DATA message, so it's FALSE when
> >spice_char_device_state_restore() is called.
> 
> dev->wait_for_migrate_data is cleared upon
> client_remove/device_reset/state_restore and in state_restore it's
> only after it passed the assert mentioned above.

The race happens when we receive a MIGRATE_DATA message from the client before the codepath you describe above is hit. reds_handle_migrate_data() will get called, and this will clear dev->wait_for_migrate_data. I forgot which codepaths were triggered though when the bug occurs, but this ends up clearing wait_for_migrate_data. I can look closer at this codepath if you want after reproducing.

> 
> >  Moreover, dev->num_clients is
> >also 0 as this is only increased by spice_char_device_start() which
> >spice_server_char_device_add_interface() calls after
> >attach_to_red_agent()/spice_char_device_state_restore() have been called.
> 
> dev->num_clients is set to 1 in spice_char_device_client_add() called
> from  spicevmc_connect. So as long as the client did not disconnect
> it should be good.
> With seamless migration that happens even before (qemu-kvm)
> migration starts.

Once again I will have to check exactly what's going on as
dev->num_clients is definitely 0 when attach_to_red_agent() gets called.

> 
> >This commit changes the logic in spice_server_char_device_add_interface(),
> >and when there is migrate data pending in reds->agent_state.mig_data, we
> >only attempt to restore it after successfully initializing/starting the
> >needed char device (which will ensure that dev->num_clients is not 0).
> 
> I think that if the client is connected we should have dev->num_clients set
> to 1.

Ah, if this is the real bug, I'll have to look again at all that ;) When
I hit the bug, there is no client attached when attach_to_red_agent is
called.

Christophe
_______________________________________________
Spice-devel mailing list
Spice-devel@xxxxxxxxxxxxxxxxxxxxx
http://lists.freedesktop.org/mailman/listinfo/spice-devel




[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Security]     [Bugtraq]     [Linux]     [Linux OMAP]     [Linux MIPS]     [ECOS]     [Asterisk Internet PBX]     [Linux API]     [Monitors]