RFC: seamless migration

Hi,

Since the qemu team rejected integrating spice connection migration into the qemu migration process, we are left with a solution that involves libvirt, and passes data from the src to the target via the client. Before I continue with the implementation, I'd like to hear your comments on the details:

Here is a reminder about the problems we face:
(1) Loss of data: we would like the client to continue the connection from the same point at which the vm was stopped. For example, we want any usb/smartcard devices to stay attached, and we don't want to lose any data that was sent from the client to the vm, or partial data that was read from a device but hadn't reached its destination before migration.

(2) The qemu process on the src side can be closed by libvirt as soon as the migration state changes to "completed". Thus, we can't reliably pass any data between the src server and the client after migration has completed.

These problems can be addressed by the following:
Add a qmp event for spice migration completion. libvirt will then need to wait not only for qemu migration completion, but also for this qmp event, before it closes the src qemu. Spice needs to know whether libvirt supports this, in order to decide which migration approach to take (semi-seamless or seamless). For this purpose, we will add a new parameter to the spice configuration on the qemu command line (e.g., seamless-migration=on); if libvirt sets it, we can assume libvirt will wait for spice migration. After qemu migration is completed, the src server will pass the migration data to the target via the client/s. When the clients have disconnected from the src and switched completely to the target, we send the new qmp event.
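To make the flow concrete, here is a rough sketch of the sequence (the option name and the event are placeholders, not final):

(1) libvirt runs qemu with: -spice ...,seamless-migration=on
(2) qemu migration state changes to "completed"
(3) src server--->client/s--->target server: migration data
(4) src qemu--->libvirt: spice migration completion qmp event
(5) libvirt closes the src qemu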


migration data transfer
=======================
Our historical MSG_MIGRATE pathway provides support for sending all pending outgoing data from the client to the server, and vice versa, before we fill the migration data.
Each channel defines its own migration data.
(1) MSG_MIGRATE is the last message sent from the src server channel to the client, before MIGRATE_DATA.
(2) If the message's flags contain MIGRATE_NEED_FLUSH, the client writes all its outgoing data, and then sends MSGC_FLUSH_MARK to the server.
(3) The client channel then waits for the MIGRATE_DATA message, and does nothing besides that.
(4) When it receives the message, it switches completely to the target and passes it the migration data.

(1) server channel--->MSG_MIGRATE...in-flight messages--->client
(2) client channel-->MSGC_FLUSH_MARK...in-flight messages-->server
(3) server channel-->MSG_MIGRATE_DATA-->client
(4) client channel-->MSGC_MIGRATE_DATA-->target server
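To illustrate the client side, here is a rough C sketch of the handshake; every type and function name below is invented for the example, and is not the actual client code:

#include <stdint.h>
#include <stddef.h>

typedef struct Channel Channel;          /* hypothetical client channel */

typedef enum {
    CHANNEL_STATE_CONNECTED,
    CHANNEL_STATE_WAIT_MIGRATE_DATA      /* after MSG_MIGRATE was handled */
} ChannelState;

/* assumed helpers, provided elsewhere by the client */
void channel_flush_outgoing(Channel *c);     /* push pending messages */
void channel_send_flush_mark(Channel *c);    /* MSGC_FLUSH_MARK */
void channel_connect_target(Channel *c);
void channel_send_migrate_data(Channel *c, const uint8_t *d, size_t n);
void channel_set_state(Channel *c, ChannelState s);

#define MIGRATE_NEED_FLUSH (1 << 0)          /* placeholder flag value */

/* (1)+(2): MSG_MIGRATE arrived; flush outgoing data if requested */
static void handle_msg_migrate(Channel *channel, uint32_t flags)
{
    if (flags & MIGRATE_NEED_FLUSH) {
        channel_flush_outgoing(channel);
        channel_send_flush_mark(channel);
    }
    /* (3): from now on, ignore everything except MIGRATE_DATA */
    channel_set_state(channel, CHANNEL_STATE_WAIT_MIGRATE_DATA);
}

/* (4): switch to the target and forward the opaque migration data */
static void handle_msg_migrate_data(Channel *channel,
                                    const uint8_t *data, size_t size)
{
    channel_connect_target(channel);
    channel_send_migrate_data(channel, data, size);
}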
		
Obligatory migration data:
-------------------------
(1) agent/spicevmc/smartcard write buffer, i.e., data that reached the server after savevm, and thus was not written to the device. Currently, spicevmc and smartcard do not have a write buffer, but since data can reach the server after savevm, they should have one. I'm not sure they should attempt to write to the guest even today, when it is stopped. The agent code can also write to the guest while it is stopped; I think that is a bug. (See the sketch below.)
(2) agent/smartcard partial data that had been read from the device but wasn't sent to the client, because its reading hadn't completed. Currently we don't have such data for spicevmc, because we push to the client any amount of data we read. In the future we might want to control the rate and the size of the data we send/receive, and then we will have an outgoing buffer as well.
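For example, a minimal write buffer along these lines could be added to spicevmc/smartcard and serialized into the channel's migration data; every name here is hypothetical, and error handling is omitted for brevity:

#include <stdint.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

typedef struct WriteBuffer {
    uint8_t *data;
    uint32_t len;        /* bytes pending write to the guest device */
    uint32_t capacity;
} WriteBuffer;

typedef struct Device {
    WriteBuffer write_buffer;
    /* ... */
} Device;

bool vm_is_stopped(void);                                   /* assumed */
void guest_write(Device *d, const uint8_t *b, uint32_t n);  /* assumed */

static void write_buffer_append(WriteBuffer *wb,
                                const uint8_t *buf, uint32_t len)
{
    if (wb->len + len > wb->capacity) {
        wb->capacity = (wb->len + len) * 2;
        wb->data = realloc(wb->data, wb->capacity);
    }
    memcpy(wb->data + wb->len, buf, len);
    wb->len += len;
}

/* called when data arrives from the client for the guest device */
static void device_write(Device *dev, const uint8_t *buf, uint32_t len)
{
    if (vm_is_stopped()) {
        /* e.g. after savevm: don't touch the stopped guest; keep the
         * data so it can be carried in the migration data instead */
        write_buffer_append(&dev->write_buffer, buf, len);
        return;
    }
    guest_write(dev, buf, len);
}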

Optional migration data:
------------------------
- primary surface lossy region(*), or its extents: if we don't send it to the client, and jpeg is enabled, we will either need to resend the primary surface after migration, or set the lossy region to the whole surface, in which case each non-opaque rendering operation that involves the surface will require resending parts of it losslessly.
- list of off-screen surface ids that have been sent to the client, and their lossy regions. Keeping this data allows us to avoid resending, on demand, surfaces that already exist on the client side.
- bitmaps cache: list of bitmap ids + some internal cache information for each bitmap.
- active video streams: ids, destination box, etc.
- session bandwidth (low/high): we don't want to perform the main channel net test after the migration is completed, because it can take time (we can't do it during the migration because the main loop is not available). So we assume the bandwidth classification will stay the same. Once we have dynamic bandwidth monitoring, we can drop this.

Though the above data is optional, parts of it are important for avoiding a slow start of the connection to the target (e.g., sending the primary surface lossy region, in order to avoid resending parts of it).

In addition, if we wish to keep the client channels' state unchanged, and not require them (1) to send initialization data to the server, and (2) to reset part of their state, we should also migrate other server-side state details, like:
- the serial of the last message sent from the display channel
- main channel agent data tokens state
- size of the images cache (this is usually set by the client upon a new connection).

Including such information in the migration data will allow us to keep the migration logic in the server. The alternative is for the client to reset part of its state after migration, either on its own initiative, or via specific messages sent from the server (which may require a new set of messages).

(*) lossy region = the region on the surface that contains bitmaps that were compressed using jpeg
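Taken together, the display channel part of the migration data might look roughly like the following server-internal layout (a sketch only; fields and names are not final):

#include <stdint.h>

typedef struct Rect { int32_t left, top, right, bottom; } Rect;

typedef struct DisplayMigrateData {
    uint32_t version;             /* migration data version, see below */
    uint64_t last_message_serial; /* serial of the last message sent */
    uint32_t bitmap_cache_size;   /* normally set by the client on connect */
    uint32_t low_bandwidth;       /* session bandwidth classification */
    uint32_t lossy_rect_count;    /* primary surface lossy region extents */
    /* followed by: lossy_rect_count Rects, the off-screen surface ids and
     * their lossy regions, the bitmap cache ids + internal cache info,
     * and the active stream ids + destination boxes */
} DisplayMigrateData;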

Transparency of migration data:
------------------------------
I think that the migration data shouldn't be part of the spice protocol, and that it should be opaque to the client, for the following reasons:
(a) The client is only a mediator, and has nothing to do with the data content.
(b) If the migration data of each channel is part of the spice protocol, every minor change to the migration data of one channel will require a new message and capability, and will make backward-compatibility support in migration more cumbersome, as it will involve the client as well. Moreover, if the client supports only migration data of ver x, while the src and target both support ver x+1, we will suffer data loss.
(c) As for security, I don't think this raises a problem, since the client is trusted by both the src and the target.
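Concretely, all the client (and the protocol) would have to know about is a small envelope; the payload itself stays opaque server-to-server data. A sketch, with invented names:

#include <stdint.h>

typedef struct MigrateDataHeader {
    uint32_t magic;    /* per-channel magic; sanity check on the target */
    uint32_t version;  /* the channel's migration data version */
    uint32_t size;     /* payload size in bytes */
    /* followed by 'size' opaque bytes the client forwards verbatim */
} MigrateDataHeader;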


version negotiation:
-------------------
We need some negotiation between the servers in order to decide whether to execute seamless migration or the older method.

This is what I had in mind:
We will add a capability for seamless migration, and each channel's migration data will have a version. When the client establishes the initial connection to the target (upon client_migrate_info), it mediates the negotiation:
(1) The client checks if the target is capable of seamless migration
(2) The client will retrieve from each of the target channels its migration data version, and will send these versions to the src server as part of MSGC_MIGRATE_CONNECT.
(3) The src server will compare the versions to its own. If (src-ver <= target-ver) for all the channels, it will take the seamless migration pathway. Otherwise, it will fall back to the older method.
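The src-side decision could then be as simple as the following sketch (function and variable names are made up):

#include <stdint.h>
#include <stdbool.h>

uint16_t src_migrate_data_version(int channel);  /* assumed */

/* target_vers: the versions the client reported in MSGC_MIGRATE_CONNECT */
static bool can_migrate_seamlessly(const uint16_t *target_vers,
                                   int num_channels)
{
    for (int i = 0; i < num_channels; i++) {
        if (src_migrate_data_version(i) > target_vers[i]) {
            return false;  /* fall back to the older method */
        }
    }
    return true;
}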


multi client
------------
- client-specific data will be sent via its corresponding client.
- non client-specific data: I thought about sending it via one of the clients (the primary one, if we will have such a notion); a flag in the migration data will indicate this. On the target side we will have a timeout for receiving the shared data, and if the timeout expires, we will disconnect all the clients (see the sketch below). In the future we can choose the client with the highest bandwidth and lowest latency for transferring the data. We may also have groups of clients, and group-specific data, which the "primary" client of each group will be responsible for transferring.
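A rough sketch of the target-side timeout (all names are invented, and the timeout value is arbitrary):

#include <stdbool.h>

#define SHARED_MIGRATE_DATA_TIMEOUT_MS 10000  /* arbitrary for the sketch */

typedef struct Server {
    bool shared_migrate_data_received;
    /* ... */
} Server;

void server_disconnect_all_clients(Server *s);  /* assumed */

/* armed when the first migrated client connects to the target; fires
 * after SHARED_MIGRATE_DATA_TIMEOUT_MS if the shared data never arrived */
static void shared_data_timeout_cb(void *opaque)
{
    Server *server = opaque;
    if (!server->shared_migrate_data_received) {
        /* the "primary" client never delivered the shared data, so no
         * client can complete seamless migration */
        server_disconnect_all_clients(server);
    }
}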

connection id
-------------
It would be nice to have the target verify that a migrated client is indeed a client that was connected to the src. Currently, a completely new client connects with connection-id=0, and the server sets the connection-id to a random number. A migrated client connects to the target with the connection-id that was set by the src.

SpiceChannelEventInfo contains the connection id. We could save and load the list of connection ids using a vmstate.
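A sketch of the corresponding target-side check, assuming the list of src connection ids was saved and loaded through a vmstate (illustrative names):

#include <stdint.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct ConnectionIdNode {
    uint32_t connection_id;
    struct ConnectionIdNode *next;
} ConnectionIdNode;

/* ids: the list loaded on the target from the migrated vmstate */
static bool connection_id_is_from_src(const ConnectionIdNode *ids,
                                      uint32_t id)
{
    for (; ids != NULL; ids = ids->next) {
        if (ids->connection_id == id) {
            return true;
        }
    }
    return false;  /* unknown id: not a client that was on the src */
}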

Cheers,
Yonit.