Michael pointed out that the virtio-vsock draft specification does not address live migration and in fact currently precludes it. Migration is fundamental, so the device specification must at least not preclude it.

Having brainstormed migration with Matthew Benjamin and Michael Tsirkin, I am now summarizing the approach that I want to include in the next draft specification. Feedback and comments welcome! In the meantime I will implement this in code and update the draft specification.

1. Requirements

Virtio-vsock is a new AF_VSOCK transport. As such, it should provide at least the same guarantees as the existing AF_VSOCK VMCI transport. This is for consistency and to allow code reuse across any AF_VSOCK transport.

Virtio-vsock aims to replace virtio-serial by providing the same guest/host communication ability but with sockets API semantics that are more popular and convenient for application developers. Therefore virtio-vsock migration should provide at least the same level of migration functionality as virtio-serial.

Ideally it should be possible to migrate applications using AF_VSOCK together with the virtual machine so that guest<->host communication is not interrupted. Neither AF_VSOCK VMCI nor virtio-serial supports this today.

2. Basic disruptive migration flow

When the virtual machine migrates from the source host to the destination host, the guest's CID may change. The CID namespace is host-wide, so the guest's CID may already be in use on the destination host, in which case the destination host allocates a new CID for the incoming VM.

The device notifies the guest that the CID has changed. Guest sockets are affected as follows:

 * Established connections are reset (ECONNRESET) and the guest application will have to reconnect.

 * Listen sockets remain open. The only thing to note is that connections from the host are now made to the new CID. This means the local address of the listen socket is automatically updated to the new CID.

 * Sockets in other states are unchanged.
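To make the three rules above concrete, here is a minimal Python model of how a guest might process the CID change notification. It is only a sketch, not the driver implementation: the GuestSocket class, the State enum (including its RESET state) and handle_cid_change() are hypothetical names invented for illustration, and delivery of ECONNRESET is modelled as a pending errno field on the socket.

```python
import errno
from dataclasses import dataclass
from enum import Enum, auto
from typing import List, Optional


class State(Enum):
    # Hypothetical socket states, for illustration only.
    UNCONNECTED = auto()
    LISTEN = auto()
    ESTABLISHED = auto()
    RESET = auto()


@dataclass
class GuestSocket:
    state: State
    local_cid: int
    local_port: int
    # errno the application would see on its next socket call (assumption).
    pending_error: Optional[int] = None


def handle_cid_change(sockets: List[GuestSocket], new_cid: int) -> None:
    """Apply the disruptive migration rules to every guest socket."""
    for sk in sockets:
        if sk.state is State.ESTABLISHED:
            # Established connections are reset; the application must reconnect.
            sk.state = State.RESET
            sk.pending_error = errno.ECONNRESET
        elif sk.state is State.LISTEN:
            # Listen sockets stay open; only the local address is updated.
            sk.local_cid = new_cid
        # Sockets in any other state are left unchanged.
```

For example, with one established, one listening and one unconnected socket on CID 3, calling handle_cid_change(sockets, 7) marks the first as reset with ECONNRESET pending, moves the listener to CID 7, and leaves the third untouched.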

Applications must handle disruptive migration by reconnecting if necessary after ECONNRESET.

3. Checkpoint/restore for seamless migration

Applications that wish to communicate across live migration can do so but this requires extra application-specific checkpoint/restore code.

This is similar to the approach taken by the CRIU project where getsockopt()/setsockopt() is used to migrate socket state. The difference is that the application process is not automatically migrated from the source host to the destination host. Therefore, the application needs to migrate its own state somehow.

The flow is as follows:

The application on the source host must quiesce (stop sending/receiving) and use getsockopt() to extract socket state information from the host kernel.

A new instance of the application is started on the destination host and given the state so it can restore the connection. The setsockopt() syscall is used to restore socket state information.

The guest is given a list of <host_old_cid, host_new_cid, host_port, guest_port> tuples for established connections that must not be reset when the guest CID update notification is received. These connections will carry on as if nothing changed.

Note that the connection's remote address is updated from host_old_cid to host_new_cid. This allows remapping of CIDs (if necessary). Typically this will be unused because the host always has well-known CID 2. In a guest<->guest scenario it may be used to remap CIDs.

For the time being I am focussing on the basic disruptive migration flow only. Checkpoint/restore can be added with a feature bit in the future. It is a lot more complex and I'm not sure whether there will be any users yet.

Stefan