On 7/12/2024 9:18 AM, Steve Sistare wrote:
Live update is a technique wherein an application saves its state, exec's
to an updated version of itself, and restores its state.  Clients of the
application experience a brief suspension of service, on the order of
hundreds of milliseconds, but are otherwise unaffected.

Define and implement interfaces that allow vdpa devices to be preserved
across fork or exec, to support live update for applications such as QEMU.
The device must be suspended during the update, but its DMA mappings are
preserved, so the suspension is brief.

The VHOST_NEW_OWNER ioctl transfers device ownership and pinned memory
accounting from one process to another.

The VHOST_BACKEND_F_NEW_OWNER backend capability indicates that
VHOST_NEW_OWNER is supported.

The VHOST_IOTLB_REMAP message type updates a DMA mapping with its userland
address in the new process.

The VHOST_BACKEND_F_IOTLB_REMAP backend capability indicates that
VHOST_IOTLB_REMAP is supported and required.  Some devices do not
require it, because the userland address of each DMA mapping is discarded
after being translated to a physical address.

Here is a pseudo-code sequence for performing live update, based on
suspend + reset because resume is not yet widely available.  The vdpa device
descriptor, fd, remains open across the exec.

  ioctl(fd, VHOST_VDPA_SUSPEND)
  ioctl(fd, VHOST_VDPA_SET_STATUS, 0)

  exec

  ioctl(fd, VHOST_NEW_OWNER)

  issue ioctls to re-create vrings

  if VHOST_BACKEND_F_IOTLB_REMAP
      foreach dma mapping
          write(fd, {VHOST_IOTLB_REMAP, new_addr})

  ioctl(fd, VHOST_VDPA_SET_STATUS,
            ACKNOWLEDGE | DRIVER | FEATURES_OK | DRIVER_OK)

This is faster than VHOST_RESET_OWNER + VHOST_SET_OWNER + VHOST_IOTLB_UPDATE,
as that would unpin and repin physical pages, which would cost multiple
seconds for large memories.

This is implemented in QEMU by the patch series "Live update: vdpa"
  https://lore.kernel.org/qemu-devel/TBD (reference to be posted shortly)
https://lore.kernel.org/qemu-devel/1720792931-456433-3-git-send-email-steven.sistare@xxxxxxxxxx
The QEMU implementation leverages the live migration code path, but after
CPR exec's new QEMU:
  - vhost_vdpa_set_owner() calls VHOST_NEW_OWNER instead of VHOST_SET_OWNER
  - vhost_vdpa_dma_map() sets type VHOST_IOTLB_REMAP instead of
    VHOST_IOTLB_UPDATE

Changes in V2:
  - clean up handling of set_map vs dma_map vs platform iommu in remap
  - augment and clarify commit messages and comments

Steve Sistare (7):
  vhost-vdpa: count pinned memory
  vhost-vdpa: pass mm to bind
  vhost-vdpa: VHOST_NEW_OWNER
  vhost-vdpa: VHOST_BACKEND_F_NEW_OWNER
  vhost-vdpa: VHOST_IOTLB_REMAP
  vhost-vdpa: VHOST_BACKEND_F_IOTLB_REMAP
  vdpa/mlx5: new owner capability

 drivers/vdpa/mlx5/net/mlx5_vnet.c |   3 +-
 drivers/vhost/vdpa.c              | 125 ++++++++++++++++++++++++++++--
 drivers/vhost/vhost.c             |  15 ++++
 drivers/vhost/vhost.h             |   1 +
 include/uapi/linux/vhost.h        |  10 +++
 include/uapi/linux/vhost_types.h  |  15 +++-
 6 files changed, 161 insertions(+), 8 deletions(-)
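
For readers who want to see the ordering constraints of the live update
sequence spelled out, here is a rough, standalone C sketch.  It does not
issue any ioctls and does not depend on the headers changed by this series;
it only models which steps run, and in what order, depending on whether the
backend advertises VHOST_BACKEND_F_IOTLB_REMAP.  The names lu_step, lu_plan,
lu_step_name and lu_print_plan are hypothetical and exist only for this
illustration, and the enum values are labels, not real ioctl numbers.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical step labels; these are NOT kernel ioctl or message numbers. */
enum lu_step {
    LU_SUSPEND,           /* old process: ioctl(fd, VHOST_VDPA_SUSPEND) */
    LU_RESET_STATUS,      /* old process: ioctl(fd, VHOST_VDPA_SET_STATUS, 0) */
    LU_EXEC,              /* exec the updated binary; fd stays open */
    LU_NEW_OWNER,         /* new process: ioctl(fd, VHOST_NEW_OWNER) */
    LU_RECREATE_VRINGS,   /* new process: re-create vrings */
    LU_IOTLB_REMAP,       /* new process: one REMAP message per DMA mapping */
    LU_SET_STATUS_OK,     /* new process: set status up to DRIVER_OK */
};

/* Map a step label to the pseudo-code line it stands for. */
const char *lu_step_name(enum lu_step step)
{
    switch (step) {
    case LU_SUSPEND:         return "ioctl(fd, VHOST_VDPA_SUSPEND)";
    case LU_RESET_STATUS:    return "ioctl(fd, VHOST_VDPA_SET_STATUS, 0)";
    case LU_EXEC:            return "exec";
    case LU_NEW_OWNER:       return "ioctl(fd, VHOST_NEW_OWNER)";
    case LU_RECREATE_VRINGS: return "issue ioctls to re-create vrings";
    case LU_IOTLB_REMAP:     return "write(fd, {VHOST_IOTLB_REMAP, new_addr})";
    case LU_SET_STATUS_OK:   return "ioctl(fd, VHOST_VDPA_SET_STATUS, ...DRIVER_OK)";
    }
    return "unknown";
}

/*
 * Fill 'out' with the ordered steps of the live update sequence.
 * 'need_remap' models the backend advertising VHOST_BACKEND_F_IOTLB_REMAP.
 * Returns the number of steps written, or -1 if 'cap' is too small.
 */
int lu_plan(bool need_remap, size_t n_mappings, enum lu_step *out, size_t cap)
{
    size_t needed = 6 + (need_remap ? n_mappings : 0);
    size_t n = 0;
    size_t i;

    if (cap < needed)
        return -1;

    out[n++] = LU_SUSPEND;
    out[n++] = LU_RESET_STATUS;
    out[n++] = LU_EXEC;
    out[n++] = LU_NEW_OWNER;
    out[n++] = LU_RECREATE_VRINGS;

    /* Only backends that keep the userland address need the remap pass. */
    if (need_remap) {
        for (i = 0; i < n_mappings; i++)
            out[n++] = LU_IOTLB_REMAP;
    }

    out[n++] = LU_SET_STATUS_OK;
    return (int)n;
}

/* Print the planned steps, one per line. */
void lu_print_plan(const enum lu_step *steps, int n)
{
    int i;

    for (i = 0; i < n; i++)
        puts(lu_step_name(steps[i]));
}
```

Calling lu_plan(true, 2, steps, 16) yields eight steps, with the two remap
messages placed after vring re-creation and before the final SET_STATUS.
With need_remap false the remap pass is skipped and six steps remain.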