On 7/16/2024 1:30 AM, Jason Wang wrote:
On Mon, Jul 15, 2024 at 10:29 PM Steven Sistare
<steven.sistare@xxxxxxxxxx> wrote:
On 7/14/2024 10:14 PM, Jason Wang wrote:
On Fri, Jul 12, 2024 at 9:19 PM Steve Sistare <steven.sistare@xxxxxxxxxx> wrote:
Live update is a technique wherein an application saves its state, exec's
to an updated version of itself, and restores its state. Clients of the
application experience a brief suspension of service, on the order of
hundreds of milliseconds, but are otherwise unaffected.
Define and implement interfaces that allow vdpa devices to be preserved
across fork or exec, to support live update for applications such as QEMU.
The device must be suspended during the update, but its DMA mappings are
preserved, so the suspension is brief.
The VHOST_NEW_OWNER ioctl transfers device ownership and pinned memory
accounting from one process to another.
The VHOST_BACKEND_F_NEW_OWNER backend capability indicates that
VHOST_NEW_OWNER is supported.
The VHOST_IOTLB_REMAP message type updates a DMA mapping with its userland
address in the new process.
The VHOST_BACKEND_F_IOTLB_REMAP backend capability indicates that
VHOST_IOTLB_REMAP is supported and required. Some devices do not
require it, because the userland address of each DMA mapping is discarded
after being translated to a physical address.
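A minimal sketch of how userland might test the two capabilities described above, assuming the backend feature word has already been read (for example with the existing VHOST_GET_BACKEND_FEATURES ioctl). The bit positions and helper names below are placeholders invented for illustration; the real bit values are whatever the patch series assigns in the uapi header.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* HYPOTHETICAL bit positions, for illustration only. They are NOT the
 * values used by the patch series. */
#define EXAMPLE_F_NEW_OWNER_BIT    62
#define EXAMPLE_F_IOTLB_REMAP_BIT  63

/* Test a single bit in the 64-bit backend feature word. */
static bool has_feature(uint64_t features, unsigned bit)
{
    assert(bit < 64);   /* shifting by 64 or more is undefined behavior */
    return (features >> bit) & 1ULL;
}

/* Ownership transfer across exec needs VHOST_NEW_OWNER support. */
static bool can_live_update(uint64_t features)
{
    return has_feature(features, EXAMPLE_F_NEW_OWNER_BIT);
}

/* Remapping is supported and required only when the backend says so.
 * Backends that discard the userland address after translating it to a
 * physical address do not set this bit. */
static bool must_remap(uint64_t features)
{
    return has_feature(features, EXAMPLE_F_IOTLB_REMAP_BIT);
}
```

With helpers like these, the new process would send one VHOST_IOTLB_REMAP message per DMA mapping only when must_remap() returns true.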
Here is a pseudo-code sequence for performing live update, based on
suspend + reset because resume is not yet widely available. The vdpa device
descriptor, fd, remains open across the exec.
    ioctl(fd, VHOST_VDPA_SUSPEND)
    ioctl(fd, VHOST_VDPA_SET_STATUS, 0)
I don't understand why we need a reset after suspend; it looks to me as if
the reset makes the previous suspend meaningless.
The suspend guarantees completion of in-progress DMA. At least, that is
my interpretation of why that is done for live migration in QEMU, which
also does suspend + reset + re-create. I am following the live migration
model.
Yes, but any reason we need a reset after the suspension?
Probably not. I found it cleanest to call reset and let the new QEMU configure the
device as it always does during startup, rather than altering those code paths
to skip the kernel calls. So, consider this to be just one of several possible
userland algorithms.
- Steve
    exec
    ioctl(fd, VHOST_NEW_OWNER)
    issue ioctls to re-create vrings
    if VHOST_BACKEND_F_IOTLB_REMAP
So the idea is that a device that uses virtual addresses doesn't need
VHOST_BACKEND_F_IOTLB_REMAP at all?
Actually the reverse: if the device translates virtual to physical when
the mappings are created, and discards the virtual, then VHOST_IOTLB_REMAP
is not needed.
Ok.
        foreach dma mapping
            write(fd, {VHOST_IOTLB_REMAP, new_addr})
    ioctl(fd, VHOST_VDPA_SET_STATUS,
          ACKNOWLEDGE | DRIVER | FEATURES_OK | DRIVER_OK)
At the API level, this seems asymmetric: we have a suspend but no
resume?
Again, I am just following the path taken by live migration.
I will be happy to use resume when the devices and QEMU support it.
The decision to use reset vs resume should not affect the definition
and use of VHOST_NEW_OWNER and VHOST_IOTLB_REMAP.
- Steve
Thanks
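For reference, the two status values written with VHOST_VDPA_SET_STATUS in the sequence discussed in this thread (0 to reset before exec, and ACKNOWLEDGE | DRIVER | FEATURES_OK | DRIVER_OK to restart afterwards) can be spelled out as below. This is only a sketch: the ST_* and VDPA_STATUS_* names are local to this example, with the bit values taken from the virtio device status field (the VIRTIO_CONFIG_S_* constants in <linux/virtio_config.h>).

```c
#include <assert.h>
#include <stdint.h>

/* Local copies of the virtio device status bits. The names are made up
 * for this example; the values match VIRTIO_CONFIG_S_*. */
enum {
    ST_ACKNOWLEDGE = 0x01,
    ST_DRIVER      = 0x02,
    ST_DRIVER_OK   = 0x04,
    ST_FEATURES_OK = 0x08,
};

/* Written by the old process after suspend: a status of 0 resets the device. */
#define VDPA_STATUS_RESET ((uint8_t)0)

/* Written by the new process once vrings (and, if required, DMA mappings)
 * have been restored. */
#define VDPA_STATUS_RUNNING \
    ((uint8_t)(ST_ACKNOWLEDGE | ST_DRIVER | ST_FEATURES_OK | ST_DRIVER_OK))

/* All four bits together give 0x0f. */
static_assert(VDPA_STATUS_RUNNING == 0x0f, "unexpected running status value");

/* Return nonzero if every bit of the running status is set in 'status'. */
static inline int status_is_running(uint8_t status)
{
    return (status & VDPA_STATUS_RUNNING) == VDPA_STATUS_RUNNING;
}
```

Note that the real ioctl takes a pointer to a one-byte status value rather than the value itself, so the pseudo-code's SET_STATUS calls are shorthand for storing one of these values in a uint8_t and passing its address.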