Re: [PATCH V1 vfio 6/7] vfio/virtio: Add PRE_COPY support for live migration

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, 6 Nov 2024 13:16:34 +0200
Yishai Hadas <yishaih@xxxxxxxxxx> wrote:

> On 06/11/2024 1:18, Alex Williamson wrote:
> > On Mon, 4 Nov 2024 12:21:30 +0200
> > Yishai Hadas <yishaih@xxxxxxxxxx> wrote:
> >   
> >> Add PRE_COPY support for live migration.
> >>
> >> This functionality may reduce the downtime upon STOP_COPY as of letting
> >> the target machine to get some 'initial data' from the source once the
> >> machine is still in its RUNNING state and let it prepares itself
> >> pre-ahead to get the final STOP_COPY data.
> >>
> >> As the Virtio specification does not support reading partial or
> >> incremental device contexts. This means that during the PRE_COPY state,
> >> the vfio-virtio driver reads the full device state.
> >>
> >> As the device state can be changed and the benefit is highest when the
> >> pre copy data closely matches the final data we read it in a rate
> >> limiter mode and reporting no data available for some time interval
> >> after the previous call.
> >>
> >> With PRE_COPY enabled, we observed a downtime reduction of approximately
> >> 70-75% in various scenarios compared to when PRE_COPY was disabled,
> >> while keeping the total migration time nearly the same.
> >>
> >> Signed-off-by: Yishai Hadas <yishaih@xxxxxxxxxx>
> >> ---
> >>   drivers/vfio/pci/virtio/common.h  |   4 +
> >>   drivers/vfio/pci/virtio/migrate.c | 233 +++++++++++++++++++++++++++++-
> >>   2 files changed, 229 insertions(+), 8 deletions(-)
> >>
> >> diff --git a/drivers/vfio/pci/virtio/common.h b/drivers/vfio/pci/virtio/common.h
> >> index 3bdfb3ea1174..5704603f0f9d 100644
> >> --- a/drivers/vfio/pci/virtio/common.h
> >> +++ b/drivers/vfio/pci/virtio/common.h
> >> @@ -10,6 +10,8 @@
> >>   
> >>   enum virtiovf_migf_state {
> >>   	VIRTIOVF_MIGF_STATE_ERROR = 1,
> >> +	VIRTIOVF_MIGF_STATE_PRECOPY = 2,
> >> +	VIRTIOVF_MIGF_STATE_COMPLETE = 3,
> >>   };
> >>   
> >>   enum virtiovf_load_state {
> >> @@ -57,6 +59,8 @@ struct virtiovf_migration_file {
> >>   	/* synchronize access to the file state */
> >>   	struct mutex lock;
> >>   	loff_t max_pos;
> >> +	u64 pre_copy_initial_bytes;
> >> +	struct ratelimit_state pre_copy_rl_state;
> >>   	u64 record_size;
> >>   	u32 record_tag;
> >>   	u8 has_obj_id:1;
> >> diff --git a/drivers/vfio/pci/virtio/migrate.c b/drivers/vfio/pci/virtio/migrate.c
> >> index 2a9614c2ef07..cdb252f6fd80 100644
> >> --- a/drivers/vfio/pci/virtio/migrate.c
> >> +++ b/drivers/vfio/pci/virtio/migrate.c  
> > ...  
> >> @@ -379,9 +432,104 @@ static ssize_t virtiovf_save_read(struct file *filp, char __user *buf, size_t le
> >>   	return done;
> >>   }
> >>   
> >> +static long virtiovf_precopy_ioctl(struct file *filp, unsigned int cmd,
> >> +				   unsigned long arg)
> >> +{
> >> +	struct virtiovf_migration_file *migf = filp->private_data;
> >> +	struct virtiovf_pci_core_device *virtvdev = migf->virtvdev;
> >> +	struct vfio_precopy_info info = {};
> >> +	loff_t *pos = &filp->f_pos;
> >> +	bool end_of_data = false;
> >> +	unsigned long minsz;
> >> +	u32 ctx_size;
> >> +	int ret;
> >> +
> >> +	if (cmd != VFIO_MIG_GET_PRECOPY_INFO)
> >> +		return -ENOTTY;
> >> +
> >> +	minsz = offsetofend(struct vfio_precopy_info, dirty_bytes);
> >> +	if (copy_from_user(&info, (void __user *)arg, minsz))
> >> +		return -EFAULT;
> >> +
> >> +	if (info.argsz < minsz)
> >> +		return -EINVAL;
> >> +
> >> +	mutex_lock(&virtvdev->state_mutex);
> >> +	if (virtvdev->mig_state != VFIO_DEVICE_STATE_PRE_COPY &&
> >> +	    virtvdev->mig_state != VFIO_DEVICE_STATE_PRE_COPY_P2P) {
> >> +		ret = -EINVAL;
> >> +		goto err_state_unlock;
> >> +	}
> >> +
> >> +	/*
> >> +	 * The virtio specification does not include a PRE_COPY concept.
> >> +	 * Since we can expect the data to remain the same for a certain period,
> >> +	 * we use a rate limiter mechanism before making a call to the device.
> >> +	 */
> >> +	if (!__ratelimit(&migf->pre_copy_rl_state)) {
> >> +		/* Reporting no data available */
> >> +		ret = 0;
> >> +		goto done;  
> > 
> > @ret is not used by the done: goto target.  Don't we need to zero dirty
> > bytes, or account for initial bytes not being fully read yet?  
> 
> The 'dirty bytes' are actually zero as we used the below line [1] of 
> code above.
> 
> [1] struct vfio_precopy_info info = {};

Ah, missed that.
 
> However, I agree, we may better account for the 'initial bytes' which 
> potentially might not being fully read yet.
> Same can be true for returning the actual 'dirty bytes' that we may have 
> from the previous call.
> 
> So, in V2 I'll change the logic to initially set:
> u32 ctx_size = 0;
> 
> Then, will call the device to get its 'ctx_size' only if time has passed 
> according to the rate limiter.
> 
> Something as of the below.
> 
> if (__ratelimit(&migf->pre_copy_rl_state)) {
> 	ret = virtio_pci_admin_dev_parts_metadata_get(.., &ctx_size);
> 	if (ret)
> 		goto err_state_unlock;
> }
> 
>  From that point the function will proceed with its V1 flow to return 
> the actual 'initial bytes' and 'dirty_bytes' while considering the extra 
> context size from the device to be 0.

That appears correct to me.  Thanks,

Alex





[Index of Archives]     [KVM Development]     [Libvirt Development]     [Libvirt Users]     [CentOS Virtualization]     [Netdev]     [Ethernet Bridging]     [Linux Wireless]     [Kernel Newbies]     [Security]     [Linux for Hams]     [Netfilter]     [Bugtraq]     [Yosemite Forum]     [MIPS Linux]     [ARM Linux]     [Linux RAID]     [Linux Admin]     [Samba]

  Powered by Linux