On Thu, Feb 21, 2019 at 01:35:43AM +0000, Gonglei (Arei) wrote:
>
> > -----Original Message-----
> > From: Zhao Yan [mailto:yan.y.zhao@xxxxxxxxx]
> > Sent: Thursday, February 21, 2019 8:25 AM
> > To: Gonglei (Arei) <arei.gonglei@xxxxxxxxxx>
> > Cc: alex.williamson@xxxxxxxxxx; qemu-devel@xxxxxxxxxx;
> > intel-gvt-dev@xxxxxxxxxxxxxxxxxxxxx; Zhengxiao.zx@xxxxxxxxxxxxxxx;
> > yi.l.liu@xxxxxxxxx; eskultet@xxxxxxxxxx; ziye.yang@xxxxxxxxx;
> > cohuck@xxxxxxxxxx; shuangtai.tst@xxxxxxxxxxxxxxx; dgilbert@xxxxxxxxxx;
> > zhi.a.wang@xxxxxxxxx; mlevitsk@xxxxxxxxxx; pasic@xxxxxxxxxxxxx;
> > aik@xxxxxxxxx; eauger@xxxxxxxxxx; felipe@xxxxxxxxxxx;
> > jonathan.davies@xxxxxxxxxxx; changpeng.liu@xxxxxxxxx; Ken.Xue@xxxxxxx;
> > kwankhede@xxxxxxxxxx; kevin.tian@xxxxxxxxx; cjia@xxxxxxxxxx;
> > kvm@xxxxxxxxxxxxxxx
> > Subject: Re: [PATCH 0/5] QEMU VFIO live migration
> >
> > On Wed, Feb 20, 2019 at 11:56:01AM +0000, Gonglei (Arei) wrote:
> > > Hi Yan,
> > >
> > > Thanks for your work.
> > >
> > > I have some suggestions or questions:
> > >
> > > 1) Would you add MSI-X mode support? If not, please add a check in
> > > vfio_pci_save_config(), like Nvidia's solution.
> > ok.
> >
> > > 2) We should start vfio devices before vcpu resumes, so we can't rely
> > > on the vm start change handler completely.
> > vfio devices are by default set to running state.
> > In the target machine, the state transition flow is
> > running->stop->running.
>
> That's confusing. We should start vfio devices after vfio_load_state;
> otherwise, how can you keep the devices' information the same between the
> source side and the destination side?
>
So you mean we should set the device state to running in the first call to
vfio_load_state?

> > So, maybe you can ignore the stop notification in the kernel?
> > > 3) We'd better support live migration rollback, since there are many
> > > failure scenarios; registering a migration notifier is a good choice.
> > I think this patchset can also handle the failure case well.
> > If migration failure or cancellation happens, the LOGGING state is
> > cleared in the cleanup handler, and the device state (running or
> > stopped) is kept as it is.
>
> IIRC there are many failure paths that don't call the cleanup handler.
Could you give an example?

> > Then, if the vm switches back to running, the device state will be set
> > to running; if the vm stays in stopped state, the device state is also
> > stopped (there is no point in keeping it in running state).
> > Do you think so?
>
> If the underlying state machine is complicated, we should tell the
> cancelling state to the vendor driver proactively.
That makes sense.

> > > 4) Four memory regions for live migration is too complicated IMHO.
> > One big region requires the sub-regions to be well padded:
> > e.g. the control fields at the start have to be padded to 4K, and the
> > same for the other data fields.
> > Otherwise, mmap simply fails, because the start offset and size for
> > mmap both need to be PAGE aligned.
>
> But we don't need to use mmap for the control fields and device state;
> they are basically small. The performance of pread/pwrite is enough.
>
We don't mmap the control fields. But if the data fields follow immediately
after the control fields (e.g. at offset 64), we can't mmap the data fields
successfully because their start offset is 64. Therefore the control fields
have to be padded to 4K so that the data fields start at 4K. That's the
drawback of one big region holding both control and data fields.

Also, 4 regions is clearer in my view :)
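
For reference, the alignment constraint can be shown with a small sketch
(plain POSIX mmap(); map_data_field(), device_fd, data_offset and data_size
are illustrative names, not from this patchset):

#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Illustration only: mmap() requires the file offset to be a multiple of
 * the page size, so a data area that starts only 64 bytes into the region
 * cannot be mapped directly; the control fields in front of it would have
 * to be padded out to a full page (4K on x86).
 */
void *map_data_field(int device_fd, off_t data_offset, size_t data_size)
{
    long pagesz = sysconf(_SC_PAGESIZE);

    if (data_offset % pagesz != 0) {
        /* e.g. data_offset == 64: mmap() would fail with EINVAL */
        fprintf(stderr, "offset 0x%llx is not page aligned\n",
                (unsigned long long)data_offset);
        return MAP_FAILED;
    }

    return mmap(NULL, data_size, PROT_READ | PROT_WRITE, MAP_SHARED,
                device_fd, data_offset);
}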
> > > 5) About log sync, why not register log_global_start/stop in
> > > vfio_memory_listener?
> >
> > It seems log_global_start/stop cannot be called iteratively in the
> > pre-copy phase? For dirty pages in system memory, it's better to
> > transfer dirty data iteratively to reduce downtime, right?
>
> We just need to invoke start and stop logging only once. Why do we need
> to call them iteratively? See the memory_listener of vhost.
>
> Regards,
> -Gonglei
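
For context, the vhost-style registration being referred to looks roughly
like the sketch below, based on QEMU's MemoryListener API (the vfio_*
function names here are illustrative, not taken from this patchset).
log_global_start/stop fire once when dirty logging is enabled/disabled,
while log_sync is the hook that is called repeatedly during pre-copy:

#include "qemu/osdep.h"
#include "exec/memory.h"

static void vfio_log_global_start(MemoryListener *listener)
{
    /* called once when dirty logging is globally enabled (migration start) */
}

static void vfio_log_global_stop(MemoryListener *listener)
{
    /* called once when dirty logging is globally disabled */
}

static void vfio_log_sync(MemoryListener *listener,
                          MemoryRegionSection *section)
{
    /* called repeatedly during pre-copy to pull the latest dirty bitmap */
}

static MemoryListener vfio_migration_listener = {
    .log_global_start = vfio_log_global_start,
    .log_global_stop  = vfio_log_global_stop,
    .log_sync         = vfio_log_sync,
};

void vfio_migration_listener_register(void)
{
    memory_listener_register(&vfio_migration_listener, &address_space_memory);
}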