> -----Original Message----- > From: Jason Gunthorpe [mailto:jgg@xxxxxxxxxx] > Sent: 02 February 2022 13:15 > To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@xxxxxxxxxx> > Cc: kvm@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; > linux-crypto@xxxxxxxxxxxxxxx; alex.williamson@xxxxxxxxxx; > mgurtovoy@xxxxxxxxxx; Linuxarm <linuxarm@xxxxxxxxxx>; liulongfang > <liulongfang@xxxxxxxxxx>; Zengtao (B) <prime.zeng@xxxxxxxxxxxxx>; > yuzenghui <yuzenghui@xxxxxxxxxx>; Jonathan Cameron > <jonathan.cameron@xxxxxxxxxx>; Wangzhou (B) <wangzhou1@xxxxxxxxxxxxx> > Subject: Re: [RFC v2 0/4] vfio/hisilicon: add acc live migration driver > > On Fri, Jul 02, 2021 at 10:58:45AM +0100, Shameer Kolothum wrote: > > This series attempts to add vfio live migration support for > > HiSilicon ACC VF devices. HiSilicon ACC VF device MMIO space > > includes both the functional register space and migration > > control register space. As discussed in RFCv1[0], this may create > > security issues as these regions get shared between the Guest > > driver and the migration driver. Based on the feedback, we tried > > to address those concerns in this version. > > > > This is now based on the new vfio-pci-core framework proposal[1]. > > Understand that the framework proposal is still under discussion, > > but really appreciate any feedback on the approach taken here > > to mitigate the security risks. > > Hi, can you look at the v6 proposal for the mlx5 implementation of the > migration API and see if it meets hisilicon acc's needs as well? > > https://lore.kernel.org/all/20220130160826.32449-1-yishaih@xxxxxxxxxx/ Yes, I saw that one. Thanks for that and is now looking into it. > > There are few topics to consider: > - Which of the three feature sets (STOP_COPY, P2P and PRECOPY) make > sense for this driver? I think it will be STOP_COPY only for now. We might have PRECOPY feature once we have the SMMUv3 HTTU support in future. > > I see pf_qm_state_pre_save() but didn't understand why it wanted to > send the first 32 bytes in the PRECOPY mode? It is fine, but it > will add some complexity to continue to do this. That was mainly to do a quick verification between src and dst compatibility before we start saving the state. I think probably we can delay that check for later. > - I think we discussed the P2P implementation and decided it would > work for this device? Can you re-read and confirm? In our case these devices are Integrated End Point devices and doesn't have P2P DMA capability. Hence the FSM arcs will be limited to STOP_COPY feature I guess. Also, since we cannot guarantee a NDMA state in STOP, my assumption currently is the onus of making sure that no MMIO access happens in STOP is on the user. Is that a valid assumption? > - Are the arcs we defined going to work here as well? The current > implementation in hisi_acc_vf_set_device_state() is very far away > from what the v1 protocol is, so I'm having a hard time guessing, > but.. Right. The FSM has changed a couple of times since we posted this. I am going to rebase all that now. > RESUMING -> STOP > Probably vf_qm_state_resume() > > RUNNING -> STOP > vf_qm_fun_restart() - that is oddly named.. > > STOP -> RESUMING > Seems to be a nop (likely a bug) > > STOP -> RUNNING > Not implemented currenty? (also a bug) > > STOP -> STOP_COPY > pf_qm_state_pre_save / vf_qm_state_save > > STOP_COPY -> STOP > NOP I will check and verify this. > And the modification for the P2P/NO DMA is presumably just > fun_restart too since stopping the device and stopping DMA are > going to be the same thing here? Yes, in our case stopping device and stopping DMA are effectively the same thing. > > The mlx5 implementation linked above is a full example you can cut and > paste from for how to implement the state function and the how to do > the data transfer. The f_ops read/write implementation for acc looks > trivial as it only streams the fixed size and pre-allocated 'struct > acc_vf_data' > > It looks like it would be a short path to implement our v2 proposal > and remove a lot of driver code, as we saw in mlx5. > Ok. These are the git repo I am using for the rework, https://github.com/jgunthorpe/qemu/commits/vfio_migration_v2 https://github.com/jgunthorpe/linux/tree/vfio_migration_v2 Please let me know if the above are not up to date. Also, just noted that my quick prototype is now failing with below error, " Error: VFIO device doesn't support migration" Do we need to set the below before the feature query? Or am I using a wrong Qemu/kernel repo? --- a/hw/vfio/migration.c +++ b/hw/vfio/migration.c @@ -488,6 +488,7 @@ static int vfio_migration_query_flags(VFIODevice *vbasedev, uint64_t *mig_flags) struct vfio_device_feature_migration *mig = (void *)feature->data; feature->argsz = sizeof(buf); + feature->flags = VFIO_DEVICE_FEATURE_MIGRATION | VFIO_DEVICE_FEATURE_GET; if (ioctl(vbasedev->fd, VFIO_DEVICE_FEATURE, feature) != 0) return -EOPNOTSUPP; Thanks, Shameer