On Fri, Jul 02, 2021 at 10:58:45AM +0100, Shameer Kolothum wrote:
> This series attempts to add vfio live migration support for
> HiSilicon ACC VF devices. HiSilicon ACC VF device MMIO space
> includes both the functional register space and migration
> control register space. As discussed in RFCv1[0], this may create
> security issues as these regions get shared between the Guest
> driver and the migration driver. Based on the feedback, we tried
> to address those concerns in this version.
>
> This is now based on the new vfio-pci-core framework proposal[1].
> Understand that the framework proposal is still under discussion,
> but really appreciate any feedback on the approach taken here
> to mitigate the security risks.

Hi, can you look at the v6 proposal for the mlx5 implementation of the
migration API and see if it meets hisilicon acc's needs as well?

https://lore.kernel.org/all/20220130160826.32449-1-yishaih@xxxxxxxxxx/

There are a few topics to consider:

 - Which of the three feature sets (STOP_COPY, P2P and PRECOPY) make
   sense for this driver?

   I see pf_qm_state_pre_save() but didn't understand why it wanted to
   send the first 32 bytes in the PRECOPY mode? It is fine, but it
   will add some complexity to continue to do this.

 - I think we discussed the P2P implementation and decided it would
   work for this device? Can you re-read and confirm?

 - Are the arcs we defined going to work here as well? The current
   implementation in hisi_acc_vf_set_device_state() is very far away
   from what the v1 protocol is, so I'm having a hard time guessing,
   but..

      RESUMING -> STOP
        Probably vf_qm_state_resume()

      RUNNING -> STOP
        vf_qm_fun_restart() - that is oddly named..

      STOP -> RESUMING
        Seems to be a nop (likely a bug)

      STOP -> RUNNING
        Not implemented currently? (also a bug)

      STOP -> STOP_COPY
        pf_qm_state_pre_save / vf_qm_state_save

      STOP_COPY -> STOP
        NOP

   And the modification for the P2P/NO DMA is presumably just
   fun_restart too since stopping the device and stopping DMA are
   going to be the same thing here?

The mlx5 implementation linked above is a full example you can cut and
paste from for how to implement the state function and how to do the
data transfer. The f_ops read/write implementation for acc looks
trivial as it only streams the fixed size and pre-allocated 'struct
acc_vf_data'.

It looks like it would be a short path to implement our v2 proposal
and remove a lot of driver code, as we saw in mlx5.

Thanks,
Jason
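
For comparison against the driver, the arc list above can be written
down as a small lookup table. This is only a userspace sketch: the
enum, the table and arc_handler() are hypothetical names made up for
illustration, not the kernel uAPI or the driver's code, and the strings
just record the guesses made above rather than confirmed behaviour.

```c
#include <stddef.h>

/* Hypothetical local state names; NOT the kernel's uAPI constants. */
enum mig_state { ST_STOP, ST_RUNNING, ST_STOP_COPY, ST_RESUMING, ST_COUNT };

/*
 * arc_table[cur][next] names the routine guessed for that arc in the
 * mail above. "nop" marks an arc that appears to do no work,
 * "unimplemented" marks the arc that seems to be missing, and NULL
 * marks a pair that is not in the list at all.
 */
static const char *const arc_table[ST_COUNT][ST_COUNT] = {
	[ST_RESUMING][ST_STOP]  = "vf_qm_state_resume",
	[ST_RUNNING][ST_STOP]   = "vf_qm_fun_restart",
	[ST_STOP][ST_RESUMING]  = "nop",
	[ST_STOP][ST_RUNNING]   = "unimplemented",
	[ST_STOP][ST_STOP_COPY] = "pf_qm_state_pre_save + vf_qm_state_save",
	[ST_STOP_COPY][ST_STOP] = "nop",
};

/* Return the guessed handler name for cur -> next, or NULL if unlisted. */
const char *arc_handler(enum mig_state cur, enum mig_state next)
{
	if ((unsigned)cur >= ST_COUNT || (unsigned)next >= ST_COUNT)
		return NULL;
	return arc_table[cur][next];
}
```

Keeping the arcs in one table makes the two suspected bugs (STOP ->
RESUMING and STOP -> RUNNING) stand out as entries with no real work
behind them.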