Re: [RFC v2 0/4] vfio/hisilicon: add acc live migration driver

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Jul 02, 2021 at 10:58:45AM +0100, Shameer Kolothum wrote:
> This series attempts to add vfio live migration support for
> HiSilicon ACC VF devices. HiSilicon ACC VF device MMIO space
> includes both the functional register space and migration 
> control register space. As discussed in RFCv1[0], this may create
> security issues as these regions get shared between the Guest
> driver and the migration driver. Based on the feedback, we tried
> to address those concerns in this version. 
> 
> This is now based on the new vfio-pci-core framework proposal[1].
> Understand that the framework proposal is still under discussion,
> but really appreciate any feedback on the approach taken here
> to mitigate the security risks.

Hi, can you look at the v6 proposal for the mlx5 implementation of the
migration API and see if it meets hisilicon acc's needs as well?

https://lore.kernel.org/all/20220130160826.32449-1-yishaih@xxxxxxxxxx/

There are few topics to consider:
 - Which of the three feature sets (STOP_COPY, P2P and PRECOPY) make
   sense for this driver?

   I see pf_qm_state_pre_save() but didn't understand why it wanted to
   send the first 32 bytes in the PRECOPY mode? It is fine, but it
   will add some complexity to continue to do this.

 - I think we discussed the P2P implementation and decided it would
   work for this device? Can you re-read and confirm?

 - Are the arcs we defined going to work here as well? The current
   implementation in hisi_acc_vf_set_device_state() is very far away
   from what the v1 protocol is, so I'm having a hard time guessing,
   but..

      RESUMING -> STOP
        Probably vf_qm_state_resume()

      RUNNING -> STOP
        vf_qm_fun_restart() - that is oddly named..

      STOP -> RESUMING
        Seems to be a nop (likely a bug)

      STOP -> RUNNING
         Not implemented currenty? (also a bug)

      STOP -> STOP_COPY
         pf_qm_state_pre_save / vf_qm_state_save

      STOP_COPY -> STOP
         NOP

   And the modification for the P2P/NO DMA is presumably just
   fun_restart too since stopping the device and stopping DMA are
   going to be the same thing here?

The mlx5 implementation linked above is a full example you can cut and
paste from for how to implement the state function and the how to do
the data transfer. The f_ops read/write implementation for acc looks
trivial as it only streams the fixed size and pre-allocated 'struct
acc_vf_data'

It looks like it would be a short path to implement our v2 proposal
and remove a lot of driver code, as we saw in mlx5.

Thanks,
Jason



[Index of Archives]     [Kernel]     [Gnu Classpath]     [Gnu Crypto]     [DM Crypt]     [Netfilter]     [Bugtraq]

  Powered by Linux