> -----Original Message----- > From: Max Gurtovoy [mailto:mgurtovoy@xxxxxxxxxx] > Sent: 27 September 2021 19:24 > To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@xxxxxxxxxx>; > Leon Romanovsky <leon@xxxxxxxxxx> > Cc: Doug Ledford <dledford@xxxxxxxxxx>; Jason Gunthorpe > <jgg@xxxxxxxxxx>; Yishai Hadas <yishaih@xxxxxxxxxx>; Alex Williamson > <alex.williamson@xxxxxxxxxx>; Bjorn Helgaas <bhelgaas@xxxxxxxxxx>; David > S. Miller <davem@xxxxxxxxxxxxx>; Jakub Kicinski <kuba@xxxxxxxxxx>; Kirti > Wankhede <kwankhede@xxxxxxxxxx>; kvm@xxxxxxxxxxxxxxx; > linux-kernel@xxxxxxxxxxxxxxx; linux-pci@xxxxxxxxxxxxxxx; > linux-rdma@xxxxxxxxxxxxxxx; netdev@xxxxxxxxxxxxxxx; Saeed Mahameed > <saeedm@xxxxxxxxxx>; liulongfang <liulongfang@xxxxxxxxxx> > Subject: Re: [PATCH mlx5-next 2/7] vfio: Add an API to check migration state > transition validity > > > On 9/26/2021 7:17 PM, Shameerali Kolothum Thodi wrote: > > > >> -----Original Message----- > >> From: Max Gurtovoy [mailto:mgurtovoy@xxxxxxxxxx] > >> Sent: 26 September 2021 10:10 > >> To: Shameerali Kolothum Thodi > <shameerali.kolothum.thodi@xxxxxxxxxx>; > >> Leon Romanovsky <leon@xxxxxxxxxx> > >> Cc: Doug Ledford <dledford@xxxxxxxxxx>; Jason Gunthorpe > >> <jgg@xxxxxxxxxx>; Yishai Hadas <yishaih@xxxxxxxxxx>; Alex Williamson > >> <alex.williamson@xxxxxxxxxx>; Bjorn Helgaas <bhelgaas@xxxxxxxxxx>; > >> David S. Miller <davem@xxxxxxxxxxxxx>; Jakub Kicinski > >> <kuba@xxxxxxxxxx>; Kirti Wankhede <kwankhede@xxxxxxxxxx>; > >> kvm@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; > >> linux-pci@xxxxxxxxxxxxxxx; linux-rdma@xxxxxxxxxxxxxxx; > >> netdev@xxxxxxxxxxxxxxx; Saeed Mahameed <saeedm@xxxxxxxxxx>; > >> liulongfang <liulongfang@xxxxxxxxxx> > >> Subject: Re: [PATCH mlx5-next 2/7] vfio: Add an API to check > >> migration state transition validity > >> > >> > >> On 9/24/2021 10:44 AM, Shameerali Kolothum Thodi wrote: > >>>> -----Original Message----- > >>>> From: Max Gurtovoy [mailto:mgurtovoy@xxxxxxxxxx] > >>>> Sent: 23 September 2021 14:56 > >>>> To: Leon Romanovsky <leon@xxxxxxxxxx>; Shameerali Kolothum Thodi > >>>> <shameerali.kolothum.thodi@xxxxxxxxxx> > >>>> Cc: Doug Ledford <dledford@xxxxxxxxxx>; Jason Gunthorpe > >>>> <jgg@xxxxxxxxxx>; Yishai Hadas <yishaih@xxxxxxxxxx>; Alex > >>>> Williamson <alex.williamson@xxxxxxxxxx>; Bjorn Helgaas > >>>> <bhelgaas@xxxxxxxxxx>; David S. Miller <davem@xxxxxxxxxxxxx>; > Jakub > >>>> Kicinski <kuba@xxxxxxxxxx>; Kirti Wankhede <kwankhede@xxxxxxxxxx>; > >>>> kvm@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; > >>>> linux-pci@xxxxxxxxxxxxxxx; linux-rdma@xxxxxxxxxxxxxxx; > >>>> netdev@xxxxxxxxxxxxxxx; Saeed Mahameed <saeedm@xxxxxxxxxx> > >>>> Subject: Re: [PATCH mlx5-next 2/7] vfio: Add an API to check > >>>> migration state transition validity > >>>> > >>>> > >>>> On 9/23/2021 2:17 PM, Leon Romanovsky wrote: > >>>>> On Thu, Sep 23, 2021 at 10:33:10AM +0000, Shameerali Kolothum > >>>>> Thodi > >>>> wrote: > >>>>>>> -----Original Message----- > >>>>>>> From: Leon Romanovsky [mailto:leon@xxxxxxxxxx] > >>>>>>> Sent: 22 September 2021 11:39 > >>>>>>> To: Doug Ledford <dledford@xxxxxxxxxx>; Jason Gunthorpe > >>>> <jgg@xxxxxxxxxx> > >>>>>>> Cc: Yishai Hadas <yishaih@xxxxxxxxxx>; Alex Williamson > >>>>>>> <alex.williamson@xxxxxxxxxx>; Bjorn Helgaas > >>>>>>> <bhelgaas@xxxxxxxxxx>; > >>>> David > >>>>>>> S. Miller <davem@xxxxxxxxxxxxx>; Jakub Kicinski > >>>>>>> <kuba@xxxxxxxxxx>; Kirti Wankhede <kwankhede@xxxxxxxxxx>; > >>>>>>> kvm@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; > >>>>>>> linux-pci@xxxxxxxxxxxxxxx; linux-rdma@xxxxxxxxxxxxxxx; > >>>>>>> netdev@xxxxxxxxxxxxxxx; Saeed Mahameed <saeedm@xxxxxxxxxx> > >>>>>>> Subject: [PATCH mlx5-next 2/7] vfio: Add an API to check > >>>>>>> migration state transition validity > >>>>>>> > >>>>>>> From: Yishai Hadas <yishaih@xxxxxxxxxx> > >>>>>>> > >>>>>>> Add an API in the core layer to check migration state transition > >>>>>>> validity as part of a migration flow. > >>>>>>> > >>>>>>> The valid transitions follow the expected usage as described in > >>>>>>> uapi/vfio.h and triggered by QEMU. > >>>>>>> > >>>>>>> This ensures that all migration implementations follow a > >>>>>>> consistent migration state machine. > >>>>>>> > >>>>>>> Signed-off-by: Yishai Hadas <yishaih@xxxxxxxxxx> > >>>>>>> Reviewed-by: Kirti Wankhede <kwankhede@xxxxxxxxxx> > >>>>>>> Signed-off-by: Jason Gunthorpe <jgg@xxxxxxxxxx> > >>>>>>> Signed-off-by: Leon Romanovsky <leonro@xxxxxxxxxx> > >>>>>>> --- > >>>>>>> drivers/vfio/vfio.c | 41 > >>>> +++++++++++++++++++++++++++++++++++++++++ > >>>>>>> include/linux/vfio.h | 1 + > >>>>>>> 2 files changed, 42 insertions(+) > >>>>>>> > >>>>>>> diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c index > >>>>>>> 3c034fe14ccb..c3ca33e513c8 100644 > >>>>>>> --- a/drivers/vfio/vfio.c > >>>>>>> +++ b/drivers/vfio/vfio.c > >>>>>>> @@ -1664,6 +1664,47 @@ static int > >>>>>>> vfio_device_fops_release(struct > >>>> inode > >>>>>>> *inode, struct file *filep) > >>>>>>> return 0; > >>>>>>> } > >>>>>>> > >>>>>>> +/** > >>>>>>> + * vfio_change_migration_state_allowed - Checks whether a > >>>>>>> +migration > >>>> state > >>>>>>> + * transition is valid. > >>>>>>> + * @new_state: The new state to move to. > >>>>>>> + * @old_state: The old state. > >>>>>>> + * Return: true if the transition is valid. > >>>>>>> + */ > >>>>>>> +bool vfio_change_migration_state_allowed(u32 new_state, u32 > >>>> old_state) > >>>>>>> +{ > >>>>>>> + enum { MAX_STATE = VFIO_DEVICE_STATE_RESUMING }; > >>>>>>> + static const u8 vfio_from_state_table[MAX_STATE + > 1][MAX_STATE > >>>>>>> ++ > >>>> 1] = { > >>>>>>> + [VFIO_DEVICE_STATE_STOP] = { > >>>>>>> + [VFIO_DEVICE_STATE_RUNNING] = 1, > >>>>>>> + [VFIO_DEVICE_STATE_RESUMING] = 1, > >>>>>>> + }, > >>>>>>> + [VFIO_DEVICE_STATE_RUNNING] = { > >>>>>>> + [VFIO_DEVICE_STATE_STOP] = 1, > >>>>>>> + [VFIO_DEVICE_STATE_SAVING] = 1, > >>>>>>> + [VFIO_DEVICE_STATE_SAVING | > >>>> VFIO_DEVICE_STATE_RUNNING] > >>>>>>> = 1, > >>>>>> Do we need to allow _RESUMING state here or not? As per the > >>>>>> "State > >>>> transitions" > >>>>>> section from uapi/linux/vfio.h, > >>>>> It looks like we missed this state transition. > >>>>> > >>>>> Thanks > >>>> I'm not sure this state transition is valid. > >>>> > >>>> Kirti, When we would like to move from RUNNING to RESUMING ? > >>> I guess it depends on what you report as your dev default state. > >>> > >>> For HiSilicon ACC migration driver, we set the default to _RUNNING. > >> Where do you set it and report it ? > > Currently, in _open_device() we set the device_state to _RUNNING. > > Why do you do it ? It is by the assumption that the default state to be _RUNNING and then we take it from there for migration state changes. Any particular reason why it needs to be in _STOP state? We need to update the documentation in that case to allow _STOP --> _RESUMING. > > > > > I think in your case the default of vmig->vfio_dev_state == 0 (_STOP). > > > >>> And when the migration starts, the destination side Qemu, set the > >>> device state to _RESUMING(vfio_load_state()). > >>> > >>> From the documentation, it looks like the assumption on default > >>> state of the VFIO dev is _RUNNING. > >>> > >>> " > >>> * 001b => Device running, which is the default state " > >>> > >>>> Sameerali, can you please re-test and update if you see this transition ? > >>> Yes. And if I change the default state to _STOP, then the transition > >>> is from _STOP --> _RESUMING. > >>> > >>> But the documentation on State transitions doesn't have _STOP --> > >>> _RESUMING transition as valid. > >>> > >>> Thanks, > >>> Shameer > >>> > >>>>>> " * 4. To start the resuming phase, the device state should be > >>>>>> transitioned > >>>> from > >>>>>> * the _RUNNING to the _RESUMING state." > >>>>>> > >>>>>> IIRC, I have seen that transition happening on the destination > >>>>>> dev while > >>>> testing the > >>>>>> HiSilicon ACC dev migration. > >>>>>> > >>>>>> Thanks, > >>>>>> Shameer > >>>>>> > >>>>>>> + }, > >>>>>>> + [VFIO_DEVICE_STATE_SAVING] = { > >>>>>>> + [VFIO_DEVICE_STATE_STOP] = 1, > >>>>>>> + [VFIO_DEVICE_STATE_RUNNING] = 1, > >>>>>>> + }, > >>>>>>> + [VFIO_DEVICE_STATE_SAVING | > VFIO_DEVICE_STATE_RUNNING] > >>>> = { > >>>>>>> + [VFIO_DEVICE_STATE_RUNNING] = 1, > >>>>>>> + [VFIO_DEVICE_STATE_SAVING] = 1, > >>>>>>> + }, > >>>>>>> + [VFIO_DEVICE_STATE_RESUMING] = { > >>>>>>> + [VFIO_DEVICE_STATE_RUNNING] = 1, > >>>>>>> + [VFIO_DEVICE_STATE_STOP] = 1, > >>>>>>> + }, > >>>>>>> + }; > >>>>>>> + > >>>>>>> + if (new_state > MAX_STATE || old_state > MAX_STATE) > >>>>>>> + return false; > >>>>>>> + > >>>>>>> + return vfio_from_state_table[old_state][new_state]; > >>>>>>> +} > >>>>>>> +EXPORT_SYMBOL_GPL(vfio_change_migration_state_allowed); > >>>>>>> + > >>>>>>> static long vfio_device_fops_unl_ioctl(struct file *filep, > >>>>>>> unsigned int cmd, unsigned long arg) > >>>>>>> { > >>>>>>> diff --git a/include/linux/vfio.h b/include/linux/vfio.h index > >>>>>>> b53a9557884a..e65137a708f1 100644 > >>>>>>> --- a/include/linux/vfio.h > >>>>>>> +++ b/include/linux/vfio.h > >>>>>>> @@ -83,6 +83,7 @@ extern struct vfio_device > >>>>>>> *vfio_device_get_from_dev(struct device *dev); > >>>>>>> extern void vfio_device_put(struct vfio_device *device); > >>>>>>> > >>>>>>> int vfio_assign_device_set(struct vfio_device *device, void > >>>>>>> *set_id); > >>>>>>> +bool vfio_change_migration_state_allowed(u32 new_state, u32 > >>>> old_state); > >>>>>>> /* events for the backend driver notify callback */ > >>>>>>> enum vfio_iommu_notify_type { > >>>>>>> -- > >>>>>>> 2.31.1