> -----Original Message----- > From: Jason Gunthorpe [mailto:jgg@xxxxxxxxxx] > Sent: 02 March 2022 00:03 > To: Alex Williamson <alex.williamson@xxxxxxxxxx> > Cc: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@xxxxxxxxxx>; > kvm@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; > linux-crypto@xxxxxxxxxxxxxxx; cohuck@xxxxxxxxxx; mgurtovoy@xxxxxxxxxx; > yishaih@xxxxxxxxxx; Linuxarm <linuxarm@xxxxxxxxxx>; liulongfang > <liulongfang@xxxxxxxxxx>; Zengtao (B) <prime.zeng@xxxxxxxxxxxxx>; > Jonathan Cameron <jonathan.cameron@xxxxxxxxxx>; Wangzhou (B) > <wangzhou1@xxxxxxxxxxxxx> > Subject: Re: [PATCH v6 09/10] hisi_acc_vfio_pci: Add support for VFIO live > migration > > On Tue, Mar 01, 2022 at 03:44:31PM -0700, Alex Williamson wrote: > > On Tue, 1 Mar 2022 16:39:38 -0400 > > Jason Gunthorpe <jgg@xxxxxxxxxx> wrote: > > > > > On Tue, Mar 01, 2022 at 12:30:47PM -0700, Alex Williamson wrote: > > > > Wouldn't it make more sense if initial-bytes started at QM_MATCH_SIZE > > > > and dirty-bytes was always sizeof(vf_data) - QM_MATCH_SIZE? ie. > QEMU > > > > would know that it has sizeof(vf_data) - QM_MATCH_SIZE remaining even > > > > while it's getting ENOMSG after reading QM_MATCH_SIZE bytes of data. > > > > > > The purpose of this ioctl is to help userspace guess when moving on to > > > STOP_COPY is a good idea ie when the device has done almost all the > > > work it is going to be able to do in PRE_COPY. ENOMSG is a similar > > > indicator. > > > > > > I expect all devices to have some additional STOP_COPY trailer_data in > > > addition to their PRE_COPY initial_data and dirty_data > > > > > > There is a choice to make if we report the trailer_data during > > > PRE_COPY or not. As this is all estimates, it doesn't matter unless > > > the trailer_data is very big. > > > > > > Having all devices trend toward a 0 dirty_bytes to say they are are > > > done all the pre-copy they can do makes sense from an API > > > perspective. If one device trends toward 10MB due to a big > > > trailer_data and one trends toward 0 bytes, how will qemu consistently > > > decide when best to trigger STOP_COPY? It makes the API less useful. > > > > > > So, I would not include trailer_data in the dirty_bytes. > > > > That assumes that it's possible to keep up with the device dirty > > rate. > > It keeps options open so we have this choice someday. > > We already see that implementations are using vCPU throttling as part > of their migration strategy, and we are seriously looking at DMA > throttling. It is not a big leap to imagine that > internal-state-dirtying throttling will happne someday. > > With throttling iterations would ratchet up the throttle until they > reach an absolute small amount of dirty then cut over to STOP_COPY > > > It seems like a better approach for userspace would be to look at how > > dirty_bytes is trending. > > It may be biw, but this approach doesn't care if the trailing_bytes > are included or not, so lets leave them out and preserve the other > operating model. > > > If we exclude STOP_COPY trailing data from the VFIO_DEVICE_MIG_PRECOPY > > ioctl, it seems even more of a disconnect that when we enter the > > STOP_COPY state, suddenly we start getting new data out of a PRECOPY > > ioctl. > > Why? That amounts can go up at any time, how does it matter if it goes > up after STOP_COPY or instantly before? > > > BTW, "VFIO_DEVICE" should be reserved for ioctls and data structures > > relative to the device FD, appending it with _MIG is too subtle for me. > > This is also a GET operation for INFO, so I'd think for consistency > > with the existing vfio uAPI we'd name this something like > > VFIO_MIG_GET_PRECOPY_INFO where the structure might be named > > vfio_precopy_info. > > Sure > > > So if we don't think this is the right approach for STOP_COPY, then why > > are we pushing that it has any purpose outside of PRECOPY or might be > > implemented by a non-PRECOPY driver for use in STOP_COPY? > > It is just simpler and more consistent to implement the math under > this ioctl in all cases then to try and artificially restrict it. > > But I don't have a use case for it, so lets block it if you prefer. > > Shameerali will you make these adjustments to the PRE_COPY patch? Sure. I think we can summarize the discussion as below, - Rename the MIG_PRECOPY ioctl to VFIO_MIG_GET_PRECOPY_INFO and structure to vfio_precopy_info. - This ioctl is only valid in PRE_COPY state and should return -EINVAL in other states(Update the documentation). - No changes to the initial_bytes & dirty_bytes descriptions. Please let me know if I missed anything. I will address other comments on this series as well and sent out a revised one soon. Thanks, Shameer