On Mon, 28 Feb 2022 16:29:19 -0400
Jason Gunthorpe <jgg@xxxxxxxxxx> wrote:

> On Mon, Feb 28, 2022 at 01:16:14PM -0700, Alex Williamson wrote:
> > On Mon, 28 Feb 2022 14:05:20 -0400
> > Jason Gunthorpe <jgg@xxxxxxxxxx> wrote:
> >
> > > On Mon, Feb 28, 2022 at 06:01:44PM +0000, Shameerali Kolothum Thodi wrote:
> > >
> > > > +static long hisi_acc_vf_save_unl_ioctl(struct file *filp,
> > > > +                                       unsigned int cmd, unsigned long arg)
> > > > +{
> > > > +        struct hisi_acc_vf_migration_file *migf = filp->private_data;
> > > > +        loff_t *pos = &filp->f_pos;
> > > > +        struct vfio_device_mig_precopy precopy;
> > > > +        unsigned long minsz;
> > > > +
> > > > +        if (cmd != VFIO_DEVICE_MIG_PRECOPY)
> > > > +                return -EINVAL;
> > >
> > > ENOTTY
> > >
> > > > +
> > > > +        minsz = offsetofend(struct vfio_device_mig_precopy, dirty_bytes);
> > > > +
> > > > +        if (copy_from_user(&precopy, (void __user *)arg, minsz))
> > > > +                return -EFAULT;
> > > > +        if (precopy.argsz < minsz)
> > > > +                return -EINVAL;
> > > > +
> > > > +        mutex_lock(&migf->lock);
> > > > +        if (*pos > migf->total_length) {
> > > > +                mutex_unlock(&migf->lock);
> > > > +                return -EINVAL;
> > > > +        }
> > > > +
> > > > +        precopy.dirty_bytes = 0;
> > > > +        precopy.initial_bytes = migf->total_length - *pos;
> > > > +        mutex_unlock(&migf->lock);
> > > > +        return copy_to_user((void __user *)arg, &precopy, minsz) ? -EFAULT : 0;
> > > > +}
> > >
> > > Yes
> > >
> > > And I noticed this didn't include the ENOMSG handling, read() should
> > > return ENOMSG when it reaches EOS for the pre-copy:
> > >
> > > + * During pre-copy the migration data FD has a temporary "end of stream" that is
> > > + * reached when both initial_bytes and dirty_bytes are zero. For instance, this
> > > + * may indicate that the device is idle and not currently dirtying any internal
> > > + * state. When read() is done on this temporary end of stream the kernel driver
> > > + * should return ENOMSG from read(). Userspace can wait for more data (which may
> > > + * never come) by using poll.
> >
> > I'm confused by your previous reply that the use of curr_state should
> > be eliminated, isn't this ioctl only valid while the device is in the
> > PRE_COPY or PRE_COPY_P2P states?  Otherwise the STOP_COPY state would
> > have some expectation to be able to use this ioctl for devices
> > supporting PRE_COPY.
>
> I think it is fine to keep working on stop copy, though the
> implementation here isn't quite right for that..
>
>   if (migf->total_length > QM_MATCH_SIZE)
>       precopy.dirty_bytes = migf->total_length - QM_MATCH_SIZE - *pos;
>   else
>       precopy.dirty_bytes = 0;
>
>   if (*pos < QM_MATCH_SIZE)
>       precopy.initial_bytes = QM_MATCH_SIZE - *pos;
>   else
>       precopy.initial_bytes = 0;
>
> Unless you think we should block it.

What's the meaning of initial_bytes and dirty_bytes while in
STOP_COPY?  It seems like these become meaningless and if so, why
shouldn't the ioctl simply return -EINVAL if the device state doesn't
match the window where it's useful?

> > I'd like to see the uapi clarify exactly what states allow this
> > ioctl and define the behavior of the ioctl when transitioning out of
> > those states with an open data_fd, ie. is it defined to return an
> > -errno once in STOP_COPY?  Thanks,
>
> The ioctl is on the data_fd, so it should follow all the normal rules
> of the data_fd just like read() - ie all ioctls/read/write fail when
> the state is moved outside one where the data_fd is valid.
>
> That looks like another issue with the above, it doesn't check
> migf->disabled.
>
> Should we add another sentence about this?

Right, of course the ioctl goes away when the data_fd is invalid.  The
question is more that we've created this PRE_COPY_* specific ioctl, so
what does it mean to call it in a device state where the data_fd is
still valid but this ioctl is really not?  We should specify how the
driver is intended to respond to this ioctl in STOP_COPY.  Thanks,

Alex
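
For illustration only, here is a small userspace model of the checks
being discussed: fail when the data_fd has been disabled, return
-EINVAL outside PRE_COPY/PRE_COPY_P2P (the option I'm asking about, not
something we've agreed on), and split the unread bytes into initial and
dirty counts.  Everything in it is a hypothetical stand-in: the struct
and enum names, the MATCH_SIZE value (the real QM_MATCH_SIZE isn't given
in this thread), the -ENODEV return for a disabled file, and the
assumption that initial_bytes + dirty_bytes should equal the unread
remainder.  That last assumption means the arithmetic is deliberately
not the expression quoted above.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration; not the kernel's definitions. */
enum mig_state { MIG_PRE_COPY, MIG_PRE_COPY_P2P, MIG_STOP_COPY };

struct mig_file_model {
        uint64_t total_length;  /* bytes of migration data available */
        uint64_t pos;           /* current read offset in the stream */
        bool disabled;          /* data_fd is no longer valid */
        enum mig_state state;   /* device state at the time of the call */
};

struct precopy_info {
        uint64_t initial_bytes;
        uint64_t dirty_bytes;
};

/* Placeholder value; the real QM_MATCH_SIZE is not stated in the thread. */
#define MATCH_SIZE 64u

static int precopy_query(const struct mig_file_model *m,
                         struct precopy_info *out)
{
        uint64_t initial_end;

        /* Assumed errno: an invalidated data_fd fails every operation. */
        if (m->disabled)
                return -ENODEV;

        /* One option under discussion: reject outside the PRE_COPY window. */
        if (m->state != MIG_PRE_COPY && m->state != MIG_PRE_COPY_P2P)
                return -EINVAL;

        if (m->pos > m->total_length)
                return -EINVAL;

        /* The initial region is the first MATCH_SIZE bytes, capped at total. */
        initial_end = m->total_length < MATCH_SIZE ? m->total_length : MATCH_SIZE;

        out->initial_bytes = m->pos < initial_end ? initial_end - m->pos : 0;
        /* Assumption: whatever else remains unread is counted as dirty. */
        out->dirty_bytes = (m->total_length - m->pos) - out->initial_bytes;
        return 0;
}
```

If we instead decide the ioctl keeps working in STOP_COPY, the state
check simply goes away and the uapi text would need to say what the two
counts mean there.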