On Tue, Feb 01, 2022 at 10:04:08AM -0700, Alex Williamson wrote: > Ok, let me parrot back to see if I understand. -ENOTTY will be > returned if the ioctl doesn't exist, in which case device_state is > untouched and cannot be trusted. At the same time, we expect the user > to use the feature ioctl to make sure the ioctl exists, so it would > seem that we've reclaimed that errno if we believe the user should > follow the protocol. I don't follow - the documentation says what the code does, if you get ENOTTY returned then you don't get the device_state too. Saying the user shouldn't have called it in the first place is completely correct, but doesn't change the device_state output. > + if (!device->ops->migration_set_state) > + return -EOPNOTSUPP; > > Should return -ENOTTY, just as the feature does. As far as I know the kernel 'standard' is: - ENOTTY if the ioctl cmd # itself is not understood - E2BIG if the ioctl arg is longer than the kernel understands - EOPNOTSUPP if the ioctl arg contains data the kernel doesn't understand (eg flags the kernel doesn't know about), or the kernel understands the request but cannot support it for some reason. - EINVAL if the ioctl arg contains data the kernel knows about but rejects (ie invalid combinations of flags) VFIO kind of has its own thing, but I'm not entirely sure what the rules are, eg you asked for EOPNOTSUPP in the other patch, and here we are asking for ENOTTY? But sure, lets make it ENOTTY. > But it's also for future unsupported ops, but couldn't we also > specify that the driver must fill final_state with the current > device state for any such case. We also have this: > > + if (set_state.argsz < minsz || set_state.flags) > + return -EOPNOTSUPP; > > Which I think should be -EINVAL. That would match the majority of other VFIO tests. > That leaves -EFAULT, for example: > > + if (copy_from_user(&set_state, arg, minsz)) > + return -EFAULT; > > Should we be able to know the current device state in core code such > that we can fill in device state here? There is no point in doing a copy_to_user() to the same memory if a copy_from_user() failed, so device_state will still not be returned. We don't know the device_state in the core code because it can only be read under locking that is controlled by the driver. I hope when we get another driver merged that we can hoist the locking, but right now I'm not really sure - it is a complicated lock. > I think those changes would go a ways towards fully specified behavior > instead of these wishy washy unreliable return values. Then we could Huh? It is fully specified already. These changes just removed EOPNOTSUPP from the list where device_state isn't filled in. It is OK, but it is not really different... > "If this function fails and returns -1 then..." > > Could we clarify that to s/function/ioctl/? It caused me a moment of > confusion for the returned -errnos. Sure. > > > Should we be bumping a reference on the device FD such that we can't > > > have outstanding migration FDs with the device closed (and > > > re-assigned to a new user)? > > > > The driver must ensure any activity triggered by the migration FD > > against the vfio_device is halted before close_device() returns, just > > like basically everything else connected to open/close_device(). mlx5 > > does this by using the same EOF sanitizing the FSM logic uses. > > > > Once sanitized the f_ops should not be touching the vfio_device, or > > even have a pointer to it, so there is no reason to connect the two > > FDs together. I'd say it is a red flag if a driver proposes to do > > this, likely it means it has a problem with the open/close_device() > > lifetime model. > > Maybe we just need a paragraph somewhere to describe the driver > responsibilities and expectations in managing the migration FD, > including disconnecting it after end of stream and access relative to > the open state of the vfio_device. Seems an expanded descriptions > somewhere near the declaration in vfio_device_ops would be appropriate. Yes that is probably better than in the uapi header. > > I'm not sure what the overall VFIO vision is here.. Are we abandoning > > traditional ioctls in favour of a multiplexer? Calling the multiplexer > > ioctl "feature" is a bit odd.. > > Is it really? VF Token support is a feature that a device might have > and we can use the same interface to probe that it exists as well as > set the UUID token. We're using it to manipulate the state of a device > feature. > > If we're only looking for a means to expose that a device has support > for something, our options are a flag bit on the vfio_device_info or a > capability on that ioctl. It's arguable that the latter might be a > better option for VFIO_DEVICE_FEATURE_MIGRATION since its purpose is > only to return a flags field, ie. we're not interacting with a feature, > we're exposing a capability with fixed properties. I looked at this, and decided against it on practical reasons. I've organized this so the core code can do more work for the driver, which means the core code supplies the support info back to userspace. VFIO_DEVICE_INFO is currently open coded in every single driver and lifting that to get the same support looks like a huge pain. Even if we try to work it backwards somehow, we'd need to re-organize vfio-pci so other drivers can contribute to the cap chain - which is another ugly looking thing. On top of that, qemu becomes much less straightforward as we have to piggy back on the existing vfio code instead of just doing a simple ioctl to get back the small support info back. There is even an unpleasing mandatory user/kernel memory allocation and double ioctl in the caps path. The feature approach is much better, it has a much cleaner implementation in user/kernel. I think we should focus on it going forward and freeze caps. > > It complicates the user code a bit, it is more complicated to invoke the > > VFIO_DEVICE_FEATURE (check the qemu patch to see the difference). > > Is it really any more than some wrapper code? Are there objections to > this sort of multiplexer? There isn't too much reason to do this kind of stuff. Each subsystem gets something like 4 million ioctl numbers within its type, we will never run out of unique ioctls. Normal ioctls have a nice simplicity to them, adding layers creates complexity, feature is defiantly more complex to implement, and cap is a whole other level of more complex. None of this is necessary. I don't know what "cluttering" means here, I'd prefer we focus on things that give clean code and simple implementations than arbitary aesthetics. > > Either way I don't have a strong opinion, please have a think and let > > us know which you'd like to follow. > > I'm leaning towards a capability for migration support flags and a > feature for setting the state, but let me know if this looks like a bad > idea for some reason. Thanks, I don't want to touch capabilities, but we can try to use feature for set state. Please confirm this is what you want. You'll want the same for the PRE_COPY related information too? If we are into these very minor nitpicks does this mean you are OK with all the big topics now? Jason