On Mon, 2 Dec 2024 at 09:38, Laurent Pinchart <laurent.pinchart@xxxxxxxxxxxxxxxx> wrote: > > On Mon, Dec 02, 2024 at 08:28:48AM +0100, Ricardo Ribalda wrote: > > On Mon, 2 Dec 2024 at 01:19, Laurent Pinchart > > <laurent.pinchart@xxxxxxxxxxxxxxxx> wrote: > > > > > > On Fri, Nov 29, 2024 at 11:18:54PM +0100, Ricardo Ribalda wrote: > > > > On Fri, 29 Nov 2024 at 23:03, Laurent Pinchart wrote: > > > > > On Fri, Nov 29, 2024 at 07:47:31PM +0100, Ricardo Ribalda wrote: > > > > > > Before we all go on a well deserved weekend, let me recap what we > > > > > > know. If I did not get something correctly, let me know. > > > > > > > > > > > > 1) Well behaved devices do not allow to set or get an incomplete async > > > > > > control. They will stall instead (ref: Figure 2-21 in UVC 1.5 ) > > > > > > 2) Both Laurent and Ricardo consider that there is a big chance that > > > > > > some camera modules do not implement this properly. (ref: years of > > > > > > crying over broken module firmware :) ) > > > > > > > > > > > > 3) ctrl->handle is designed to point to the fh that originated the > > > > > > control. So the logic can decide if the originator needs to be > > > > > > notified or not. (ref: uvc_ctrl_send_event() ) > > > > > > 4) Right now we replace the originator in ctrl->handle for unfinished > > > > > > async controls. (ref: > > > > > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/media/usb/uvc/uvc_ctrl.c#n2050) > > > > > > > > > > > > My interpretation is that: > > > > > > A) We need to change 4). We shall not change the originator of > > > > > > unfinished ctrl->handle. > > > > > > B) Well behaved cameras do not need the patch "Do not set an async > > > > > > control owned by another fh" > > > > > > C) For badly behaved cameras, it is fine if we slightly break the > > > > > > v4l2-compliance in corner cases, if we do not break any internal data > > > > > > structure. > > > > > > > > > > The fact that some devices may not implement the documented behaviour > > > > > correctly may not be a problem. Well-behaved devices will stall, which > > > > > means we shouldn't query the device while as async update is in > > > > > progress. Badly-behaved devices, whatever they do when queried, should > > > > > not cause any issue if we don't query them. > > > > > > > > I thought we could detect the stall and return safely. Isn't that the case? > > > > > > We could, but if we know the device will stall anyway, is there a reason > > > not to avoid issuing the request in the first place ? > > > > Because there are always going to be devices that do not send the > > event when the control has finished. > > > > It is impossible to know the state of the device: Is the zoom still > > moving or the device is not compliant? > > Another type of broken behaviour would be devices that don't correctly > handle UVC_GET and UVC_CUR requests while an async update is in > progress. If you issue those requests, those devices will break. Are > they more or less important than devices that don't send an event ? > > All of those are partly theoretical problems. I'd rather keep it simple > for now until we get feedback about broken devices. > > > > > Why we have not seen issues with this? > > > > > > I haven't tested a PTZ device for a very long time, and you would need > > > to hit a small time window to see the issue. > > > > Not that small, some devices take up to 10 seconds to go from the > > smallest zoom to the biggest zoom. > > I'd say that v4l2-ctl has a very big chance to hit any error. > > > > BTW, homing can take an even longer time. > > > > > > > We should not send GET_CUR and SET_CUR requests to the device while an > > > > > async update is in progress, and use cached values instead. When we > > > > > receive the async update event, we should clear the cache. This will be > > > > > the same for both well-behaved and badly-behaved devices, so we can > > > > > expose the same behaviour towards userspace. > > > > > > > > seting ctrl->loaded = 0 when we get an event sounds like a good idea > > > > and something we can implement right away. > > > > If I have to resend the set I will add it to the end. > > > > > > > > > We possibly also need some kind of timeout mechanism to cope with the > > > > > async update event not being delivered by the device. > > > > > > > > This is the part that worries me the most: > > > > - timeouts make the code fragile > > > > - What is a good value for timeout? 1 second, 30, 300? I do not think > > > > that we can find a value. > > > > > > I've been thinking about the implementation of uvc_fh cleanup over the > > > weekend, and having a timeout would have the nice advantage that we > > > could reference-count uvc_fh instead of implementing a cleanup that > > > walks over all controls when closing a file handle. I think it would > > > make the code simpler, and possibly safer too. > > > > The problem with a timeout is: > > - We do not know what is the right value for the timeout. > > - We need to have one timeout per control, or implement a timeout > > dispatcher mechanism, and that is racy by nature > > - It will require introducing a new locking mechanism to avoid races > > between the timeout handler and the event completion > > Timeouts don't come for free, that's for sure. > > > - It introduces a new lifetime in the driver... I'd argue that it is > > best to have less, not more. > > That I disagree with, my experience with V4L2 (as well as with DRM, or > actually the kernel in general) is that we would be in a much better > place today if object lifetimes were handled with reference counting > more widely. > > > I do not see many benefits.... > > The main benefit is simplifying the close() implementation and > complexity, as well as lowering lock contention (whether the latter > brings a real advantage in practice, I haven't checked).. > > > What we can introduce on top of my set is something like this (just > > pseudocode, do not scream at me :P) That code can prevent stalls and > > will work with misbehaving hardware.... (But I still do not know a > > good value for timeout) and solve some of the issues that I mention. > > > > diff --git a/drivers/media/usb/uvc/uvc_ctrl.c b/drivers/media/usb/uvc/uvc_ctrl.c > > index f0e8a436a306..a55bd4b3ac07 100644 > > --- a/drivers/media/usb/uvc/uvc_ctrl.c > > +++ b/drivers/media/usb/uvc/uvc_ctrl.c > > @@ -1132,6 +1132,9 @@ static int __uvc_ctrl_get(struct uvc_video_chain *chain, > > if ((ctrl->info.flags & UVC_CTRL_FLAG_GET_CUR) == 0) > > return -EACCES; > > > > + if (ctrl->handle && ( NOW - ctrl->async_start_time ) < TIMEOUT) > > + return -EBUSY; > > + > > ret = __uvc_ctrl_load_cur(chain, ctrl); > > if (ret < 0) > > return ret; > > @@ -1591,6 +1594,7 @@ static int uvc_ctrl_get_handle(struct uvc_fh > > *handle, struct uvc_control *ctrl) > > return -EBUSY; > > > > ctrl->handle = handle; > > + ctrl->async_start_time = NOW; > > handle->pending_async_ctrls++; > > return 0; > > } > > @@ -1982,6 +1986,9 @@ int uvc_ctrl_set(struct uvc_fh *handle, > > if (!(ctrl->info.flags & UVC_CTRL_FLAG_SET_CUR)) > > return -EACCES; > > > > + if (ctrl->handle && ( NOW - ctrl->async_start_time ) < TIMEOUT) > > + return -EBUSY; > > If I understand you correctly, this is meant to prevent accessing the > device for an initial TIMEOUT period, based on the knowledge it will > very likely STALL. After the TIMEOUT, if the async set is still not > complete, SET_CUR will be issued, and a STALL will likely occur. > > If we device to standardize on -EBUSY while the async set is in > progress, we need to do so before and after the timeout. > uvc_query_ctrl() will return -EBUSY in case of a STALL if the device > reports the "Not ready" error. This is what I expect a device to report > in this case. Of course the UVC specification, in section 2.4.4, > documents that the device will update the request error code control, > but doesn't tell which value should be used :-S I suppose we'll handle > broken devices if the need arise. > > > + > > /* Clamp out of range values. */ > > switch (mapping->v4l2_type) { > > case V4L2_CTRL_TYPE_INTEGER: > > diff --git a/drivers/media/usb/uvc/uvcvideo.h b/drivers/media/usb/uvc/uvcvideo.h > > index e0e4f099a210..5c82fae94dff 100644 > > --- a/drivers/media/usb/uvc/uvcvideo.h > > +++ b/drivers/media/usb/uvc/uvcvideo.h > > @@ -154,6 +154,8 @@ struct uvc_control { > > * File handle that initially changed the > > * async control. > > */ > > + > > + TIME async_start_time; > > }; > > > > > > In any case. Can we solve this in incremental steps? I think the > > current patch seems simple and correct. We can introduce the timeout > > later. > > The main reason for a timeout was to replace walking over all controls > at close() time with reference counting of uvc_fh, which is an > alternative approach to the current patch. Given that picking a good > timeout value is difficult, and that the current implementation already > skips the walk in the most common case when no async set is pending, I > suppose we could start with that. Cool. I am going to send a new version of: https://patchwork.linuxtv.org/project/linux-media/patch/20241129-uvc-fix-async-v4-1-f23784dba80f@xxxxxxxxxxxx/ using uvc_ctrl_get_handle / uvc_ctrl_put_handle ad lockdep annotations. Hopefully in the future we can get rid of: uvc_ctrl_cleanup_fh() Thanks! > > > > > > Regarding the userspace behaviour during an auto-update, we have > > > > > multiple options: > > > > > > > > > > For control get, > > > > > > > > > > - We can return -EBUSY > > > > > - We can return the old value from the cache > > > > > - We can return the new value fromt he cache > > > > > > > > > > Returning -EBUSY would be simpler to implement. > > > > > > > > Not only easy, I think it is the most correct, > > > > > > > > > I don't think the behaviour should depend on whether the control is read > > > > > on the file handle that initiated the async operation or on a different > > > > > file handle. > > > > > > > > > > For control set, I don't think we can do much else than returning > > > > > -EBUSY, regardless of which file handle the control is set on. > > > > > > > > ACK. > > > > > > > > > > I will send a new version with my interpretation. > > > > > > > > > > > > Thanks for a great discussion > > > > > > > > Looking with some perspective... I believe that we should look into > > > > the "userspace behaviour for auto controls" in a different patchset. > > > > It is slightly unrelated to this discussion. > > -- > Regards, > > Laurent Pinchart -- Ricardo Ribalda