On Mon, 2 Dec 2024 at 13:19, Hans de Goede <hdegoede@xxxxxxxxxx> wrote: > > Hi, > > On 2-Dec-24 11:50 AM, Ricardo Ribalda wrote: > > On Mon, 2 Dec 2024 at 11:27, Hans de Goede <hdegoede@xxxxxxxxxx> wrote: > >> > >> Hi, > >> > >> On 2-Dec-24 9:44 AM, Hans Verkuil wrote: > >>> On 02/12/2024 09:11, Laurent Pinchart wrote: > >>>> On Mon, Dec 02, 2024 at 09:05:07AM +0100, Hans Verkuil wrote: > >>>>> On 02/12/2024 01:18, Laurent Pinchart wrote: > >>>>>> On Fri, Nov 29, 2024 at 11:18:54PM +0100, Ricardo Ribalda wrote: > >>>>>>> On Fri, 29 Nov 2024 at 23:03, Laurent Pinchart wrote: > >>>>>>>> On Fri, Nov 29, 2024 at 07:47:31PM +0100, Ricardo Ribalda wrote: > >>>>>>>>> Before we all go on a well deserved weekend, let me recap what we > >>>>>>>>> know. If I did not get something correctly, let me know. > >>>>>>>>> > >>>>>>>>> 1) Well behaved devices do not allow to set or get an incomplete async > >>>>>>>>> control. They will stall instead (ref: Figure 2-21 in UVC 1.5 ) > >>>>>>>>> 2) Both Laurent and Ricardo consider that there is a big chance that > >>>>>>>>> some camera modules do not implement this properly. (ref: years of > >>>>>>>>> crying over broken module firmware :) ) > >>>>>>>>> > >>>>>>>>> 3) ctrl->handle is designed to point to the fh that originated the > >>>>>>>>> control. So the logic can decide if the originator needs to be > >>>>>>>>> notified or not. (ref: uvc_ctrl_send_event() ) > >>>>>>>>> 4) Right now we replace the originator in ctrl->handle for unfinished > >>>>>>>>> async controls. (ref: > >>>>>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/media/usb/uvc/uvc_ctrl.c#n2050) > >>>>>>>>> > >>>>>>>>> My interpretation is that: > >>>>>>>>> A) We need to change 4). We shall not change the originator of > >>>>>>>>> unfinished ctrl->handle. > >>>>>>>>> B) Well behaved cameras do not need the patch "Do not set an async > >>>>>>>>> control owned by another fh" > >>>>>>>>> C) For badly behaved cameras, it is fine if we slightly break the > >>>>>>>>> v4l2-compliance in corner cases, if we do not break any internal data > >>>>>>>>> structure. > >>>>>>>> > >>>>>>>> The fact that some devices may not implement the documented behaviour > >>>>>>>> correctly may not be a problem. Well-behaved devices will stall, which > >>>>>>>> means we shouldn't query the device while as async update is in > >>>>>>>> progress. Badly-behaved devices, whatever they do when queried, should > >>>>>>>> not cause any issue if we don't query them. > >>>>>>> > >>>>>>> I thought we could detect the stall and return safely. Isn't that the case? > >>>>>> > >>>>>> We could, but if we know the device will stall anyway, is there a reason > >>>>>> not to avoid issuing the request in the first place ? > >>>>>> > >>>>>>> Why we have not seen issues with this? > >>>>>> > >>>>>> I haven't tested a PTZ device for a very long time, and you would need > >>>>>> to hit a small time window to see the issue. > >>>>>> > >>>>>>>> We should not send GET_CUR and SET_CUR requests to the device while an > >>>>>>>> async update is in progress, and use cached values instead. When we > >>>>>>>> receive the async update event, we should clear the cache. This will be > >>>>>>>> the same for both well-behaved and badly-behaved devices, so we can > >>>>>>>> expose the same behaviour towards userspace. > >>>>>>> > >>>>>>> seting ctrl->loaded = 0 when we get an event sounds like a good idea > >>>>>>> and something we can implement right away. > >>>>>>> If I have to resend the set I will add it to the end. > >>>>>>> > >>>>>>>> We possibly also need some kind of timeout mechanism to cope with the > >>>>>>>> async update event not being delivered by the device. > >>>>>>> > >>>>>>> This is the part that worries me the most: > >>>>>>> - timeouts make the code fragile > >>>>>>> - What is a good value for timeout? 1 second, 30, 300? I do not think > >>>>>>> that we can find a value. > >>>>>> > >>>>>> I've been thinking about the implementation of uvc_fh cleanup over the > >>>>>> weekend, and having a timeout would have the nice advantage that we > >>>>>> could reference-count uvc_fh instead of implementing a cleanup that > >>>>>> walks over all controls when closing a file handle. I think it would > >>>>>> make the code simpler, and possibly safer too. > >>>>>> > >>>>>>>> Regarding the userspace behaviour during an auto-update, we have > >>>>>>>> multiple options: > >>>>>>>> > >>>>>>>> For control get, > >>>>>>>> > >>>>>>>> - We can return -EBUSY > >>>>>>>> - We can return the old value from the cache > >>>>> > >>>>> This would match the control behavior best. Only when the operation is > >>>>> done is the control updated and the control event sent. > >>>>> > >>>>> Some questions: is any of this documented for UVC? Because this is non-standard > >>>> > >>>> No this isn't documented. > >>>> > >>>>> behavior. Are there applications that rely on this? Should we perhaps add > >>>> > >>>> I don't know. > >>>> > >>>>> proper support for this to the control framework? E.g. add an ASYNC flag and > >>>>> document this? > >>>> > >>>> We could, but this is such a specific use case that I don't think is > >>>> worth adding complexity to the already complex control framework would > >>>> be worth it. What we could do is perhaps adding a flag for the userspace > >>>> API, but even there, I never like modelling an API with a single user. > >>> > >>> Well, it might be a single driver that uses this, but it is also the most > >>> used driver by far. I think the only change is to add a flag for this and > >>> describe how it should behave. And add v4l2-compliance tests for it. > >>> > >>> Otherwise no changes to the control framework are needed, I think. > >>> > >>> Controls with the ASYNC flag set would: > >>> > >>> - return the old value from the cache. > >>> - document that setting a new value while the operation is in progress > >>> results in EBUSY. Document that if the new value is equal to the old value, > >>> then return 0 and do nothing (alternative is to just immediately send > >>> the control changed event, but that might require a control framework change). > >>> - when the operation finishes, update the cache to the new value and > >>> send the control changed event. > >>> - document that userspace should specify V4L2_EVENT_SUB_FL_ALLOW_FEEDBACK > >>> when subscribing to the control if you calling fh wants to know when > >>> the operation finishes. > >>> - document how timeouts should be handled: this is tricky, especially with > >>> bad hardware. I.e. if the hw doesn't send the event, does that mean that > >>> you are never able to set the control since it will stall? > >>> In the end this will just reflect how UVC handles this. > >> > >> I have been catching up on this thread (I have not read the v3 and v4 > >> threads yet). > >> > >> This all started with Ricardo noticing that ctrl->handle may get > >> overwritten when another app sets the ctrl, causing the first app > >> to set the ctrl to get a V4L2_EVENT for the ctrl (if subscribed) > >> even though it set the ctrl itself. > >> > >> My observations so far: > >> > >> 1. This is only hit when another app changes the ctrl after the first app, > >> in this case, if there is no stall issued by the hw for the second app's > >> request, arguably the first app getting the event for the ctrl is correct > > > > In other words, for non compliant cameras the current behaviour is > > correct. For compliant cameras it is broken. > > > >> since it was changed by the second app. IOW I think the current behavior > >> is not only fine, but even desirable. Assuming we only override ctrl->handle > >> after successfully sending the set-ctrl request to the hardware. > > > > We are overriding ctrl->handle unconditionally, even if set-ctrl stalls. > > Right I was just looking at that. Since we hold chain->ctrl_mutex > from the start of uvc_ioctl_s_try_ext_ctrls() until we finish > uvc_ctrl_commit() we can delay setting ctrl->handle until > a successful commit. > > Actually looking at this I think I've found another bug, uvc_ctrl_set() > always sets ctrl->handle also in the VIDIOC_TRY_EXT_CTRLS case in > which case we never send the SET_CUR command to the camera. ack > > And the handle is not part of the data backed up / restored by > a rollback. > > Moving the storing of the handle to a successful commit fixes > both the overriding of the handle in the stall case as well > as the bogus setting of the handle on VIDIOC_TRY_EXT_CTRLS. > > So my suggestion would be not touching ctrl->handle from > ctrl_set() instead store the handle in a new ctrl->new_handle > variable there and copy the new_handle values to handle for affected > controls after a successful commit (before releasing the lock). We do not need to do that. Instead, we can move the set to uvc_ctrl_commit_entity(). I have a patchset ready with this change. > > This will also successfully replace the handle for buggy > devices which do not report a stall but instead accept the > second SET_CUR. This replacement of handle is the correct thing > todo in this case since after the second SET_CUR is accepted > there is a change of the ctrl happening from the pov of the > issuer of the first SET_CUR. > > >> 2. This adds a lot of complexity for not sending an event to the app > >> which made the change. Hans V. suggested maybe adding some sort of flag > >> for async ctrls to the userspace API. I wonder if we should not just > >> get rid of this complexity and document that these controls will always > >> generate events independent of V4L2_EVENT_SUB_FL_ALLOW_FEEDBACK ? > >> That would certainly simplify things, but it raises the questions if > >> this will cause issues for existing applications. > > > > To be honest, I am more concerned about the dangling pointers than the event. > > Ack. > > > Updating the doc to say that ASYC controls always generate events > > sounds good to me. But until we reach an agreement on the specifics > > I'd rather land this fix and then we can take time to design an API > > that works for compliant and non compliant hardware. > > I agree that we should focus on fixing the dangling pointer problem > and your v4 series is heading in the right direction there. > > I'm not sure if we should take your v4 series as is though, see above. > > At a minimum I think the issue with setting ctrl->handle in > the VIDIOC_TRY_EXT_CTRLS case needs to be fixed. > > >> Note that if we simply return -EBUSY on set until acked by a status > >> event we also avoid the issue of ctrl->handle getting overwritten, > >> but that relies on reliable status events; or requires timeout handling. > >> > >> 3. I agree with Ricardo that a timeout based approach for cameras which > >> to not properly send status events for async ctrls is going to be > >> problematic. Things like pan/tilt homing can take multiple seconds which > >> is really long to use as a timeout if we plan to return -EBUSY until > >> the timeout triggers. I think it would be better to just rely on > >> the hardware sending a stall, or it accepting and correctly handling > >> a new CUR_SET command while the previous one is still being processed. > >> > >> I guess we can track if the hw does send status events when async ctrls > >> complete and then do the -EBUSY thing without going out to the hw after > >> the first time an async ctrl has been acked by a status event. > >> > >> And then combine that with the current behavior of overwriting ctrl->handle > >> until the ctrl has been marked as having working status events. So: > >> > >> a) In case we do not know yet if a ctrl gets status-event acks; and > >> on devices without reliable status events keep current behavior. > >> > >> b) As soon as we know a ctrl has reliable status events, switch to > >> returning -EBUSY if a set is pending (as indicated by ctrl->handle > >> being set). > >> > >> I don't like the fact that this changes the behavior after the first > >> status event acking an async ctrl, but I don't really see another way. > > > > If I understood you correctly, you are proposing the following quirk: > > > > diff --git a/drivers/media/usb/uvc/uvc_ctrl.c b/drivers/media/usb/uvc/uvc_ctrl.c > > index f0e8a436a306..1a554afeaa2f 100644 > > --- a/drivers/media/usb/uvc/uvc_ctrl.c > > +++ b/drivers/media/usb/uvc/uvc_ctrl.c > > @@ -1132,6 +1132,9 @@ static int __uvc_ctrl_get(struct uvc_video_chain *chain, > > if ((ctrl->info.flags & UVC_CTRL_FLAG_GET_CUR) == 0) > > return -EACCES; > > > > + if (ctrl->handle && ctrl->async_event_works) > > + return -EBUSY; > > + > > ret = __uvc_ctrl_load_cur(chain, ctrl); > > if (ret < 0) > > return ret; > > @@ -1672,6 +1675,8 @@ bool uvc_ctrl_status_event_async(struct urb> *urb, struct uvc_video_chain *chain, > > /* Flush the control cache, the data might have changed. */ > > ctrl->loaded = 0; > > > > + ctrl->async_event_works = true; > > + > > if (list_empty(&ctrl->info.mappings)) > > return false; > > > > @@ -1982,6 +1987,9 @@ int uvc_ctrl_set(struct uvc_fh *handle, > > if (!(ctrl->info.flags & UVC_CTRL_FLAG_SET_CUR)) > > return -EACCES; > > > > + if (ctrl->handle && ctrl->async_event_works) > > + return -EBUSY; > > + > > Yes this is what I'm proposing, except that this check should be > skipped in the VIDIOC_TRY_EXT_CTRLS case. I think we need to add > a "bool try" parameter to uvc_ctrl_set() and not look at / set > [new_]handle when this is set. SGTM. Let's add this improvement later. > > > > > /* Clamp out of range values. */ > > switch (mapping->v4l2_type) { > > case V4L2_CTRL_TYPE_INTEGER: > > diff --git a/drivers/media/usb/uvc/uvcvideo.h b/drivers/media/usb/uvc/uvcvideo.h > > index e0e4f099a210..0ef7c594eecb 100644 > > --- a/drivers/media/usb/uvc/uvcvideo.h > > +++ b/drivers/media/usb/uvc/uvcvideo.h > > @@ -154,6 +154,7 @@ struct uvc_control { > > * File handle that initially changed the > > * async control. > > */ > > + bool async_event_works; > > }; > > > > The benefit is that we can predict a device returning STALL without > > having to actually do the set/get operation. > > > > We can add it as a follow-up patch. > > Another benefit would be correctly returning -EBUSY when trying to get > the ctrl. I agree this could be done as a follow-up. > > Regards, > > Hans > > -- Ricardo Ribalda