On 04/12/2024 08:59, Hans Verkuil wrote: > On 03/12/2024 18:18, Laurent Pinchart wrote: >> On Mon, Dec 02, 2024 at 11:55:20AM +0100, Hans Verkuil wrote: >>> On 02/12/2024 11:26, Hans de Goede wrote: >>>> On 2-Dec-24 9:44 AM, Hans Verkuil wrote: >>>>> On 02/12/2024 09:11, Laurent Pinchart wrote: >>>>>> On Mon, Dec 02, 2024 at 09:05:07AM +0100, Hans Verkuil wrote: >>>>>>> On 02/12/2024 01:18, Laurent Pinchart wrote: >>>>>>>> On Fri, Nov 29, 2024 at 11:18:54PM +0100, Ricardo Ribalda wrote: >>>>>>>>> On Fri, 29 Nov 2024 at 23:03, Laurent Pinchart wrote: >>>>>>>>>> On Fri, Nov 29, 2024 at 07:47:31PM +0100, Ricardo Ribalda wrote: >>>>>>>>>>> Before we all go on a well deserved weekend, let me recap what we >>>>>>>>>>> know. If I did not get something correctly, let me know. >>>>>>>>>>> >>>>>>>>>>> 1) Well behaved devices do not allow to set or get an incomplete async >>>>>>>>>>> control. They will stall instead (ref: Figure 2-21 in UVC 1.5 ) >>>>>>>>>>> 2) Both Laurent and Ricardo consider that there is a big chance that >>>>>>>>>>> some camera modules do not implement this properly. (ref: years of >>>>>>>>>>> crying over broken module firmware :) ) >>>>>>>>>>> >>>>>>>>>>> 3) ctrl->handle is designed to point to the fh that originated the >>>>>>>>>>> control. So the logic can decide if the originator needs to be >>>>>>>>>>> notified or not. (ref: uvc_ctrl_send_event() ) >>>>>>>>>>> 4) Right now we replace the originator in ctrl->handle for unfinished >>>>>>>>>>> async controls. (ref: >>>>>>>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/media/usb/uvc/uvc_ctrl.c#n2050) >>>>>>>>>>> >>>>>>>>>>> My interpretation is that: >>>>>>>>>>> A) We need to change 4). We shall not change the originator of >>>>>>>>>>> unfinished ctrl->handle. >>>>>>>>>>> B) Well behaved cameras do not need the patch "Do not set an async >>>>>>>>>>> control owned by another fh" >>>>>>>>>>> C) For badly behaved cameras, it is fine if we slightly break the >>>>>>>>>>> v4l2-compliance in corner cases, if we do not break any internal data >>>>>>>>>>> structure. >>>>>>>>>> >>>>>>>>>> The fact that some devices may not implement the documented behaviour >>>>>>>>>> correctly may not be a problem. Well-behaved devices will stall, which >>>>>>>>>> means we shouldn't query the device while as async update is in >>>>>>>>>> progress. Badly-behaved devices, whatever they do when queried, should >>>>>>>>>> not cause any issue if we don't query them. >>>>>>>>> >>>>>>>>> I thought we could detect the stall and return safely. Isn't that the case? >>>>>>>> >>>>>>>> We could, but if we know the device will stall anyway, is there a reason >>>>>>>> not to avoid issuing the request in the first place ? >>>>>>>> >>>>>>>>> Why we have not seen issues with this? >>>>>>>> >>>>>>>> I haven't tested a PTZ device for a very long time, and you would need >>>>>>>> to hit a small time window to see the issue. >>>>>>>> >>>>>>>>>> We should not send GET_CUR and SET_CUR requests to the device while an >>>>>>>>>> async update is in progress, and use cached values instead. When we >>>>>>>>>> receive the async update event, we should clear the cache. This will be >>>>>>>>>> the same for both well-behaved and badly-behaved devices, so we can >>>>>>>>>> expose the same behaviour towards userspace. >>>>>>>>> >>>>>>>>> seting ctrl->loaded = 0 when we get an event sounds like a good idea >>>>>>>>> and something we can implement right away. >>>>>>>>> If I have to resend the set I will add it to the end. >>>>>>>>> >>>>>>>>>> We possibly also need some kind of timeout mechanism to cope with the >>>>>>>>>> async update event not being delivered by the device. >>>>>>>>> >>>>>>>>> This is the part that worries me the most: >>>>>>>>> - timeouts make the code fragile >>>>>>>>> - What is a good value for timeout? 1 second, 30, 300? I do not think >>>>>>>>> that we can find a value. >>>>>>>> >>>>>>>> I've been thinking about the implementation of uvc_fh cleanup over the >>>>>>>> weekend, and having a timeout would have the nice advantage that we >>>>>>>> could reference-count uvc_fh instead of implementing a cleanup that >>>>>>>> walks over all controls when closing a file handle. I think it would >>>>>>>> make the code simpler, and possibly safer too. >>>>>>>> >>>>>>>>>> Regarding the userspace behaviour during an auto-update, we have >>>>>>>>>> multiple options: >>>>>>>>>> >>>>>>>>>> For control get, >>>>>>>>>> >>>>>>>>>> - We can return -EBUSY >>>>>>>>>> - We can return the old value from the cache >>>>>>> >>>>>>> This would match the control behavior best. Only when the operation is >>>>>>> done is the control updated and the control event sent. >>>>>>> >>>>>>> Some questions: is any of this documented for UVC? Because this is non-standard >>>>>> >>>>>> No this isn't documented. >>>>>> >>>>>>> behavior. Are there applications that rely on this? Should we perhaps add >>>>>> >>>>>> I don't know. >>>>>> >>>>>>> proper support for this to the control framework? E.g. add an ASYNC flag and >>>>>>> document this? >>>>>> >>>>>> We could, but this is such a specific use case that I don't think is >>>>>> worth adding complexity to the already complex control framework would >>>>>> be worth it. What we could do is perhaps adding a flag for the userspace >>>>>> API, but even there, I never like modelling an API with a single user. >>>>> >>>>> Well, it might be a single driver that uses this, but it is also the most >>>>> used driver by far. I think the only change is to add a flag for this and >>>>> describe how it should behave. And add v4l2-compliance tests for it. >>>>> >>>>> Otherwise no changes to the control framework are needed, I think. >>>>> >>>>> Controls with the ASYNC flag set would: >>>>> >>>>> - return the old value from the cache. >>>>> - document that setting a new value while the operation is in progress >>>>> results in EBUSY. Document that if the new value is equal to the old value, >>>>> then return 0 and do nothing (alternative is to just immediately send >>>>> the control changed event, but that might require a control framework change). >>>>> - when the operation finishes, update the cache to the new value and >>>>> send the control changed event. >>>>> - document that userspace should specify V4L2_EVENT_SUB_FL_ALLOW_FEEDBACK >>>>> when subscribing to the control if you calling fh wants to know when >>>>> the operation finishes. >>>>> - document how timeouts should be handled: this is tricky, especially with >>>>> bad hardware. I.e. if the hw doesn't send the event, does that mean that >>>>> you are never able to set the control since it will stall? >>>>> In the end this will just reflect how UVC handles this. >>>> >>>> I have been catching up on this thread (I have not read the v3 and v4 >>>> threads yet). >>>> >>>> This all started with Ricardo noticing that ctrl->handle may get >>>> overwritten when another app sets the ctrl, causing the first app >>>> to set the ctrl to get a V4L2_EVENT for the ctrl (if subscribed) >>>> even though it set the ctrl itself. >>>> >>>> My observations so far: >>>> >>>> 1. This is only hit when another app changes the ctrl after the first app, >>>> in this case, if there is no stall issued by the hw for the second app's >>>> request, arguably the first app getting the event for the ctrl is correct >>>> since it was changed by the second app. IOW I think the current behavior >>>> is not only fine, but even desirable. Assuming we only override ctrl->handle >>>> after successfully sending the set-ctrl request to the hardware. >> >> I think you're right. >> >>>> 2. This adds a lot of complexity for not sending an event to the app >>>> which made the change. Hans V. suggested maybe adding some sort of flag >>>> for async ctrls to the userspace API. I wonder if we should not just >>>> get rid of this complexity and document that these controls will always >>>> generate events independent of V4L2_EVENT_SUB_FL_ALLOW_FEEDBACK ? >>>> That would certainly simplify things, but it raises the questions if >>>> this will cause issues for existing applications. >>> >>> I'm not that keen on this. That's why a new flag can come in handy since >>> if present, then that indicates that it makes sense to specify >>> V4L2_EVENT_SUB_FL_ALLOW_FEEDBACK when subscribing to the control events. >>> >>> This ensures that uvc follows the current v4l2 spec. It's also why I >>> prefer that g_ctrl will just return the old value until the new value >>> has been reached: that way the control event corresponds with the actual >>> updating of the control value. >>> >>> That said, it's just my opinion and I am OK with UVC doing things a bit >>> differently. Just be aware that I have no objection to adding an ASYNC flag, >>> given how widely UVC is used. >> >> My experience with the V4L2 control API is that we've overdesigned quite >> a few things, and in particular control events. The independent >> "capture" and "control panel" application model that V4L2 controls were >> designed for is not really a good fit for the 21st century anymore. The >> V4L2 API isn't rich enough to arbitrate between applications that are >> not designed to collaborate, and way too rich when applications do >> collaborate. The only two real use cases I found for control events are >> async set completion notification, and notification of automatic changes >> to other controls (and in particular changes to control limits) when a >> control is set. The second use case isn't even something that we support >> well today: to make it really usable, the change notification should be >> *synchronous* with the control set ioctl, returning the information from >> the same ioctl, not through asynchronous control events. > > The main reason you think it is complicated is because the uvc driver does > not use the control framework, so it has to copy all the logic in the driver. > That's very painful. Ideally, uvc should use the control framework, but that > would require a complete overhaul of the uvc driver. > > For all other drivers the complexity is zero since it is all in the framework. > > Some of the Digital Video controls (HOTPLUG, EDID_PRESENT, RXSENSE, > POWER_PRESENT) are meant to be used with control events to inform the applications > when these things change. But you don't deal with HDMI video, so you never see > them in use. The control event mechanism is generic, i.e. available for all > controls. So the use in control panels is just one use-case and it is probably > just qv4l2 that implements it. But control events are great for anything that > happens asynchronously. > > What is missing is support for asynchronous event like the zoom control that > takes time to finish the operation. Ideally I would prefer that it would operate > like the V4L2_CID_AUTO_FOCUS_* controls. But since the current mechanism is > already in use in UVC I am fine with the current uvc approach. I just think > this is something that should be signaled to userspace by a flag and that it > is properly documented. Sorry for this second post, I just wanted to say that in my opinion it is OK if frameworks are complicated internally. That's the whole point of a framework: to put the complexity in one place and hide it from the users of the framework. If a framework was simple, then you probably wouldn't have needed a framework in the first place. The problem with uvc is that you can't use the framework so all the complexity now enters the driver :-( Regards, Hans > > Regarding the second use case: it's perfectly doable, but it would require a > new ioctl. You would need really good arguments for doing that. > > Regards, > > Hans > >> >> TL;DR: If I can pick an option, let's make things simpler, not more >> complex. >> >>>> Note that if we simply return -EBUSY on set until acked by a status >>>> event we also avoid the issue of ctrl->handle getting overwritten, >>>> but that relies on reliable status events; or requires timeout handling. >>>> >>>> 3. I agree with Ricardo that a timeout based approach for cameras which >>>> to not properly send status events for async ctrls is going to be >>>> problematic. Things like pan/tilt homing can take multiple seconds which >>>> is really long to use as a timeout if we plan to return -EBUSY until >>>> the timeout triggers. I think it would be better to just rely on >>>> the hardware sending a stall, or it accepting and correctly handling >>>> a new CUR_SET command while the previous one is still being processed. >>>> >>>> I guess we can track if the hw does send status events when async ctrls >>>> complete and then do the -EBUSY thing without going out to the hw after >>>> the first time an async ctrl has been acked by a status event. >>>> >>>> And then combine that with the current behavior of overwriting ctrl->handle >>>> until the ctrl has been marked as having working status events. So: >>>> >>>> a) In case we do not know yet if a ctrl gets status-event acks; and >>>> on devices without reliable status events keep current behavior. >>>> >>>> b) As soon as we know a ctrl has reliable status events, switch to >>>> returning -EBUSY if a set is pending (as indicated by ctrl->handle >>>> being set). >>>> >>>> I don't like the fact that this changes the behavior after the first >>>> status event acking an async ctrl, but I don't really see another way. >>>> >>>>>>>>>> - We can return the new value fromt he cache >>>>>>>>>> >>>>>>>>>> Returning -EBUSY would be simpler to implement. >>>>>>>>> >>>>>>>>> Not only easy, I think it is the most correct, >>>>>>>>> >>>>>>>>>> I don't think the behaviour should depend on whether the control is read >>>>>>>>>> on the file handle that initiated the async operation or on a different >>>>>>>>>> file handle. >>>>>>>>>> >>>>>>>>>> For control set, I don't think we can do much else than returning >>>>>>>>>> -EBUSY, regardless of which file handle the control is set on. >>>>>>>>> >>>>>>>>> ACK. >>>>>>>>> >>>>>>>>>>> I will send a new version with my interpretation. >>>>>>>>>>> >>>>>>>>>>> Thanks for a great discussion >>>>>>>>> >>>>>>>>> Looking with some perspective... I believe that we should look into >>>>>>>>> the "userspace behaviour for auto controls" in a different patchset. >>>>>>>>> It is slightly unrelated to this discussion. >> >