Re: [PATCH 3/4] rcar-vin: Stop stream when subdevice signal transfer error

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Niklas, Hans,

I have restarted this patch series with some changes [1], I hope it's ok
for you and I am looking forward to your comments.

Thank you!

[1] https://lore.kernel.org/linux-media/1652983210-1194-1-git-send-email-mrodin@xxxxxxxxxxxxxx/

On Wed, Mar 09, 2022 at 08:27:07PM +0100, Michael Rodin wrote:
> Hi Niklas,
> 
> On Mon, Mar 07, 2022 at 04:26:23PM +0100, Niklas Söderlund wrote:
> > Hi Michael,
> > 
> > On 2022-03-06 21:01:51 +0100, Michael Rodin wrote:
> > > Hi Niklas,
> > > On Wed, Mar 02, 2022 at 09:17:37PM +0100, Niklas Söderlund wrote:
> > > > Hi Michael,
> > > > 
> > > > Thanks for your feedback.
> > > > 
> > > > On 2022-03-02 17:48:34 +0100, Michael Rodin wrote:
> > > > > Hi Niklas, Hans,
> > > > > 
> > > > > On Mon, Nov 15, 2021 at 03:26:53PM +0100, Hans Verkuil wrote:
> > > > > > On 08/11/2021 19:42, Niklas Söderlund wrote:
> > > > > > > Hi Hans,
> > > > > > > 
> > > > > > > On 2021-11-08 18:36:25 +0100, Hans Verkuil wrote:
> > > > > > >> On 08/11/2021 17:02, Niklas Söderlund wrote:
> > > > > > >>> When a subdevice signals a transfer error stop the VIN in addition to
> > > > > > >>> informing user-space of the event.
> > > > > > >>>
> > > > > > >>> Signed-off-by: Niklas Söderlund <https://urldefense.proofpoint.com/v2/url?u=http-3A__niklas.soderlund-2Brenesas-40ragnatech.se&d=DwIDAw&c=euGZstcaTDllvimEN8b7jXrwqOf-v5A_CdpgnVfiiMM&r=sWsgk3pKkv5GeIDM2RZlPY8TjNFU2D0oBeOj6QNBadE&m=iHMOF-_0M012_DHC6tHxMudpVvpBDqktUtXsKy4xpY0&s=isB_ZRIMVizn-pFdEQpwsKeg81nAE-QHMhjUzSozUZg&e=>
> > > > > > >>> Reviewed-by: Jacopo Mondi <jacopo+renesas@xxxxxxxxxx>
> > > > > > >>> ---
> > > > > > >>> * Changes since v3
> > > > > > >>> - Switch to new V4L2_EVENT_XFER_ERROR from V4L2_EVENT_EOS.
> > > > > > >>> - Call vb2_queue_error() when encountering the event.
> > > > > > >>>
> > > > > > >>> * Changes since v2
> > > > > > >>> - Log using vin_dbg() instead of v4l2_info().
> > > > > > >>> ---
> > > > > > >>>  drivers/media/platform/rcar-vin/rcar-v4l2.c | 17 ++++++++++++++++-
> > > > > > >>>  1 file changed, 16 insertions(+), 1 deletion(-)
> > > > > > >>>
> > > > > > >>> diff --git a/drivers/media/platform/rcar-vin/rcar-v4l2.c b/drivers/media/platform/rcar-vin/rcar-v4l2.c
> > > > > > >>> index a5bfa76fdac6e55a..bf17fdefe90aabf5 100644
> > > > > > >>> --- a/drivers/media/platform/rcar-vin/rcar-v4l2.c
> > > > > > >>> +++ b/drivers/media/platform/rcar-vin/rcar-v4l2.c
> > > > > > >>> @@ -992,9 +992,24 @@ void rvin_v4l2_unregister(struct rvin_dev *vin)
> > > > > > >>>  static void rvin_notify_video_device(struct rvin_dev *vin,
> > > > > > >>>  				     unsigned int notification, void *arg)
> > > > > > >>>  {
> > > > > > >>> +	const struct v4l2_event *event;
> > > > > > >>> +
> > > > > > >>>  	switch (notification) {
> > > > > > >>>  	case V4L2_DEVICE_NOTIFY_EVENT:
> > > > > > >>> -		v4l2_event_queue(&vin->vdev, arg);
> > > > > > >>> +		event = arg;
> > > > > > >>> +
> > > > > > >>> +		switch (event->type) {
> > > > > > >>> +		case V4L2_EVENT_XFER_ERROR:
> > > > > > >>> +			vin_dbg(vin,
> > > > > > >>> +				"Subdevice signaled transfer error, stopping.\n");
> > > > > > >>> +			rvin_stop_streaming(vin);
> > > > > > >>> +			vb2_queue_error(&vin->queue);
> > > > > > >>
> > > > > > >> Hmm, wouldn't it be the case that every driver that calls vb2_queue_error()
> > > > > > >> would also have to send this new event? Would it be possible to modify
> > > > > > >> vb2_queue_error() to raise this event? I haven't analyzed all the drivers
> > > > > > >> that call this function to see if that makes sense.
> > > > > > >>
> > > > > > >> Perhaps a separate new function vb2_queue_error_with_event() would also be
> > > > > > >> an option.
> > > > > > > 
> > > > > > > I think that maybe a good idea, but I think that would be needed on-top 
> > > > > > > of this work as I can't really test it. Here the rcar-csi2.ko is a 
> > > > > > > subdevice which detects the error condition and generates the event. And 
> > > > > > > this code is in rcar-vin.ko, the video device driver which reacts to the 
> > > > > > > event and then forwards it to user-space.
> > > > > > > 
> > > > > > > Or am I misunderstanding you? And you think I should remove the 
> > > > > > > v4l2_event_queue() below in favor of a new vb2_queue_error_with_event() 
> > > > > > > call?
> > > > > > 
> > > > > > Yes. And use vb2_queue_error_with_event in other drivers as well where
> > > > > > applicable. Hmm, it can't be called vb2_ since it is v4l2_ specific, so
> > > > > > perhaps v4l2_queue_error which takes a video_device and a vb2_queue as
> > > > > > arguments. I don't want this just in rcar since it makes perfect sense
> > > > > > as a generic event for such situations.
> > > > > 
> > > > > Handling errors in this way could be problematic, because a (CSI2) transfer
> > > > > error does not mean a total hardware failure on Rcar3. From my experience
> > > > > there are 3 kinds of CSI2 errors:
> > > > >   1. errors which occur sometimes, but do not affect video streaming
> > > > >   2. errors which occur on every start of streaming but usually do not
> > > > >      affect actual video streaming to VIN module after the start
> > > > >   3. fatal errors which require a "Software Reset" mentioned by Renesas in
> > > > >      the chapter 25.3.13 of the hardware manual in order to continue
> > > > >      video streaming
> > > > > This patch set makes the video pipeline unusable if we get errors described
> > > > > in the first scenario if I am not mistaken. In the second scenario the
> > > > > video pipeline was already not usable before because we end up in a
> > > > > continuous restart loop in rcar-csi2.c. And the third scenario is not
> > > > > really addressed by this patch set (or maybe the job is offloaded on to
> > > > > userspace)?
> > > > > 
> > > > > Maybe it's better to implement a recovery in a different way, which would
> > > > > consider the three mentioned error scenarios above:
> > > > >   1. Monitor rvin_irq after streaming has started, e.g. by using a timer
> > > > >      (I tried someting similar in [1])
> > > > >   2. restart the complete video pipeline via rvin_stop_streaming and
> > > > >      rvin_start_streaming if no frame is captured in a reasonable amount
> > > > >      of time (optionally after checking if a subdevice has sent a
> > > > >      V4L2_EVENT_XFER_ERROR).
> > > > > This would make the complete recovery process almost invisible for the
> > > > > application and avoid any application changes.
> > > > > 
> > > > > What do you think?
> > > > 
> > > > I think you bring up a few interesting points and for discussions sake I 
> > > > think we need to split it in two. One on how we could implement 
> > > > V4L2_EVENT_XFER_ERROR in a generic sens and one on how to best deal with 
> > > > errors in the R-Car Gen3 capture pipeline.
> > > > 
> > > > For the proposed V4L2_EVENT_XFER_ERROR the idea from my side is that a 
> > > > driver in the pipeline shall only raise this error (and propagate it to 
> > > > the effected video node) when there is no way to recover without 
> > > > involving user-space. So when this event happens an application at the 
> > > > very least needs to do a full s_stream cycle to restart the capture 
> > > > session.
> > > 
> > > Isn't a "full s_stream cycle" something what _every_ capture driver or even
> > > vb2 framework could execute internally without offloading the task to user
> > > space? This new event will probably burden userspace too much. Every V4L2
> > > capture application which worked fine on other SoCs will fail on RCar3 in
> > > the use cases where transfer errors occur until application developers
> > > implement handling of this event. Signalling an error via vb2_queue_error
> > > should be probably sufficient, because it already signals via EPOLLERR/EIO
> > > to userspace to do a streamoff to clear this error (as per videobuf2-core.h).
> > > Maybe we just have to document that userspace should try to do a streamon
> > > after this? EPOLLERR/EIO errors will be propagated to any application, but
> > > not every application subscribes to V4L2 events...
> > 
> > I think doing a "full s_stream cycle" anywhere but from user-space to be 
> > wrong. It will mess-up the state of capture session, for example it 
> > would reset the capture sequence numbers which can freak out user-space 
> > 3A processing. I think that if an error is hit that requires a full 
> > restart of the stream it should be driven from user-space.
> > 
> > But with this series the error is also signaled thru a vb2_queue_error 
> > right? What is added is for the applications that do look at V4L2 events 
> > to know why it happen right? I mean the vb2 queue can signal error but 
> > the event can tell the application if it was due to V4L2_EVENT_EOS,
> > V4L2_EVENT_SOURCE_CHANGE or that something went very wrong using the 
> > proposed V4L2_EVENT_XFER_ERROR event.
> 
> Agree, now I see that it makes sense to have an additional V4L2 event so
> when userspace receives EPOLLERR or EIO, it can check what went wrong by
> dequeuing the received event. If the event was an xfer error, userspace can
> restart the streaming session once and wait for the next EPOLLERR or EIO.
> 
> > > 
> > > I took a look at the other drivers and found that imx-media-csi.c seems to
> > > follow a similar approach to what I described in my last email. There is a
> > > 2 second timer, which monitors the csi_idmac_eof_interrupt and signals a
> > > fatal error to the capture device via vb2_queue_error when this timer
> > > expires. Monitoring end-of-frame interrupts looks also like a more robust
> > > solution, because in this way we know reliably when something has failed
> > > in the pipeline in contrast to reacting to start-of-transfer errors, which
> > > can be false positives. I think it would be good to follow a similar
> > > approach by adding an EOF timer to rvin_irq. This should also cover errors
> > > on the digital pin camera interface if we get any in the future. After
> > > this we could add a few additional improvements:
> > >  1. configurability of the EOF timeout. 2 seconds can be too much,
> > >     especially for automotive use cases. Probably this timeout could be
> > >     reduced depending on the source frame rate.
> > >  2. internal streamoff/streamon cycle after the timer expires instead of
> > >     calling vb2_queue_error
> > 
> > I would need to deep-dive in the details but in general I think anything 
> > we can do to improve the robustness of capture is good. And if we within 
> > a subdevice can recover form errors without causing issues that is good.  
> > But for the work in this series I think it's two different questions.  
> > One is how and if we can add an event to signal something went very 
> > wrong and the capture pipeline can't recover and have stopped, the other 
> > is how we in individual drivers try to detect and recover from errors.
> 
> Agree, this is rather an idea for the second question and a suggestion how
> the patches 3 and 4 could be improved.
> 
> > > 
> > > > On the particulars of the VIN capture pipeline the only way I found so 
> > > > far to freak out the CSI-2 receiver enough to trigger the event with this 
> > > > patch series is to disconnect the HDMI source from the ADV7481 while 
> > > > streaming and I don't think any in kernel recover method can fix that 
> > > > ;-)
> > > 
> > > In this case I would expect an V4L2_EVENT_EOS or V4L2_EVENT_SOURCE_CHANGE
> > > from the adv748x driver. Maybe we should call vb2_queue_error for these
> > > events as well. But the described EOF timeout approach could also cover
> > > this use case.
> > 
> > Hum, is not those events more stable to report "planed" events? From the 
> > docs.
> > 
> > * V4L2_EVENT_EOS
> >   This event is triggered when the end of a stream is reached. This is 
> >   typically used with MPEG decoders to report to the application when 
> >   the last of the MPEG stream has been decoded.
> > 
> > * V4L2_EVENT_SOURCE_CHANGE
> >   This event is triggered when a source parameter change is detected 
> >   during runtime by the video device. It can be a runtime resolution 
> >   change triggered by a video decoder or the format change happening on 
> >   an input connector. This event requires that the id matches the input 
> >   index (when used with a video device node) or the pad index (when used 
> >   with a subdevice node) from which you want to receive events.
> > 
> >   This event has a struct v4l2_event_src_change associated with it. The 
> >   changes bitfield denotes what has changed for the subscribed pad. If 
> >   multiple events occurred before application could dequeue them, then 
> >   the changes will have the ORed value of all the events generated.
> > 
> > While I think the idea for V4L2_EVENT_XFER_ERROR more shall describe the 
> > case something "unnatural" happens that we can't really recover from.  
> > Maybe "ran over cable with lawn mower". I only brought up pulling the 
> > cable as it's the only way I have to trigger it easily (I don't have a 
> > lawn :-).
> 
> I was thinking about this discussion [1]. If I understand the
> recommendation from Hans correctly, userspace should get a SOURCE_CHANGE
> event and then query dv timings to know whether HDMI cable has been
> disconnected or connected.
> 
> [1] Subject: "Re: [PATCH 1/3] v4l2-subdev: Add subdev ioctl support for ENUM/GET/SET INPUT"
> 
> > > 
> > > > Over all I do agree with your idea that if we can recover from errors 
> > > > that are recovererable that is good. For this series I would like to 
> > > > focus on the former to get V4L2_EVENT_XFER_ERROR in and then if we have 
> > > > ways to provoke and test recovery in the Gen3 pipeline we can add such 
> > > > things to the drivers. Do this make sens or do you think the change in 
> > > > the R-Car CSI-2 driver to raise V4L2_EVENT_XFER_ERROR is to harsh? My 
> > > > motivation for is is the new datasheet and discussions with Renesas, but 
> > > > then again my only way to trigger CSI-2 errors is to pull cables while 
> > > > streaming so maybe I'm biased as such issues can't really be recover 
> > > > from...
> > > > 
> > > > Let me know what you think, I was about to spin a new version of this 
> > > > series.
> > > 
> > > I believe this event might be useful for subdevice drivers which want to
> > > request a streamoff/streamon cycle from the capture driver but it's
> > > probably not useful for the RCar3 CSI2 driver in the current state because
> > > reacting to start-of-transfer errors can be risky due to false positives.
> > > 
> > > What do you think about implementing an approach similar to imx-media-csi.c
> > > first?
> > 
> > I'm not sure really what I think. Maybe if I understand better how to 
> > generate these false positives it would help me understand. I also get a 
> > feeling we are trying to solve two different problems. I want to find a 
> > way to just terminate capture when an unrecoverable error is hit while I 
> > get the feeling you are more focused on how to recover from recoverable 
> > errors in the best way. Maybe I'm missing understanding something, sorry 
> > if I am.
> > 
> > I think both problems are important but do not overlap to a large 
> > degree.
> 
> I think, one possibility to generate false positive CSI2 errors on a
> Salvator board is by manipulating CSI2-related registers in the adv7482. I
> took a look at the adv7482 hardware manual and I have found one method by
> reenabling automatically calculated DPHY timings:
> 
> 1. start streaming from HDMI (I am using a V-SG4K-HDI signal generator which
> outputs 1080P@60).
> # media-ctl -d /dev/media0 -r
> # media-ctl -d /dev/media0 -l "'adv748x 4-0070 hdmi':1 -> 'adv748x 4-0070 txa':0 [1]"
> # media-ctl -d /dev/media0 -l "'rcar_csi2 feaa0000.csi2':1 -> 'VIN1 output':0 [1]"
> # media-ctl -d /dev/media0 --set-v4l2 "'adv748x 4-0070 hdmi':1 [fmt:RGB888_1X24/1920x1080 field:none]"
> # media-ctl -d /dev/media0 --set-v4l2 "'adv748x 4-0070 txa':1 [fmt:RGB888_1X24/1920x1080 field:none]"
> # media-ctl -d /dev/media0 --set-v4l2 "'rcar_csi2 feaa0000.csi2':1 [fmt:RGB888_1X24/1920x1080 field:none]"
> # v4l2-ctl --stream-mmap --device /dev/video1 --verbose
> 2. in another shell reenable calculation of the DPHY timings in adv7482:
> # modprobe i2c-dev
> # i2cset -f -y 4 0x64 0x00 0x04 && i2cset -f -y 4 0x64 0x00 0x24
> # dmesg
> [ 8806.752375] rcar-csi2 feaa0000.csi2: Transfer error, restarting CSI-2 receiver
> [ 8806.759642] rcar-csi2 feaa0000.csi2: Transfer error, restarting CSI-2 receiver
> [ 8806.766881] rcar-csi2 feaa0000.csi2: Transfer error, restarting CSI-2 receiver
> [ 8806.774109] rcar-csi2 feaa0000.csi2: Transfer error, restarting CSI-2 receiver
> # 
> 
> There will be some transfer errors, however I have disabled the restart in the
> CSI2 driver so rcar-csi2 will not attempt to recover any more. In
> approximately half of the trials I see that streaming can continue despite
> CSI2 errors. In the other half of my trials streaming stops and I have to
> restart it via v4l2-ctl. Maybe its even possible to simulate CSI2 errors
> which are completely harmless by changing other adv7482 registers. Or was
> my example already helpful to understand what I mean by those false positives?
> I hope this makes clear, why its not good to stop streaming and trigger the
> xfer error event each time we get a CSI2 error but instead setup a timeout
> and wait for the frame like its done in the imx driver.
> 
> > > 
> > > > > 
> > > > > [1] https://urldefense.proofpoint.com/v2/url?u=https-3A__lore.kernel.org_linux-2Drenesas-2Dsoc_1592588777-2D100596-2D1-2Dgit-2Dsend-2Demail-2Dmrodin-40de.adit-2Djv.com_&d=DwIDAw&c=euGZstcaTDllvimEN8b7jXrwqOf-v5A_CdpgnVfiiMM&r=sWsgk3pKkv5GeIDM2RZlPY8TjNFU2D0oBeOj6QNBadE&m=iHMOF-_0M012_DHC6tHxMudpVvpBDqktUtXsKy4xpY0&s=B4GPIasxu8VP04sLBFvhbmr8yhotp6GcfDQqEKjqeAc&e=
> > > > > 
> > > > > > Regards,
> > > > > > 
> > > > > > 	Hans
> > > > > > 
> > > > > > > 
> > > > > > >>
> > > > > > >> Regards,
> > > > > > >>
> > > > > > >> 	Hans
> > > > > > >>
> > > > > > >>> +			break;
> > > > > > >>> +		default:
> > > > > > >>> +			break;
> > > > > > >>> +		}
> > > > > > >>> +
> > > > > > >>> +		v4l2_event_queue(&vin->vdev, event);
> > > > > > >>>  		break;
> > > > > > >>>  	default:
> > > > > > >>>  		break;
> > > > > > >>>
> > > > > > >>
> > > > > > > 
> > > > > > 
> > > > > 
> > > > > -- 
> > > > > Best Regards,
> > > > > Michael
> > > > 
> > > > -- 
> > > > Kind Regards,
> > > > Niklas Söderlund
> > > 
> > > -- 
> > > Best Regards,
> > > Michael
> > 
> > -- 
> > Kind Regards,
> > Niklas Söderlund
> 
> -- 
> Best Regards,
> Michael

-- 
Best Regards,
Michael



[Index of Archives]     [Linux Samsung SOC]     [Linux Wireless]     [Linux Kernel]     [ATH6KL]     [Linux Bluetooth]     [Linux Netdev]     [Kernel Newbies]     [IDE]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux ATA RAID]     [Samba]     [Device Mapper]

  Powered by Linux