On Thu, Aug 31, 2023 at 3:13 AM Marek Vasut <marex@xxxxxxx> wrote: > > On 8/30/23 05:38, Chen-Yu Tsai wrote: > > On Sun, Aug 27, 2023 at 5:44 AM Marek Vasut <marex@xxxxxxx> wrote: > >> > >> On 8/25/23 10:52, Chen-Yu Tsai wrote: > >>> On Fri, Aug 25, 2023 at 4:33 PM Marek Vasut <marex@xxxxxxx> wrote: > >>>> > >>>> On 8/25/23 09:09, Chen-Yu Tsai wrote: > >>>>> On Thu, Aug 24, 2023 at 9:08 PM Marek Vasut <marex@xxxxxxx> wrote: > >>>>>> > >>>>>> On 8/24/23 12:39, Adam Ford wrote: > >>>>>>> On Wed, Aug 23, 2023 at 8:39 PM Marek Vasut <marex@xxxxxxx> wrote: > >>>>>>>> > >>>>>>>> The i.MX8MM/N/P does not define the .reset op since reset of the VPU is > >>>>>>>> done by genpd. Check whether the .reset op is defined before calling it > >>>>>>>> to avoid NULL pointer dereference. > >>>>>>>> > >>>>>>>> Note that the Fixes tag is set to the commit which removed the reset op > >>>>>>>> from i.MX8M Hantro G2 implementation, this is because before this commit > >>>>>>>> all the implementations did define the .reset op. > >>>>>>> > >>>>>>> I am surprised I didn't have issues when I was testing the 8MQ and > >>>>>>> 8MM, but this makes sense. > >>>>>> > >>>>>> You need to trigger the VPU watchdog to trigger the crash, that means > >>>>>> you have to get the VPU into some weird state where it fails to decode > >>>>>> frame. Then it triggers the reset and ... boom. > >>>>>> > >>>>>> See this patch, that contains a gstreamer invocation to generate such > >>>>>> trigger condition input data: > >>>>>> > >>>>>> [PATCH] media: verisilicon: Do not enable G2 postproc downscale if > >>>>>> source is narrower than destination > >>>>>> > >>>>>> " > >>>>>> To generate input test data to trigger this bug, use e.g.: > >>>>>> $ gst-launch-1.0 videotestsrc ! > >>>>>> video/x-raw,width=272,height=256,format=I420 ! \ > >>>>>> vp9enc ! matroskamux ! filesink location=/tmp/test.vp9 > >>>>>> To trigger the bug upon decoding (note that the NV12 must be forced, as > >>>>>> that assures the output data would pass the G2 postproc): > >>>>>> $ gst-launch-1.0 filesrc location=/tmp/test.vp9 ! matroskademux ! > >>>>>> vp9parse ! \ > >>>>>> v4l2slvp9dec ! video/x-raw,format=NV12 ! videoconvert > >>>>>> ! fbdevsink > >>>>>> " > >>>>> > >>>>> Does it completely recover afterwards? In my previous trials the hardware > >>>>> ended up in some bizzare state: while decoding succeeds, the output's md5sum > >>>>> didn't match up. > >>>> > >>>> Have you got a testcase that triggers this, one I can try ? > >>>> > >>>> I am not entirely sure whether this is happening here as well or not, > >>>> but I can imagine that the power domain went down and back up between > >>>> tests, so the VPU would be power cycled (and therefore reset) that way. > >>>> So, I think it is worth testing that. > >>> > >>> This was last year while I was writing HEVC decoding code for Chromium. > >>> IIRC the SAODBLK_A_MainConcept_4 test vector from the official HEVC test > >>> suite does cause our stack to crash, but Gstreamer seemed to handle it > >>> OK. It could be that the Chromium decoder stack is passing bad values to > >>> the decoder. > >> > >> That can be easily tested with ftrace enabled. I was just tracking down > >> an issue with gstreamer and added the following patch to the hantro > >> driver. Then: > >> > >> echo > /sys/kernel/debug/tracing/trace > >> <run fail test> > >> cat /sys/kernel/debug/tracing/trace > /tmp/fail.trace > >> echo > /sys/kernel/debug/tracing/trace > >> <run pass test> > >> cat /sys/kernel/debug/tracing/trace > /tmp/pass.trace > >> # remove time stamps etc. > >> diff /tmp/{fail,pass}.trace > >> > >> You should see whether some register programming differs between > >> gstreamer and chromium. > > > > I ended up using VISL to compare the controls set. I did find a bug. > > It still hard hangs after a couple frames, so I guess I'd need to use > > your method, but do printk instead. > > > > BTW, I wonder if we shouldn't add a reset op, if only just to stop the > > hardware? That is, do the same two register writes as in the Hantro G2 > > interrupt handler. > > You mean these two ? > > 38 vdpu_write(vpu, 0, G2_REG_INTERRUPT); > 39 vdpu_write(vpu, G2_REG_CONFIG_DEC_CLK_GATE_E, G2_REG_CONFIG); Yes. > As far as I understand this, that only clears IRQ and gates the clock > off, but it doesn't reset the IP state, does it ? That's right, but it would stop the hardware from continuing to do whatever it is doing before it gets shut down through runtime PM. I'm not sure it would make that much of a difference, but it did seem like something that could be done. ChenYu