Re: [PATCH] media: hantro: Check whether reset op is defined before use

Marek Vasut <marex@xxxxxxx> · Sat, 26 Aug 2023 23:44:54 +0200

On 8/25/23 10:52, Chen-Yu Tsai wrote:
On Fri, Aug 25, 2023 at 4:33 PM Marek Vasut <marex@xxxxxxx> wrote:

On 8/25/23 09:09, Chen-Yu Tsai wrote:
On Thu, Aug 24, 2023 at 9:08 PM Marek Vasut <marex@xxxxxxx> wrote:

On 8/24/23 12:39, Adam Ford wrote:
On Wed, Aug 23, 2023 at 8:39 PM Marek Vasut <marex@xxxxxxx> wrote:

The i.MX8MM/N/P does not define the .reset op since reset of the VPU is
done by genpd. Check whether the .reset op is defined before calling it
to avoid NULL pointer dereference.

Note that the Fixes tag is set to the commit which removed the reset op
from i.MX8M Hantro G2 implementation, this is because before this commit
all the implementations did define the .reset op.

I am surprised I didn't have issues when I was testing the 8MQ and
8MM, but this makes sense.

You need to trigger the VPU watchdog to trigger the crash, that means
you have to get the VPU into some weird state where it fails to decode
frame. Then it triggers the reset and ... boom.

See this patch, that contains a gstreamer invocation to generate such
trigger condition input data:

[PATCH] media: verisilicon: Do not enable G2 postproc downscale if
source is narrower than destination

"
To generate input test data to trigger this bug, use e.g.:
$ gst-launch-1.0 videotestsrc !
video/x-raw,width=272,height=256,format=I420 ! \
                    vp9enc ! matroskamux ! filesink location=/tmp/test.vp9
To trigger the bug upon decoding (note that the NV12 must be forced, as
that assures the output data would pass the G2 postproc):
$ gst-launch-1.0 filesrc location=/tmp/test.vp9 ! matroskademux !
vp9parse ! \
                    v4l2slvp9dec ! video/x-raw,format=NV12 ! videoconvert
! fbdevsink
"

Does it completely recover afterwards? In my previous trials the hardware
ended up in some bizzare state: while decoding succeeds, the output's md5sum
didn't match up.

Have you got a testcase that triggers this, one I can try ?

I am not entirely sure whether this is happening here as well or not,
but I can imagine that the power domain went down and back up between
tests, so the VPU would be power cycled (and therefore reset) that way.
So, I think it is worth testing that.

This was last year while I was writing HEVC decoding code for Chromium.
IIRC the SAODBLK_A_MainConcept_4 test vector from the official HEVC test
suite does cause our stack to crash, but Gstreamer seemed to handle it
OK. It could be that the Chromium decoder stack is passing bad values to
the decoder.

That can be easily tested with ftrace enabled. I was just tracking down 
an issue with gstreamer and added the following patch to the hantro 
driver. Then:

echo > /sys/kernel/debug/tracing/trace
<run fail test>
cat /sys/kernel/debug/tracing/trace > /tmp/fail.trace
echo > /sys/kernel/debug/tracing/trace
<run pass test>
cat /sys/kernel/debug/tracing/trace > /tmp/pass.trace
# remove time stamps etc.
diff /tmp/{fail,pass}.trace

You should see whether some register programming differs between 
gstreamer and chromium.

diff --git a/drivers/media/platform/verisilicon/hantro.h 
b/drivers/media/platform/verisilicon/hantro.h
index dea35a501ba30..529f1ab478ec8 100644
--- a/drivers/media/platform/verisilicon/hantro.h
+++ b/drivers/media/platform/verisilicon/hantro.h
@@ -353,8 +353,7 @@ extern int hantro_debug;

 #define vpu_debug(level, fmt, args...)                         \
        do {                                                    \
-               if (hantro_debug & BIT(level))          \
-                       pr_info("%s:%d: " fmt,                  \
+               trace_printk("%s:%d: " fmt,                     \
                                 __func__, __LINE__, ##args);   \
        } while (0)