> On 7 Feb 2025, at 8:43 pm, Christian Hewitt <christianshewitt@xxxxxxxxx> wrote:
>
> Hello folks,
>
> I’ve unearthed a bug, and I need help with further investigation :)
>
> Here’s the description:
>
> Amlogic G12A/G12B/SM1 boards deadlock when playing HEVC 10-bit media via the hardware decoder. They can play 10-bit VP9, 8-bit HEVC and H.264 media without issues. As the HEVC decoder is unfinished work (patched into a current kernel), the long-running assumption has been that the incomplete code is at fault. As the hard deadlock does not leave any splat clues, the suspicion is an invalid memory access or memory corruption. The 10-bit HEVC path uses the MMU, although the VP9 codec, which shares a lot of common code with HEVC and also uses the MMU, works fine. The 8-bit HEVC path is also fine, which probably explains why HEVC decode has no deadlock issue on older Amlogic chips that do not support 10-bit output.
>
> In the last week someone suggested enabling “slab_debug=FZ” to see if that would generate any new clues. It did (the Z option triggers the issue, F does not), but not in vdec code. See below:
>
> [ 2.789071] Unable to handle kernel paging request at virtual address 003c021d57237dd2
> [ 2.793032] Mem abort info:
> [ 2.795798] ESR = 0x0000000096000004
> [ 2.799521] EC = 0x25: DABT (current EL), IL = 32 bits
> [ 2.804808] SET = 0, FnV = 0
> [ 2.807840] EA = 0, S1PTW = 0
> [ 2.810961] FSC = 0x04: level 0 translation fault
> [ 2.815814] Data abort info:
> [ 2.818673] ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
> [ 2.824142] CM = 0, WnR = 0, TnD = 0, TagAccess = 0
> [ 2.829160] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
> [ 2.834448] [003c021d57237dd2] address between user and kernel address ranges
> [ 2.841556] Internal error: Oops: 0000000096000004 [#1] SMP
> [ 2.847098] Modules linked in:
> [ 2.850133] CPU: 4 UID: 0 PID: 11 Comm: kworker/u24:0 Not tainted 6.13.1 #1
> [ 2.857063] Hardware name: Hardkernel ODROID-N2Plus (DT)
> [ 2.862352] Workqueue: events_unbound deferred_probe_work_func
> [ 2.868157] pstate: 00000005 (nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> [ 2.875090] pc : snd_soc_compensate_channel_connection_map+0xa4/0x1d4
> [ 2.881503] lr : snd_soc_bind_card+0x2e4/0xb4c
> [ 2.885923] sp : ffff80008008b9f0
> [ 2.889217] x29: ffff80008008b9f0 x28: ffff800081b0c970 x27: ffff000007521200
> [ 2.896323] x26: 0000000000000007 x25: ffff000007521088 x24: 0000000000000118
> [ 2.903430] x23: ffff0000082bd5f0 x22: ffff80008125d130 x21: ffff000007521088
> [ 2.910536] x20: ffff800081b4ca10 x19: 0000000000000000 x18: 0000000000000001
> [ 2.917643] x17: 0000000000000068 x16: 00000000000000c0 x15: 0000000000000002
> [ 2.924750] x14: 0000000000000000 x13: 00000000000e4ce3 x12: 0000000000000000
> [ 2.931857] x11: 0000000000000006 x10: 0000000000000000 x9 : 0000000000000008
> [ 2.938963] x8 : 0101010101010101 x7 : 7f7f7f7f7f7f7f7f x6 : 0000000000000001
> [ 2.946070] x5 : 413c021d57237dd2 x4 : 0000000000000001 x3 : 413c021d57237dd2
> [ 2.953177] x2 : 0000000000000000 x1 : 413c021d57237dd2 x0 : 0000000000000000
> [ 2.960284] Call trace:
> [ 2.962710] snd_soc_compensate_channel_connection_map+0xa4/0x1d4 (P)
> [ 2.969124] snd_soc_bind_card+0x2e4/0xb4c
> [ 2.973197] snd_soc_register_card+0xec/0x100
> [ 2.977530] devm_snd_soc_register_card+0x48/0xa0
> [ 2.982210] meson_card_probe+0x214/0x290
> [ 2.986197] platform_probe+0x64/0xcc
> [ 2.989837] really_probe+0xbc/0x388
> [ 2.993390] __driver_probe_device+0x78/0x144
> [ 2.997723] driver_probe_device+0x38/0x118
> [ 3.001883] __device_attach_driver+0xb0/0x144
> [ 3.006303] bus_for_each_drv+0x80/0xe0
> [ 3.009940] usb 1-1: new high-speed USB device number 2 using xhci-hcd
> [ 3.010116] __device_attach+0x9c/0x1b4
> [ 3.020430] device_initial_probe+0x10/0x20
> [ 3.024590] bus_probe_device+0xa8/0xb8
> [ 3.028403] deferred_probe_work_func+0xa4/0xf0
> [ 3.032910] process_one_work+0x140/0x300
> [ 3.036897] worker_thread+0x2a0/0x4d4
> [ 3.040623] kthread+0xdc/0xe0
> [ 3.043657] ret_from_fork+0x10/0x20
> [ 3.047212] Code: b0003616 91046294 9104c2d6 1400000d (b86068a7)
> [ 3.053278] ---[ end trace 0000000000000000 ]---
>
> Boot log (until deadlock) is here: https://pastebin.com/raw/KQmXMaNH
>
> If I disable the sound node in the board device-tree to prevent snd_soc being probed, the kernel boots fine. Interestingly, I can now ‘play’ HEVC 10-bit media: Kodi fails to show anything on-screen because ffmpeg does not find a valid V4L2 capture format, but it is otherwise trying to play the media file, and I can press ‘stop’ to exit playback and all is fine, with no deadlock. If I boot without the slab_debug options (and with the sound node restored), sound works well without any known issues. There is clearly an interaction between snd_soc and vdec.
>
> The patches that add the HEVC codec to the vdec driver are a mix of unfinished work from 2019 and work to add 10-bit support in 2022. I’ve tested with the 10-bit work and all ASoC patches in my tree (a couple of minor backports and a hack for channel mapping) reverted, and there’s no change. I’ve built a 6.12 kernel and the issue exists there too; I suspect the problem dates back to 2019/20 when the original vdec work was being done.
>
> I will attempt to build a much older 5.x kernel to establish a bisect boundary, but our embedded-style distro packaging and the scale of some of the patches will make a proper bisect challenging. To do much more I’ll need some guidance, as I don’t have the C or kernel architecture knowledge to debug this properly.
>
> What ideas do you have? What’s next?
In short, HEVC media only appeared to play because of a bad patch ‘fixing’ a pointer dereference, which broke some V4L2 plumbing and caused Kodi/FFmpeg to fall back to software decode, thus avoiding the vdec issue. With that change reverted, the normal behaviour (deadlock) is restored. So there is no problematic interaction between snd_soc and decoding. Apologies for the noise on that.
The snd_soc failure when “slab_debug=FZ” is enabled is still valid though (AFAICT)?
Christian