Re: sc7180 kernel hang with linux-next

Hi,

On Mon, Dec 18, 2023 at 1:17 AM Laura Nao <laura.nao@xxxxxxxxxxxxx> wrote:
>
> Hello,
>
> KernelCI has reported a regression on some sc7180 based platforms (lazor
> and kingoftown Chromebooks) for linux-next: the kernel seems to hang
> after initializing the SDHCI controller (~2 seconds in the boot),
> nothing is reported on the console after unused clocks are disabled:
>
> [    2.241767] mmc1: Command Queue Engine enabled
> [    2.257574] dwc3 a600000.usb: Adding to iommu group 9
> [    2.261398] mmc1: new HS400 Enhanced strobe MMC card at address 0001
> [    2.270452] msm_dsi ae94000.dsi: supply refgen not found, using dummy regulator
> [    2.274496] mmcblk1: mmc1:0001 DA4064 58.2 GiB
> [    2.294482]  mmcblk1: p1 p2 p3 p4 p5 p6 p7 p8 p9 p10 p11 p12
> [    2.301798] mmcblk1boot0: mmc1:0001 DA4064 4.00 MiB
> [    2.307847] mmcblk1boot1: mmc1:0001 DA4064 4.00 MiB
> [    2.313799] mmcblk1rpmb: mmc1:0001 DA4064 16.0 MiB, chardev (507:0)
> [   14.899579] clk: Disabling unused clocks
>
> This was observed on next-20231123 first and is still present on
> next-20231218.
>
> Full kernel log from a couple examples:
> - next-20231205:
>   https://storage.kernelci.org/next/master/next-20231205/arm64/defconfig+arm64-chromebook+videodec/gcc-10/lab-collabora/baseline-nfs-sc7180-trogdor-kingoftown.html
> - next-20231215:
>   https://storage.kernelci.org/next/master/next-20231215/arm64/defconfig+arm64-chromebook+videodec/gcc-10/lab-collabora/v4l2-decoder-conformance-h265-sc7180-trogdor-lazor-limozeen.html

Is it really hanging? I haven't fully dug into all the logs, but it
sure seems like the kernel is not hung, it just isn't doing anything.
This looks like the state where the kernel is sitting waiting for the
root filesystem to become available so that it can run the init
process.

From your command line I see "root=/dev/nfs". Yet nowhere in your boot
log do I see a USB network adapter register. I'm going to assume
that's the problem.
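
As a rough way to check a log for this condition automatically, here is a minimal sketch. It assumes the full console log is available as text and that the network adapter's driver name shows up somewhere in it once the adapter registers. The function name and the default driver list are hypothetical; r8152 is only the driver seen in the later log on these boards.

```python
import re


def nfs_root_without_netdev(log_text, driver_names=("r8152",)):
    """Return True if the log asks for an NFS root but never mentions
    any of the given network driver names.

    Assumption: a plain substring match on the driver name is a good
    enough hint that the adapter registered. This is a heuristic, not
    a guarantee that the link actually came up.
    """
    # The kernel command line is echoed early in the boot log.
    wants_nfs_root = re.search(r"\broot=/dev/nfs\b", log_text) is not None
    # Any mention of a known network driver counts as "adapter seen".
    driver_seen = any(name in log_text for name in driver_names)
    return wants_nfs_root and not driver_seen
```

A True result would point at the "waiting for rootfs" state described above rather than at a real hang.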


> Sometimes the kernel is able to get past that point, but crashes a bit
> later - here's an example from a decoder conformance test, the kernel
> boots fine but crashes shortly afterwards:
> - next-20231214:
>   https://storage.kernelci.org/next/master/next-20231214/arm64/defconfig+arm64-chromebook+videodec/gcc-10/lab-collabora/v4l2-decoder-conformance-h265-sc7180-trogdor-kingoftown.html

This log has all kinds of badness. I see the "stuck clock" on the
display that Stephen has been talking about for a while. I couldn't
reproduce it for a while but I saw it the other day. This needs to be
figured out. I then see an "oops" in qcom_stats_probe() that should be
fixed by the revert that landed in Bjorn's tree over the weekend:

a7dc63435197 Revert "soc: qcom: stats: Add DDR sleep stats"

...then I'm at least slightly shocked that the kernel continues on
past an oops. You really don't panic on oops?

You then seem to load the r8152 USB Ethernet driver which lets you get
the rootfs. Then you're hitting a totally different crash in venus
(video decoder/encoder) that needs to be debugged.


> Any idea on what might be causing this issue?

This seems like the perfect thing to bisect. Is it possible you could do that?
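
If it helps, a bisect can be automated with "git bisect run", which treats exit code 0 as good, 125 as "skip this commit", and other codes from 1 to 127 as bad. Below is a minimal sketch of a log classifier that could sit at the end of such a run. It assumes some lab tooling (not shown, and not something I know the details of) builds each commit, boots the board, and saves the console log to a file. The function names and the two marker strings are assumptions to adjust for the real logs.

```python
GOOD, BAD, SKIP = 0, 1, 125  # exit codes understood by `git bisect run`


def classify_boot_log(log_text,
                      init_marker="as init process",
                      boot_marker="Booting Linux"):
    """Map a captured console log to a `git bisect run` exit code.

    Assumptions: `boot_marker` appears whenever the kernel started at
    all, and `init_marker` appears only once the rootfs was mounted and
    init was launched. Both defaults are guesses to verify first.
    """
    if boot_marker not in log_text:
        # No kernel output at all: a lab or build problem, so skip
        # this commit instead of blaming it.
        return SKIP
    return GOOD if init_marker in log_text else BAD


def main(argv):
    """Read the log file named in argv[1] and return the exit code.

    A wrapper script would call sys.exit(main(sys.argv)).
    """
    if len(argv) != 2:
        return SKIP
    with open(argv[1], errors="replace") as f:
        return classify_boot_log(f.read())
```

Returning 125 when the board produced no kernel output keeps flaky lab runs from steering the bisect toward the wrong commit.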

-Doug




