Hi,

On Mon, Dec 18, 2023 at 1:17 AM Laura Nao <laura.nao@xxxxxxxxxxxxx> wrote:
>
> Hello,
>
> KernelCI has reported a regression on some sc7180 based platforms (lazor
> and kingoftown Chromebooks) for linux-next: the kernel seems to hang
> after initializing the SDHCI controller (~2 seconds in the boot),
> nothing is reported on the console after unused clocks are disabled:
>
> [ 2.241767] mmc1: Command Queue Engine enabled
> [ 2.257574] dwc3 a600000.usb: Adding to iommu group 9
> [ 2.261398] mmc1: new HS400 Enhanced strobe MMC card at address 0001
> [ 2.270452] msm_dsi ae94000.dsi: supply refgen not found, using dummy
> regulator
> [ 2.274496] mmcblk1: mmc1:0001 DA4064 58.2 GiB
> [ 2.294482] mmcblk1: p1 p2 p3 p4 p5 p6 p7 p8 p9 p10 p11 p12
> [ 2.301798] mmcblk1boot0: mmc1:0001 DA4064 4.00 MiB
> [ 2.307847] mmcblk1boot1: mmc1:0001 DA4064 4.00 MiB
> [ 2.313799] mmcblk1rpmb: mmc1:0001 DA4064 16.0 MiB, chardev (507:0)
> [ 14.899579] clk: Disabling unused clocks
>
> This was observed on next-20231123 first and is still present on
> next-20231218.
>
> Full kernel log from a couple examples:
> - next-20231205:
> https://storage.kernelci.org/next/master/next-20231205/arm64/defconfig+arm64-chromebook+videodec/gcc-10/lab-collabora/baseline-nfs-sc7180-trogdor-kingoftown.html
> - next-20231215:
> https://storage.kernelci.org/next/master/next-20231215/arm64/defconfig+arm64-chromebook+videodec/gcc-10/lab-collabora/v4l2-decoder-conformance-h265-sc7180-trogdor-lazor-limozeen.html

Is it really hanging? I haven't fully dug into all the logs, but it
sure seems like the kernel is not hung, it just isn't doing anything.
This looks like the state where the kernel is sitting waiting for the
root filesystem to become available so that it can run the init
process.

From your command line I see "root=/dev/nfs". Yet nowhere in your boot
log do I see a USB network adapter register. I'm going to assume
that's the problem.

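For what it's worth, the check I did by eye above could be scripted
against the saved logs. This is only a rough sketch: the function name
and the verdict strings are made up for illustration, and it assumes
the adapter's driver name (r8152 on these boards) shows up somewhere
in the log once it binds, which I haven't verified for every board.

```python
import re

# Assumption: the USB network adapter's driver name appears in the log
# once the driver binds. "r8152" is the only one mentioned in this thread.
NET_DRIVER_MARKERS = ("r8152",)


def diagnose_nfs_root(log_text, markers=NET_DRIVER_MARKERS):
    """Classify a boot log: NFS root requested, and was a network driver seen?"""
    # The root device comes from the kernel command line, e.g. root=/dev/nfs.
    wants_nfs = re.search(r"\broot=/dev/nfs\b", log_text) is not None
    # Plain substring search; hypothetical heuristic, not a kernel interface.
    has_net = any(marker in log_text for marker in markers)

    if not wants_nfs:
        return "root is not NFS"
    if has_net:
        return "NFS root, network driver seen"
    return "NFS root, no network driver seen"
```

A log that ends at "clk: Disabling unused clocks" with the last verdict
would match the "waiting for rootfs" theory rather than a real hang.
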
> Sometimes the kernel is able to get past that point, but crashes a bit
> later - here's an example from a decoder conformance tests, the kernel
> boots fine but crashes shortly afterwards:
> - next-20231214:
> https://storage.kernelci.org/next/master/next-20231214/arm64/defconfig+arm64-chromebook+videodec/gcc-10/lab-collabora/v4l2-decoder-conformance-h265-sc7180-trogdor-kingoftown.html

This log has all kinds of badness.

I see the "stuck clock" on the display that Stephen has been talking
about for a while. I couldn't reproduce it for a while but I saw it
the other day. This needs to be figured out.

I then see an "oops" in qcom_stats_probe() that should be fixed by the
revert that landed in Bjorn's tree over the weekend:

a7dc63435197 Revert "soc: qcom: stats: Add DDR sleep stats"

...then I'm at least slightly shocked that the kernel continues on
past an oops. You really don't panic on oops?

You then seem to load the r8152 USB Ethernet driver which lets you get
the rootfs. Then you're hitting a totally different crash in venus
(video decoder/encoder) that needs to be debugged.

> Any idea on what might be causing this issue?

This seems like the perfect thing to bisect. Is it possible you could
do that?

-Doug