* Tony Lindgren <tony@xxxxxxxxxxx> [180325 15:20]:
> Hi,
>
> * Arnd Bergmann <arnd@xxxxxxxx> [180325 13:30]:
> > On Sun, Mar 25, 2018 at 3:03 PM, Christophe Lyon
> > <christophe.lyon@xxxxxxxxxx> wrote:
> > > Hi Arnd,
> > >
> > > We have a Jenkins job that builds the kernel from torvalds/linux
> > > master branch multi_v7 defconfig every day, using our last GCC release
> > > (7.2-2017-11), and boots a beaglebone-black board.
> > >
> > > Last week it started to fail. I first suspected a Lava problem, but
> > > the job now fails every time, and Remi Duraffort from the Lava team
> > > thinks it's really a kernel problem.
> > >
> > > Is this something you are interested in investigating? Or should we
> > > switch to another "less-edge" branch?
> > >
> > > The last successful run:
> > > https://ci.linaro.org/job/tcwg-buildapp/app=linux+multi_v7,label=tcwg-x86_64-build,target=arm-linux-gnueabihf/75/
> > > The next one failed:
> > > https://ci.linaro.org/job/tcwg-buildapp/app=linux+multi_v7,label=tcwg-x86_64-build,target=arm-linux-gnueabihf/76
> > >
> > > Build 75 was with this kernel commit:
> > > Merge branch 'for-4.16-fixes'
> > > 1b5f3ba415fe4cf8b8b39c8d104ed44cde330658
> > >
> > > Build 76 was with:
> > > Merge tag 'clk-fixes-for-linus'
> > > 3215b9d57a2c75c4305a3956ca303d7004485200
> >
> > Hi Christophe,
> >
> > This branch is certainly the right one to test, thanks for the report!
> > From looking at the output above, it seems that the kernel no longer
> > boots at all, and fails to even print any messages.
> > Between the two runs, I see the following commits:
> >
> > 3215b9d57a2c Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux
> > 303851e14a8f Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
> > 76c0b6a36a12 Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
> > 645102eac15e Merge tag 'nfsd-4.16-1' of git://linux-nfs.org/~bfields/linux
> > 32d43cd391ba kvm/x86: fix icebp instruction handling
> > e8980d67d601 RDMA/ucma: Ensure that CM_ID exists prior to access it
> > 68ef3bc31664 nfsd: remove blocked locks on client teardown
> > 80cf79ae4f68 RDMA/verbs: Remove restrack entry from XRCD structure
> > ed65a4dc2208 RDMA/ucma: Fix use-after-free access in ucma_close
> > 7997f3b2df75 clk: bcm2835: Protect sections updating shared registers
> > 49012d1bf5f7 clk: bcm2835: Fix ana->maskX definitions
> > 2975d5de6428 RDMA/ucma: Check AF family prior resolving address
> > 8a53fc511c5e clk: aspeed: Prevent reset if clock is enabled
> > d90c76bb6112 clk: aspeed: Fix is_enabled for certain clocks
> > bd8602ca42f6 infiniband: bnxt_re: use BIT_ULL() for 64-bit bit masks
> > 5388a508479d infiniband: qplib_fp: fix pointer cast
> > 42cea83f9524 IB/mlx5: Fix cleanup order on unload
> > 0c81ffc60d52 RDMA/ucma: Don't allow join attempts for unsupported AF family
> > 7688f2c3bbf5 RDMA/ucma: Fix access to non-initialized CM_ID object
> > 9dea9a2ff61c RDMA/core: Do not use invalid destination in determining port reuse
> > f3f134f5260a RDMA/mlx5: Fix crash while accessing garbage pointer and freed memory
> > c2b37f76485f IB/mlx5: Fix integer overflows in mlx5_ib_create_srq
> > 2c292dbb398e IB/mlx5: Fix out-of-bounds read in create_raw_packet_qp_rq
> > 14bc1dff7427 scsi: qla2xxx: Remove FC_NO_LOOP_ID for FCP and FC-NVMe Discovery
> > 318aaf34f117 scsi: libsas: defer ata device eh commands to libata
> > 55c19eee3b47 clk: qcom: msm8916: Fix return value check in qcom_apcs_msm8916_clk_probe()
> > 9903e41ae1f5 clk: hisilicon: hi3660:Fix potential NULL dereference in hi3660_stub_clk_probe()
> > 56e1ee353943 Merge branch 'clk-helpers' (early part) into clk-fixes
> > 04bf9ab3359f clk: fix determine rate error with pass-through clock
> > 91584eb51b47 Merge branch 'clk-phase' into clk-fixes
> > bd13c6cbd3c0 Merge tag 'ti-clk-fixes-4.16' of https://github.com/t-kristo/linux-pm into clk-fixes
> > a88bb86d58ce Merge tag 'clk-imx-fixes-4.16' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux into clk-fixes
> > 957a42e8599a Merge tag 'sunxi-clk-fixes-for-4.16' of https://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux into clk-fixes
> > 99652a469df1 clk: migrate the count of orphaned clocks at init
> > 7f95beea3608 clk: update cached phase to respect the fact when setting phase
> > 762790b75210 clk: ti: am43xx: add set-rate-parent support for display clkctrl clock
> > c083dc5f3738 clk: ti: am33xx: add set-rate-parent support for display clkctrl clock
> > 49159a9dc3da clk: ti: clkctrl: add support for CLK_SET_RATE_PARENT flag
> > a275b315334d clk: imx51-imx53: Fix UART4/5 registration on i.MX50 and i.MX53
> > 5682e268350f clk: sunxi-ng: a31: Fix CLK_OUT_* clock ops
> >
> > Out of these, all the interesting ones are clk related:
> >
> > 56e1ee353943 Merge branch 'clk-helpers' (early part) into clk-fixes
> > 04bf9ab3359f clk: fix determine rate error with pass-through clock
> > 91584eb51b47 Merge branch 'clk-phase' into clk-fixes
> > bd13c6cbd3c0 Merge tag 'ti-clk-fixes-4.16' of https://github.com/t-kristo/linux-pm into clk-fixes
> > 99652a469df1 clk: migrate the count of orphaned clocks at init
> > 7f95beea3608 clk: update cached phase to respect the fact when setting phase
> > 762790b75210 clk: ti: am43xx: add set-rate-parent support for display clkctrl clock
> > c083dc5f3738 clk: ti: am33xx: add set-rate-parent support for display clkctrl clock
> > 49159a9dc3da clk: ti: clkctrl: add support for CLK_SET_RATE_PARENT flag
> >
> > I've added the involved parties to Cc. We also see the same thing on
> > kernelci, where many OMAP based systems now fail to boot, with the
> > problem starting at the same commit:
> >
> > https://kernelci.org/boot/all/job/mainline/branch/master/kernel/v4.16-rc6-431-gbcfc1f455466/
> >
> > It's possible that this has already been debugged and a fix is being worked on,
> > but I'm not aware of anything, since I have not followed my email
> > while travelling.
>
> I've confirmed that omap2plus_defconfig boots on bbb while
> multi_v7_defconfig fails to boot with the following:
>
> l4_wkup_cm:clk:0010:0: failed to disable
> Unhandled fault: external abort on non-linefetch (0x1028) at 0xfa30e054
> pgd = 4b21228f
> [fa30e054] *pgd=48211452(bad)
> Internal error: : 1028 [#1] SMP ARM
> Modules linked in:
> CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.16.0-rc6-00075-g3215b9d57a2c #709
> Hardware name: Generic AM33XX (Flattened Device Tree)
> PC is at _update_sysc_cache+0x2c/0x88
> LR is at _enable+0x19c/0x274
> pc : [<c032a844>] lr : [<c032afc8>] psr: 40000013
> sp : db0adea0 ip : 00000003 fp : 00000000
> r10: c144997c r9 : 00000157 r8 : 00000003
> r7 : c151d30c r6 : 00000000 r5 : c1678ef4 r4 : c151b2f0
> r3 : fa30e054 r2 : c151b360 r1 : 00000054 r0 : c151b2f0
> Flags: nZcv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none
> Control: 10c5387d Table: 80204019 DAC: 00000051
> Process swapper/0 (pid: 1, stack limit = 0x2ddf0754)
> Stack: (0xdb0adea0 to 0xdb0ae000)
> dea0: c151b2f0 c032afc8 00000000 a0000013 c1504c48 c151b2f0 c151b314 c1504c48
> dec0: c151b328 c1311c78 a0000013 c0c15ec4 00000011 edaa6d91 c131297c c151b2f0
> dee0: c150ce28 c131297c ffffe000 c1312a68 c1504c48 00000000 c131297c c0302730
> df00: dfdffb06 dfdffafa c1250ecc 00000100 00000157 c0361f34 c124f400 c10cc358
> df20: 00000000 00000002 00000002 c10dec28 00000000 c1504c48 c10eeca0 c10dec9c
> df40: 00000000 dfdffb06 00000000 edaa6d91 00000000 c1677700 c1677700 c13cf824
> df60: c13cf83c 00000003 00000157 c144997c 00000000 c1300e2c 00000002 00000002
> df80: 00000000 c13005c0 00000000 c0d96788 00000000 00000000 00000000 00000000
> dfa0: 00000000 c0d96790 00000000 c03010e8 00000000 00000000 00000000 00000000
> dfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> dfe0: 00000000 00000000 00000000 00000000 00000013 00000000 d5370d56 dcffd777
> [<c032a844>] (_update_sysc_cache) from [<c032afc8>] (_enable+0x19c/0x274)
> [<c032afc8>] (_enable) from [<c1311c78>] (_setup.part.16+0xd8/0x418)
> [<c1311c78>] (_setup.part.16) from [<c1312a68>] (__omap_hwmod_setup_all+0xec/0x100)
> [<c1312a68>] (__omap_hwmod_setup_all) from [<c0302730>] (do_one_initcall+0x54/0x18c)
> [<c0302730>] (do_one_initcall) from [<c1300e2c>] (kernel_init_freeable+0x144/0x1d0)
> [<c1300e2c>] (kernel_init_freeable) from [<c0d96790>] (kernel_init+0x8/0x110)
> [<c0d96790>] (kernel_init) from [<c03010e8>] (ret_from_fork+0x14/0x2c)
> Exception stack(0xdb0adfb0 to 0xdb0adff8)
> dfa0: 00000000 00000000 00000000 00000000
> dfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> dfe0: 00000000 00000000 00000000 00000000 00000013 00000000
> Code: e31c0c01 e5903048 e0833001 1a00000a (e5933000)
>
> Tero, it might be some timing related clock issue?

Looks like git bisect points to commit c083dc5f3738 ("clk: ti: am33xx:
add set-rate-parent support for display clkctrl clock"). I also verified
that reverting it makes bbb boot again.

Regards,

Tony