On Tue, Jan 16, 2018 at 06:15:46PM +0900, Hiraku Toyooka wrote:
> Hello,
>
> I found a NULL pointer dereference in PCI/MSI when I tried to run kdump
> kernel on i.MX6(MCIMX6Q-SDB). This error occurs when masking MSI irq
> which does not have msi_desc.
> I added NULL check to avoid the error, and kdump worked fine. But I'm
> not sure this is correct way. What do you think about this fix?

It has been reported and it is being handled:

https://marc.info/?l=linux-kernel&m=151321815226439&w=2

> My environment:
> - Board: MCIMX6Q-SDB
> - Kernel: 4.15.0-rc5 (commit: 464e1d5f23)
>   - used also as kdump kernel
>   - CONFIG_CRASH_DUMP and CONFIG_DEBUG_INFO are enabled based on imx_v6_v7_defconfig
> - U-Boot: u-boot-fslc (2017.11+fslc branch)
>   - built with meta-freescale (commit: bf7fd9cfe0)
>
>
> Console log in failure case (patch not applied):
>
> root@imx6qdlsabresd:~# cat /proc/cmdline
> console=ttymxc0,115200 root=PARTUUID=6c7357c5-02 rootwait rw quiet crashkernel=96M
> root@imx6qdlsabresd:~# kexec --type zImage -p /boot/zImage --dtb=/boot/imx6q-sabresd.dtb --append="console=ttymxc0,115200 root=/dev/mmcblk1p2 rootwait rw 3 maxcpus=1 reset_devices earlycon"
> root@imx6qdlsabresd:~# echo c > /proc/sysrq-trigger
> [ 27.590895] sysrq: SysRq : Trigger a crash
> [ 27.595250] Unable to handle kernel NULL pointer dereference at virtual address 00000000
> ...(snip)...
> [ 27.808001] Backtrace:
> [ 27.810502] [<c04d1a58>] (sysrq_handle_crash) from [<c04d206c>] (__handle_sysrq+0xd8/0x258)
> [ 27.818877] r5:00000063 r4:c101a6b0
> [ 27.822489] [<c04d1f94>] (__handle_sysrq) from [<c04d2688>] (write_sysrq_trigger+0x78/0x90)
> [ 27.830871] r10:00000000 r9:00000002 r8:00000000 r7:e6490c00 r6:00000000 r5:01ced738
> [ 27.838719] r4:00000002
> [ 27.841290] [<c04d2610>] (write_sysrq_trigger) from [<c02979bc>] (proc_reg_write+0x68/0x90)
> [ 27.849664] r5:00000000 r4:c04d2610
> [ 27.853277] [<c0297954>] (proc_reg_write) from [<c022e50c>] (__vfs_write+0x34/0x134)
> [ 27.861050] r9:00000002 r8:01ced738 r7:00000002 r6:e71a9f78 r5:c0297954 r4:e6a49cc0
> [ 27.868824] [<c022e4d8>] (__vfs_write) from [<c022e78c>] (vfs_write+0xa8/0x170)
> [ 27.876162] r9:00000002 r8:01ced738 r7:e71a9f78 r6:01ced738 r5:e6a49cc0 r4:00000002
> [ 27.883936] [<c022e6e4>] (vfs_write) from [<c022e96c>] (SyS_write+0x44/0x98)
> [ 27.891014] r9:00000002 r8:01ced738 r7:00000000 r6:00000000 r5:e6a49cc0 r4:e6a49cc0
> [ 27.898797] [<c022e928>] (SyS_write) from [<c0107fe0>] (ret_fast_syscall+0x0/0x28)
> [ 27.906395] r9:e71a8000 r8:c01081a4 r7:00000004 r6:b6f7eda8 r5:01ced738 r4:00000002
> [ 27.914169] Code: e3a04000 e5835000 ee074f9a ebf11af2 (e5c45000)
> [ 27.920332] CPU 1 will stop doing anything useful since another CPU has crashed
> [ 27.920342] CPU 0 will stop doing anything useful since another CPU has crashed
> [ 27.920351] CPU 2 will stop doing anything useful since another CPU has crashed
> [ 27.949670] Unable to handle kernel NULL pointer dereference at virtual address 00000028
> [ 27.957798] pgd = c30fc51b
> [ 27.960529] [00000028] *pgd=4a140831
> [ 27.964144] Internal error: Oops: 17 [#2] SMP ARM
> [ 27.968869] Modules linked in:
> [ 27.971962] CPU: 3 PID: 399 Comm: sh Not tainted 4.15.0-rc5-g3630470 #15
> [ 27.978685] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
> [ 27.985248] PC is at msi_set_mask_bit+0x18/0x6c
> [ 27.989805] LR is at pci_msi_mask_irq+0x14/0x18
> [ 27.994358] pc : [<c0485ee4>] lr : [<c0485f4c>] psr: a0000193
> [ 28.000647] sp : e71a9bb0 ip : e71a9bc8 fp : e71a9bc4
> [ 28.005892] r10: ffffe000 r9 : e682e400 r8 : c101a72c
> [ 28.011140] r7 : e71a9c00 r6 : c102a504 r5 : 0000012f r4 : 00000000
> [ 28.017690] r3 : e642b400 r2 : 00000001 r1 : 00000001 r0 : e642b414
> [ 28.024241] Flags: NzCv IRQs off FIQs on Mode SVC_32 ISA ARM Segment none
> [ 28.031486] Control: 10c5387d Table: 36ef804a DAC: 00000051
> [ 28.037255] Process sh (pid: 399, stack limit = 0x61f128fb)
> [ 28.042849] Stack: (0xe71a9bb0 to 0xe71aa000)
> ...(snip)...
> [ 28.334445] Backtrace:
> [ 28.336937] [<c0485ecc>] (msi_set_mask_bit) from [<c0485f4c>] (pci_msi_mask_irq+0x14/0x18)
> [ 28.345224] r5:0000012f r4:e642b400
> [ 28.348844] [<c0485f38>] (pci_msi_mask_irq) from [<c0111308>] (machine_crash_shutdown+0xe8/0x1a0)
> [ 28.357763] [<c0111220>] (machine_crash_shutdown) from [<c01b4aa4>] (__crash_kexec+0x5c/0xa0)
> [ 28.366319] r9:e682e400 r8:bf000000 r7:c100d9e4 r6:c17c9c88 r5:e71a9df0 r4:e71a9c00
> [ 28.374097] [<c01b4a48>] (__crash_kexec) from [<c01b4b58>] (crash_kexec+0x70/0x80)
> [ 28.381691] r6:0000000b r5:ffffffff r4:c10155ac
> [ 28.386341] [<c01b4ae8>] (crash_kexec) from [<c010ce8c>] (die+0x230/0x368)
> [ 28.393239] r5:e71a9df0 r4:c107b21c
> [ 28.396848] [<c010cc5c>] (die) from [<c0116b80>] (__do_kernel_fault.part.0+0x5c/0x7c)
> [ 28.404707] r10:e69c96d4 r9:00000000 r8:00000817 r7:e69c9680 r6:00000817 r5:e71a9df0
> [ 28.412556] r4:00000000
> [ 28.415123] [<c0116b24>] (__do_kernel_fault.part.0) from [<c01169a0>] (do_page_fault+0x3a4/0x3c4)
> [ 28.424017] r7:e69c9680 r4:e71a9df0
> [ 28.427623] [<c01165fc>] (do_page_fault) from [<c0101388>] (do_DataAbort+0x3c/0xbc)
> [ 28.435310] r10:00000000 r9:e71a8000 r8:e71a9df0 r7:00000000 r6:c01165fc r5:00000817
> [ 28.443159] r4:c100e4a0
> [ 28.445723] [<c010134c>] (do_DataAbort) from [<c010d804>] (__dabt_svc+0x64/0xa0)
> [ 28.453140] Exception stack(0xe71a9df0 to 0xe71a9e38)
> [ 28.458217] 9de0: 00000000 00000730 00000000 00000000
> [ 28.466423] 9e00: 00000000 00000001 c10359a0 00000000 00000004 00000002 00000000 e71a9e54
> [ 28.474627] 9e20: e71a9e30 e71a9e40 c0118694 c04d1aa8 60000013 ffffffff
> [ 28.481268] r8:00000004 r7:e71a9e24 r6:ffffffff r5:60000013 r4:c04d1aa8
> [ 28.488012] [<c04d1a58>] (sysrq_handle_crash) from [<c04d206c>] (__handle_sysrq+0xd8/0x258)
> [ 28.496385] r5:00000063 r4:c101a6b0
> [ 28.499995] [<c04d1f94>] (__handle_sysrq) from [<c04d2688>] (write_sysrq_trigger+0x78/0x90)
> [ 28.508375] r10:00000000 r9:00000002 r8:00000000 r7:e6490c00 r6:00000000 r5:01ced738
> [ 28.516223] r4:00000002
> [ 28.518791] [<c04d2610>] (write_sysrq_trigger) from [<c02979bc>] (proc_reg_write+0x68/0x90)
> [ 28.527164] r5:00000000 r4:c04d2610
> [ 28.530774] [<c0297954>] (proc_reg_write) from [<c022e50c>] (__vfs_write+0x34/0x134)
> [ 28.538547] r9:00000002 r8:01ced738 r7:00000002 r6:e71a9f78 r5:c0297954 r4:e6a49cc0
> [ 28.546320] [<c022e4d8>] (__vfs_write) from [<c022e78c>] (vfs_write+0xa8/0x170)
> [ 28.553658] r9:00000002 r8:01ced738 r7:e71a9f78 r6:01ced738 r5:e6a49cc0 r4:00000002
> [ 28.561433] [<c022e6e4>] (vfs_write) from [<c022e96c>] (SyS_write+0x44/0x98)
> [ 28.568511] r9:00000002 r8:01ced738 r7:00000000 r6:00000000 r5:e6a49cc0 r4:e6a49cc0
> [ 28.576291] [<c022e928>] (SyS_write) from [<c0107fe0>] (ret_fast_syscall+0x0/0x28)
> [ 28.583890] r9:e71a8000 r8:c01081a4 r7:00000004 r6:b6f7eda8 r5:01ced738 r4:00000002
> [ 28.591662] Code: e24cb004 e590300c e1a02001 e5934008 (e5d43028)
> [ 28.597788] ---[ end trace b7f10c526986d6ea ]---
> [ 28.602430] Kernel panic - not syncing: Fatal exception
> [ 28.607716] ---[ end Kernel panic - not syncing: Fatal exception
>
>
> Console log in success case (patch applied):
>
> root@imx6qdlsabresd:~# cat /proc/cmdline
> console=ttymxc0,115200 root=PARTUUID=6c7357c5-02 rootwait rw quiet crashkernel=96M
> root@imx6qdlsabresd:~# kexec --type zImage -p /boot/zImage --dtb=/boot/imx6q-sabresd.dtb --append="console=ttymxc0,115200 root=/dev/mmcblk1p2 rootwait rw 3 maxcpus=1 reset_devices earlycon"
> root@imx6qdlsabresd:~# echo c > /proc/sysrq-trigger
> [ 42.951366] sysrq: SysRq : Trigger a crash
> [ 42.955711] Unable to handle kernel NULL pointer dereference at virtual address 00000000
> ...(snip)...
> [ 43.167849] Backtrace:
> [ 43.170314] [<c04d1a5c>] (sysrq_handle_crash) from [<c04d2070>] (__handle_sysrq+0xd8/0x258)
> [ 43.178671] r5:00000063 r4:c101a6b0
> [ 43.182258] [<c04d1f98>] (__handle_sysrq) from [<c04d268c>] (write_sysrq_trigger+0x78/0x90)
> [ 43.190617] r10:00000000 r9:00000002 r8:00000000 r7:e6490c00 r6:00000000 r5:003b2738
> [ 43.198450] r4:00000002
> [ 43.200995] [<c04d2614>] (write_sysrq_trigger) from [<c02979bc>] (proc_reg_write+0x68/0x90)
> [ 43.209350] r5:00000000 r4:c04d2614
> [ 43.212937] [<c0297954>] (proc_reg_write) from [<c022e50c>] (__vfs_write+0x34/0x134)
> [ 43.220688] r9:00000002 r8:003b2738 r7:00000002 r6:e6f33f78 r5:c0297954 r4:e6b47680
> [ 43.228439] [<c022e4d8>] (__vfs_write) from [<c022e78c>] (vfs_write+0xa8/0x170)
> [ 43.235756] r9:00000002 r8:003b2738 r7:e6f33f78 r6:003b2738 r5:e6b47680 r4:00000002
> [ 43.243508] [<c022e6e4>] (vfs_write) from [<c022e96c>] (SyS_write+0x44/0x98)
> [ 43.250564] r9:00000002 r8:003b2738 r7:00000000 r6:00000000 r5:e6b47680 r4:e6b47680
> [ 43.258320] [<c022e928>] (SyS_write) from [<c0107fe0>] (ret_fast_syscall+0x0/0x28)
> [ 43.265896] r9:e6f32000 r8:c01081a4 r7:00000004 r6:b6f0eda8 r5:003b2738 r4:00000002
> [ 43.273647] Code: e3a04000 e5835000 ee074f9a ebf11af1 (e5c45000)
> [ 43.279767] CPU 3 will stop doing anything useful since another CPU has crashed
> [ 43.279771] CPU 2 will stop doing anything useful since another CPU has crashed
> [ 43.279775] CPU 0 will stop doing anything useful since another CPU has crashed
> [ 43.301962] Loading crashdump kernel...
> [ 43.305886] Bye!
> [ 0.000000] Booting Linux on physical CPU 0x1
> [ 0.000000] Linux version 4.15.0-rc5-g13f566e (miracle@ar) (gcc version 5.4.0 20160609 (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.4)) #14 SMP Tue Jan 16 06:33:27 UTC 2018
>
>
> Hiraku Toyooka (1):
>   PCI/MSI: add NULL check before use of msi_desc
>
>  drivers/pci/msi.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> --
> 2.7.4
>
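
For anyone reading along, the kind of guard the cover letter describes
("NULL check before use of msi_desc") can be sketched in plain user-space C.
This is only an illustration: it is not the submitted patch (the diff is not
part of this message) and not the fix discussed at the link above. The
struct and function names are hypothetical stand-ins, not the kernel's
definitions. The idea is simply that an IRQ with no descriptor attached is
skipped rather than dereferenced.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's msi_desc and irq_data. */
struct mock_msi_desc {
    unsigned int irq;       /* first IRQ number covered by this descriptor */
    unsigned int mask_bits; /* one bit per vector; 1 means masked */
};

struct mock_irq_data {
    unsigned int irq;
    struct mock_msi_desc *desc; /* NULL when no MSI descriptor is attached */
};

/*
 * Mask (flag != 0) or unmask (flag == 0) one vector.
 * Returns 0 on success, or -1 when the IRQ has no descriptor, so the
 * caller never dereferences a NULL pointer.
 */
static int mock_set_mask_bit(struct mock_irq_data *data, unsigned int flag)
{
    struct mock_msi_desc *desc = data->desc;
    unsigned int offset;

    if (desc == NULL)
        return -1; /* nothing to mask for this IRQ */

    offset = data->irq - desc->irq;
    if (flag)
        desc->mask_bits |= 1u << offset;
    else
        desc->mask_bits &= ~(1u << offset);
    return 0;
}

/* Small self-check exercising both the normal and the NULL path. */
static void mock_selfcheck(void)
{
    struct mock_msi_desc desc = { .irq = 300, .mask_bits = 0 };
    struct mock_irq_data with_desc = { .irq = 303, .desc = &desc };
    struct mock_irq_data without_desc = { .irq = 303, .desc = NULL };

    assert(mock_set_mask_bit(&with_desc, 1) == 0);
    assert(desc.mask_bits == (1u << 3));
    assert(mock_set_mask_bit(&without_desc, 1) == -1);
}
```

Calling mock_selfcheck() from a test program exercises both paths. Whether
silently skipping such an IRQ is the right behaviour in the kernel is a
separate question, and it is the one the referenced thread deals with.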