[Question] PCIe AER driver conflicts with hotplug driver

We hit a kernel panic (kernel 4.14.10) after injecting an AER fatal error.

do_recovery()
   --->reset_link()
	--->pci_reset_secondary_bus()
pci_reset_secondary_bus() takes the link down and then brings it back up.
The link-down event triggers the hotplug driver (pciehp) to remove the PCIe devices under the port.
--->This destroys the pci_dev *dev.
The link-up event then triggers the hotplug driver to rescan the PCIe devices under the port.
--->This creates a new pci_dev *dev_new.

do_recovery() then continues and calls broadcast_error_message(dev),
but dev has already been destroyed.
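
To make the lifetime problem concrete, here is a small user-space model of the sequence above. It is only a sketch: struct model_dev, model_hotplug_remove() and model_recovery() are hypothetical names invented for illustration, not kernel code. The reference count stands in for what pci_dev_get()/pci_dev_put() do in the kernel. It only shows why the pointer dangles when the recovery path holds no reference of its own, and it does not claim that taking a reference is the right fix for AER versus pciehp.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* User-space stand-in for struct pci_dev (hypothetical, NOT the kernel type). */
struct model_dev {
    int refcount;  /* models the count behind pci_dev_get()/pci_dev_put() */
    int removed;   /* set once the hotplug path has removed the device */
};

static int live_objects;  /* number of model_dev objects not yet freed */

static struct model_dev *model_dev_alloc(void)
{
    struct model_dev *d = calloc(1, sizeof(*d));

    if (!d)
        abort();
    d->refcount = 1;  /* the reference owned by the bus */
    live_objects++;
    return d;
}

static struct model_dev *model_dev_get(struct model_dev *d)
{
    d->refcount++;
    return d;
}

static void model_dev_put(struct model_dev *d)
{
    assert(d->refcount > 0);
    if (--d->refcount == 0) {
        free(d);
        live_objects--;
    }
}

/* Models the link-down handling: mark removed, drop the bus reference. */
static void model_hotplug_remove(struct model_dev *d)
{
    d->removed = 1;
    model_dev_put(d);
}

/*
 * Models the recovery flow around the secondary bus reset.
 * Returns 1 if dev is still a valid object after the reset,
 * 0 if it was freed underneath us (the situation described above).
 */
int model_recovery(int hold_ref)
{
    int before = live_objects;
    struct model_dev *dev = model_dev_alloc();
    int alive;

    if (hold_ref)
        model_dev_get(dev);      /* recovery path pins the device */

    model_hotplug_remove(dev);   /* reset -> link down -> device removed */

    alive = (live_objects == before + 1);
    if (alive) {
        /* Safe to look at dev; it was removed, so skip the broadcast. */
        if (dev->removed)
            printf("dev removed during reset, skipping broadcast\n");
        model_dev_put(dev);      /* drop our reference, frees the object */
    } else {
        /* dev is dangling here; touching it would be a use-after-free. */
        printf("dev freed during reset, pointer is dangling\n");
    }
    return alive;
}
```

Calling model_recovery(0) follows the failing sequence, where the object is gone by the time the broadcast step would run. model_recovery(1) keeps the object alive long enough to notice that it was removed.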

I think this is a software bug. Any ideas on how to fix it?

The error log is below.

[  266.332262] pcieport 0000:00:0c.0: AER: Uncorrected (Fatal) error received: id=0300
[  266.339914] ixgbe 0000:03:00.0: PCIe Bus Error: severity=Uncorrected (Fatal), type=Unaccessible, id=0300(Unregistered Agent ID)
[  266.352719] pciehp 0000:00:0c.0:pcie004: Slot(0-2): Link Down
[  266.367643] pciehp 0000:00:0c.0:pcie004: Slot(0-2): Link Up
[  266.373236] pciehp 0000:00:0c.0:pcie004: Slot(0-2): Link Up event queued; currently getting powered off
[  266.389339] ixgbe 0000:03:00.1: Adapter removed
[  266.394060] ixgbe 0000:03:00.1: complete
[  266.398084] iommu: Removing device 0000:03:00.1 from group 9
[  266.446062] ixgbe 0000:03:00.0: Adapter removed
[  266.450611] ixgbe 0000:03:00.0: complete
[  266.454616] iommu: Removing device 0000:03:00.0 from group 8
[  266.567607] pci 0000:03:00.0: VF(n) BAR0 space: [mem 0x00000000-0x000fffff 64bit pref] (contains BAR0 for 64 VFs)
[  266.577878] pci 0000:03:00.0: VF(n) BAR3 space: [mem 0x00000000-0x000fffff 64bit pref] (contains BAR3 for 64 VFs)
[  266.588562] pci 0000:03:00.1: VF(n) BAR0 space: [mem 0x00000000-0x000fffff 64bit pref] (contains BAR0 for 64 VFs)
[  266.598830] pci 0000:03:00.1: VF(n) BAR3 space: [mem 0x00000000-0x000fffff 64bit pref] (contains BAR3 for 64 VFs)
[  266.619412] pci 0000:03:00.0: BAR 0: assigned [mem 0x80000000000-0x800003fffff 64bit pref]
[  266.627673] pci 0000:03:00.0: BAR 6: assigned [mem 0xe0000000-0xe03fffff pref]
[  266.634886] pci 0000:03:00.1: BAR 0: assigned [mem 0x80000400000-0x800007fffff 64bit pref]
[  266.643147] pci 0000:03:00.1: BAR 6: assigned [mem 0xe0400000-0xe07fffff pref]
[  266.650359] pci 0000:03:00.0: BAR 4: assigned [mem 0x80000800000-0x80000803fff 64bit pref]
[  266.658619] pci 0000:03:00.0: BAR 7: assigned [mem 0x80000804000-0x80000903fff 64bit pref]
[  266.666876] pci 0000:03:00.0: BAR 10: assigned [mem 0x80000904000-0x80000a03fff 64bit pref]
[  266.675219] pci 0000:03:00.1: BAR 4: assigned [mem 0x80000a04000-0x80000a07fff 64bit pref]
[  266.683479] pci 0000:03:00.1: BAR 7: assigned [mem 0x80000a08000-0x80000b07fff 64bit pref]
[  266.691735] pci 0000:03:00.1: BAR 10: assigned [mem 0x80000b08000-0x80000c07fff 64bit pref]
[  266.700079] pci 0000:03:00.0: BAR 2: assigned [io  0x3000-0x301f]
[  266.706164] pci 0000:03:00.1: BAR 2: assigned [io  0x3020-0x303f]
[  266.712251] pcieport 0000:00:0c.0: PCI bridge to [bus 03]
[  266.717641] pcieport 0000:00:0c.0:   bridge window [io  0x3000-0x3fff]
[  266.724160] pcieport 0000:00:0c.0:   bridge window [mem 0xe0000000-0xe07fffff]
[  266.731374] pcieport 0000:00:0c.0:   bridge window [mem 0x80000000000-0x80000dfffff 64bit pref]
[  266.740239] iommu: Adding device 0000:03:00.0 to group 8
[  266.752272] ixgbe 0000:03:00.0: enabling device (0540 -> 0542)
[  266.913938] ixgbe 0000:03:00.0: Multiqueue Enabled: Rx Queue count = 4, Tx Queue count = 4 XDP Queue count = 0
[  266.924046] ixgbe 0000:03:00.0: PCI Express bandwidth of 32GT/s available
[  266.930826] ixgbe 0000:03:00.0: (Speed:5.0GT/s, Width: x8, Encoding Loss:20%)
[  266.938028] ixgbe 0000:03:00.0: MAC: 2, PHY: 17, SFP+: 5, PBA No: FFFFFF-0FF
[  266.945067] ixgbe 0000:03:00.0: 34:6a:c2:67:12:f9
[  266.954211] ixgbe 0000:03:00.0: Intel(R) 10 Gigabit Network Connection
[  266.961850] iommu: Adding device 0000:03:00.1 to group 9
[  266.967346] irq: type mismatch, failed to map hwirq-641 for <no-node>!
[  266.969838] ixgbe 0000:03:00.0 eth12: renamed from eth0
[  266.979094] ixgbe 0000:03:00.1: enabling device (0540 -> 0542)
[  267.395402] Unable to handle kernel NULL pointer dereference at virtual address 00000018
[  267.403489] Mem abort info:
[  267.406271]   Exception class = DABT (current EL), IL = 32 bits
[  267.412182]   SET = 0, FnV = 0
[  267.415224]   EA = 0, S1PTW = 0
[  267.418356] Data abort info:
[  267.421228]   ISV = 0, ISS = 0x00000004
[  267.425055]   CM = 0, WnR = 0
[  267.428014] user pgtable: 4k pages, 48-bit VAs, pgd = ffff8023ff1fb000
[  267.434533] [0000000000000018] *pgd=0000000000000000
[  267.439490] Internal error: Oops: 96000004 [#1] SMP
[  267.477964] CPU: 3 PID: 44 Comm: kworker/3:1 Tainted: P           OE   4.14.10 #1
[  267.485440] Workqueue: events aer_isr
[  267.489090] task: ffff80242d4c2100 task.stack: ffff00000a268000
[  267.494996] PC is at pci_walk_bus+0x48/0xb4
[  267.499166] LR is at pci_walk_bus+0x34/0xb4
[  267.503336] pc : [<ffff0000085d564c>] lr : [<ffff0000085d5638>] pstate: 00c00149
[  267.510717] sp : ffff00000a26bc40
[  267.514018] x29: ffff00000a26bc40 x28: 0000000000000300
[  267.519317] x27: ffff80242bfabc00 x26: ffff80242d4fc130
[  267.524617] x25: 0000000000000001 x24: ffff00000964e000
[  267.529916] x23: ffff0000096e3c00 x22: ffff0000085f40c8
[  267.535215] x21: ffff00000a26bcc0 x20: ffff8023bf5c4000
[  267.540514] x19: ffff0000096e3c00 x18: 0000ffffc054fda0
[  267.545813] x17: 0000ffffb85fdfc8 x16: ffff000008301600
[  267.551112] x15: 0000481afc000000 x14: 00310f0600000000
[  267.556411] x13: 000000000000010a x12: 0000000000000018
[  267.561710] x11: 0000000000000000 x10: 0000000000000461
[  267.567010] x9 : 0000000000000007 x8 : ffff000009c7d877
[  267.572309] x7 : 0000000000000000 x6 : 0000000000000461
[  267.577608] x5 : 0000000000000001 x4 : 0000000000000001
[  267.582907] x3 : 0000000000000007 x2 : 0000000000000000
[  267.588206] x1 : ffff0000096e3c28 x0 : 0000000000000000
[  267.593506] Process kworker/3:1 (pid: 44, stack limit = 0xffff00000a268000)
[  267.600453] Call trace:
[  267.602887] Exception stack(0xffff00000a26bb00 to 0xffff00000a26bc40)
[  267.609315] bb00: 0000000000000000 ffff0000096e3c28 0000000000000000 0000000000000007
[  267.617131] bb20: 0000000000000001 0000000000000001 0000000000000461 0000000000000000
[  267.624946] bb40: ffff000009c7d877 0000000000000007 0000000000000461 0000000000000000
[  267.632762] bb60: 0000000000000018 000000000000010a 00310f0600000000 0000481afc000000
[  267.640577] bb80: ffff000008301600 0000ffffb85fdfc8 0000ffffc054fda0 ffff0000096e3c00
[  267.648393] bba0: ffff8023bf5c4000 ffff00000a26bcc0 ffff0000085f40c8 ffff0000096e3c00
[  267.656208] bbc0: ffff00000964e000 0000000000000001 ffff80242d4fc130 ffff80242bfabc00
[  267.664023] bbe0: 0000000000000300 ffff00000a26bc40 ffff0000085d5638 ffff00000a26bc40
[  267.671838] bc00: ffff0000085d564c 0000000000c00149 ffff00000a26bc20 ffff000008d8e20c
[  267.679654] bc20: ffffffffffffffff ffff0000085d5638 ffff00000a26bc40 ffff0000085d564c
[  267.687470] [<ffff0000085d564c>] pci_walk_bus+0x48/0xb4
[  267.692682] [<ffff0000085f4408>] broadcast_error_message+0x9c/0x138
[  267.698936] [<ffff0000085f4638>] do_recovery+0x194/0x240
[  267.704235] [<ffff0000085f4950>] handle_error_source.isra.5+0x38/0x68
[  267.710662] [<ffff0000085f4f14>] aer_isr+0x278/0x2d8
[  267.715614] [<ffff0000080ecf70>] process_one_work+0x144/0x390
[  267.721347] [<ffff0000080ed300>] worker_thread+0x144/0x418
[  267.726819] [<ffff0000080f3e98>] kthread+0x10c/0x138
[  267.731772] [<ffff0000080855dc>] ret_from_fork+0x10/0x18
[  267.737071] Code: aa1703f3 9100a261 eb01001f 54000180 (f9400c01)
[  267.743159] ---[ end trace 9e9ecdcb707cf0e2 ]---
[  267.747763] Kernel panic - not syncing: Fatal exception
[  267.752977] SMP: stopping secondary CPUs
[  267.756889] Kernel Offset: disabled
[  267.760364] CPU features: 0x000a18
[  267.763752] Memory Limit: none
[  267.766797] ---[ end Kernel panic - not syncing: Fatal exception
[  270.027382] ------------[ cut here ]------------
[  270.031989] WARNING: CPU: 3 PID: 44 at kernel/sched/core.c:1178 set_task_cpu+0x18c/0x1a8
[  270.062821] CPU: 3 PID: 44 Comm: kworker/3:1 Tainted: P      D    OE   4.14.10 #1
[  270.070291] Workqueue: events aer_isr
[  270.073940] task: ffff80242d4c2100 task.stack: ffff00000a268000
[  270.079846] PC is at set_task_cpu+0x18c/0x1a8
[  270.084189] LR is at try_to_wake_up+0x154/0x43c
[  270.088706] pc : [<ffff0000080ffc80>] lr : [<ffff0000081005a8>] pstate: 604001c9
[  270.096086] sp : ffff00000801bd50
[  270.099388] x29: ffff00000801bd50 x28: ffff80242d4c2100
[  270.104687] x27: ffff80242ccad650 x26: 0000000000000000
[  270.109986] x25: ffff000009509000 x24: ffff0000094f5000
[  270.115285] x23: 00000000000001c0 x22: 0000000000000004
[  270.120584] x21: ffff80242ccadbf4 x20: 0000000000000000
[  270.125883] x19: ffff80242ccad280 x18: 0000ffffc054fda0
[  270.131182] x17: 0000ffffb85fdfc8 x16: ffff000008301600
[  270.136482] x15: 0000481afc000000 x14: 00003d0900000000
[  270.141781] x13: 0000000000061a80 x12: 000000000000010d
[  270.147080] x11: 7fffffffffffffff x10: 0000000000000002
[  270.152379] x9 : ffff0000094f6680 x8 : 0000000000000000
[  270.157678] x7 : ffff00000fc1bd00 x6 : 000000000000ffff
[  270.162977] x5 : 0000000000000000 x4 : 000000000000ffff
[  270.168276] x3 : 000000000000ffff x2 : ffff000009511d20
[  270.173576] x1 : ffff000009511000 x0 : 0000000000000008
[  270.178875] Call trace:
[  270.181309] Exception stack(0xffff00000801bc10 to 0xffff00000801bd50)
[  270.187735] bc00:                                   0000000000000008 ffff000009511000
[  270.195550] bc20: ffff000009511d20 000000000000ffff 000000000000ffff 0000000000000000
[  270.203366] bc40: 000000000000ffff ffff00000fc1bd00 0000000000000000 ffff0000094f6680
[  270.211181] bc60: 0000000000000002 7fffffffffffffff 000000000000010d 0000000000061a80
[  270.218997] bc80: 00003d0900000000 0000481afc000000 ffff000008301600 0000ffffb85fdfc8
[  270.226813] bca0: 0000ffffc054fda0 ffff80242ccad280 0000000000000000 ffff80242ccadbf4
[  270.234628] bcc0: 0000000000000004 00000000000001c0 ffff0000094f5000 ffff000009509000
[  270.242443] bce0: 0000000000000000 ffff80242ccad650 ffff80242d4c2100 ffff00000801bd50
[  270.250258] bd00: ffff0000081005a8 ffff00000801bd50 ffff0000080ffc80 00000000604001c9
[  270.258074] bd20: ffff00000801bd80 ffff000008100864 0001000000000000 ffff000009509000
[  270.265889] bd40: ffff00000801bd50 ffff0000080ffc80
[  270.270754] [<ffff0000080ffc80>] set_task_cpu+0x18c/0x1a8
[  270.276139] [<ffff0000081005a8>] try_to_wake_up+0x154/0x43c
[  270.281699] [<ffff0000081008b8>] wake_up_process+0x28/0x34
[  270.287172] [<ffff00000814a868>] hrtimer_wakeup+0x28/0x38
[  270.292557] [<ffff00000814aa14>] __hrtimer_run_queues+0xd8/0x290
[  270.298550] [<ffff00000814b454>] hrtimer_interrupt+0xa4/0x1cc
[  270.304285] [<ffff000008b7506c>] arch_timer_handler_phys+0x3c/0x4c
[  270.310452] [<ffff000008133c3c>] handle_percpu_devid_irq+0x8c/0x218
[  270.316705] [<ffff00000812db4c>] generic_handle_irq+0x34/0x4c
[  270.322437] [<ffff00000812e284>] __handle_domain_irq+0x68/0xbc
[  270.328256] [<ffff0000080816e0>] gic_handle_irq+0xd0/0x180
[  270.333727] Exception stack(0xffff00000a26b6c0 to 0xffff00000a26b800)
[  270.340155] b6c0: ffff000008b74ecc 0000000005f5e100 0000000000061a80 0000000000000007
[  270.347970] b6e0: 0000000000000001 0000000000000001 00000000000004a1 746146203a676e69
[  270.355786] b700: ffff0000086d3554 0000000000000044 00000000ffffffff ffff00000a26b500
[  270.363601] b720: 0000000000000000 0000000000000000 0000000000000000 0000481afc000000
[  270.371416] b740: ffff000008301600 0000ffffb85fdfc8 0000ffffc054fda0 ffff0000096d1000
[  270.379232] b760: 0000000b07546984 00000000000186a0 ffff000009bcf000 0000000000000000
[  270.387048] b780: ffff80242d4c2100 0000000000000001 ffff80242d4fc130 ffff80242bfabc00
[  270.394863] b7a0: ffff80242d4c2100 ffff00000a26b800 ffff000008d7363c ffff00000a26b800
[  270.402679] b7c0: ffff000008b74ed8 0000000080400149 ffff000008d7363c 0000000080400149
[  270.410494] b7e0: 0001000000000000 ffff80242bfabc00 ffff00000a26b800 ffff000008b74ed8
[  270.418310] [<ffff0000080830f0>] el1_irq+0xb0/0x140
[  270.423174] [<ffff000008b74ed8>] arch_counter_get_cntpct+0xc/0x4c
[  270.429255] [<ffff000008d7363c>] __delay+0x3c/0x58
[  270.434032] [<ffff000008d73688>] __const_udelay+0x30/0x38
[  270.439418] [<ffff0000080d047c>] panic+0x2b4/0x2c0
[  270.444196] [<ffff00000808af7c>] die+0x194/0x1a0
[  270.448801] [<ffff00000809d14c>] __do_kernel_fault+0xb0/0xe8
[  270.454448] [<ffff000008d92754>] do_page_fault+0x200/0x39c
[  270.459920] [<ffff000008d92960>] do_translation_fault+0x70/0x80
[  270.465825] [<ffff0000080813d0>] do_mem_abort+0x70/0xf4
[  270.471036] Exception stack(0xffff00000a26bb00 to 0xffff00000a26bc40)
[  270.477463] bb00: 0000000000000000 ffff0000096e3c28 0000000000000000 0000000000000007
[  270.485278] bb20: 0000000000000001 0000000000000001 0000000000000461 0000000000000000
[  270.493094] bb40: ffff000009c7d877 0000000000000007 0000000000000461 0000000000000000
[  270.500909] bb60: 0000000000000018 000000000000010a 00310f0600000000 0000481afc000000
[  270.508724] bb80: ffff000008301600 0000ffffb85fdfc8 0000ffffc054fda0 ffff0000096e3c00
[  270.516540] bba0: ffff8023bf5c4000 ffff00000a26bcc0 ffff0000085f40c8 ffff0000096e3c00
[  270.524356] bbc0: ffff00000964e000 0000000000000001 ffff80242d4fc130 ffff80242bfabc00
[  270.532171] bbe0: 0000000000000300 ffff00000a26bc40 ffff0000085d5638 ffff00000a26bc40
[  270.539987] bc00: ffff0000085d564c 0000000000c00149 ffff00000a26bc20 ffff000008d8e20c
[  270.547803] bc20: ffffffffffffffff ffff0000085d5638 ffff00000a26bc40 ffff0000085d564c
[  270.555618] [<ffff000008082f14>] el1_da+0x24/0x84
[  270.560309] [<ffff0000085d564c>] pci_walk_bus+0x48/0xb4
[  270.565521] [<ffff0000085f4408>] broadcast_error_message+0x9c/0x138
[  270.571775] [<ffff0000085f4638>] do_recovery+0x194/0x240
[  270.577074] [<ffff0000085f4950>] handle_error_source.isra.5+0x38/0x68
[  270.583501] [<ffff0000085f4f14>] aer_isr+0x278/0x2d8
[  270.588452] [<ffff0000080ecf70>] process_one_work+0x144/0x390
[  270.594185] [<ffff0000080ed300>] worker_thread+0x144/0x418
[  270.599657] [<ffff0000080f3e98>] kthread+0x10c/0x138
[  270.604608] [<ffff0000080855dc>] ret_from_fork+0x10/0x18
[  270.609906] ---[ end trace 9e9ecdcb707cf0e3 ]---

Thanks,
Dongdong