Re: [bug report] WARNING: CPU: 0 PID: 226 at drivers/pci/pci.c:2236 pci_disable_device+0xf4/0x100

On Wed, Mar 20, 2024 at 10:16:06AM +0800, Changhui Zhong wrote:
> On Wed, Mar 20, 2024 at 12:30 AM Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:
> > On Tue, Mar 19, 2024 at 03:34:56PM +0800, Changhui Zhong wrote:
> > > repo: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
> > > branch: master
> > > commit HEAD:b3603fcb79b1036acae10602bffc4855a4b9af80
> >
> > Where's the rest of this?  I don't see "WARNING: CPU: 0 PID: 226 at
> > drivers/pci/pci.c:2236" in the snippet below.  Please include or post
> > the complete dmesg log.
> >
> > Is this reproducible?  If so, how?  And is it a regression?
> 
> It is reproducible; I can trigger it every time on my server, but I'm not
> sure whether it is a regression.

Great, it's always easier if it's easily reproducible.  Can you please
try an older kernel, e.g., v6.8?

> dmesg log on my other server:

Please include or post the *complete* dmesg log all the way from the
very beginning of boot, not just the snippet you included below.  The
complete log contains useful information that we need to investigate
this problem.

> ```
> System Reboot
> .
> [  248.433904] watchdog: watchdog0: watchdog did not stop!
> [  258.459553] systemd-shutdown[1]: Waiting for process: 4506 (sleep),
> 4491 (rhts-reboot)
> [  338.521745] watchdog: watchdog0: watchdog did not stop!
> [  338.556096] dracut Warning: Killing all remaining processes
> dracut Warning: Killing all remaining processes
> [  338.589595] dracut Warning: Unmounted /oldroot.
> dracut Warning: Unmounted /oldroot.
> Rebooting.
> [  339.651690] {1}[Hardware Error]: Hardware error from APEI Generic
> Hardware Error Source: 5
> [  339.659948] {1}[Hardware Error]: event severity: recoverable
> [  339.665606] {1}[Hardware Error]:  Error 0, type: fatal
> [  339.670743] {1}[Hardware Error]:   section_type: PCIe error
> [  339.676310] {1}[Hardware Error]:   port_type: 0, PCIe end point
> [  339.682228] {1}[Hardware Error]:   version: 3.0
> [  339.686761] {1}[Hardware Error]:   command: 0x0002, status: 0x0010
> [  339.692939] {1}[Hardware Error]:   device_id: 0000:04:00.0
> [  339.698427] {1}[Hardware Error]:   slot: 0
> [  339.702525] {1}[Hardware Error]:   secondary_bus: 0x00
> [  339.707664] {1}[Hardware Error]:   vendor_id: 0x14e4, device_id: 0x165f
> [  339.714278] {1}[Hardware Error]:   class_code: 020000
> [  339.719331] {1}[Hardware Error]:   aer_uncor_status: 0x00100000,
> aer_uncor_mask: 0x00010000
> [  339.727678] {1}[Hardware Error]:   aer_uncor_severity: 0x000ef030
> [  339.733769] {1}[Hardware Error]:   TLP Header: 40000001 0000020f
> 90028090 00000000
> [  339.741353] tg3 0000:04:00.0: AER: aer_status: 0x00100000,
> aer_mask: 0x00010000
> [  339.748662] tg3 0000:04:00.0:    [20] UnsupReq               (First)
> [  339.755014] tg3 0000:04:00.0: AER: aer_layer=Transaction Layer,
> aer_agent=Requester ID
> [  339.762924] tg3 0000:04:00.0: AER: aer_uncor_severity: 0x000ef030
> [  339.769018] tg3 0000:04:00.0: AER:   TLP Header: 40000001 0000020f
> 90028090 00000000
> [  339.776761] ------------[ cut here ]------------
> [  339.781378] tg3 0000:04:00.0: disabling already-disabled device
> [  339.781386] WARNING: CPU: 0 PID: 358 at drivers/pci/pci.c:2236
> pci_disable_device+0xf4/0x100
> [  339.795737] Modules linked in: raid1 rpcsec_gss_krb5 auth_rpcgss
> nfsv4 dns_resolver nfs lockd grace netfs rfkill sunrpc ipmi_ssif
> intel_rapl_msr intel_rapl_common intel_uncore_frequency
> intel_uncore_frequency_common i10nm_edac nfit libnvdimm
> x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm vfat fat
> mgag200 rapl dax_hmem iTCO_wdt i2c_algo_bit cxl_acpi
> iTCO_vendor_support drm_shmem_helper intel_cstate acpi_ipmi ipmi_si
> mei_me cxl_core i2c_i801 dell_smbios isst_if_mmio isst_if_mbox_pci
> drm_kms_helper ipmi_devintf intel_uncore dcdbas mei einj
> intel_pch_thermal intel_vsec isst_if_common wmi_bmof
> dell_wmi_descriptor pcspkr i2c_smbus ipmi_msghandler acpi_power_meter
> drm fuse xfs libcrc32c sd_mod t10_pi sg crct10dif_pclmul ahci
> crc32_pclmul libahci crc32c_intel libata tg3 ghash_clmulni_intel wmi
> dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_debug]
> [  339.872243] CPU: 0 PID: 358 Comm: kworker/0:3 Not tainted 6.8.0+ #1
> [  339.878505] Hardware name: Dell Inc. PowerEdge R650xs/0PPTY2, BIOS
> 1.4.4 10/07/2021
> [  339.886157] Workqueue: events aer_recover_work_func
> [  339.891037] RIP: 0010:pci_disable_device+0xf4/0x100
> [  339.895917] Code: 4d 85 e4 75 07 4c 8b a3 c8 00 00 00 48 8d bb c8
> 00 00 00 e8 9e c7 17 00 4c 89 e2 48 c7 c7 50 92 21 91 48 89 c6 e8 ac
> 94 a1 ff <0f> 0b e9 3b ff ff ff e8 80 36 60 00 90 90 90 90 90 90 90 90
> 90 90
> [  339.914664] RSP: 0018:ff56179a82883d10 EFLAGS: 00010286
> [  339.919888] RAX: 0000000000000000 RBX: ff2f7c9b44e58000 RCX: ffffffff9171e4a8
> [  339.927022] RDX: 0000000000000000 RSI: 00000000ffff7fff RDI: 0000000000000001
> [  339.934154] RBP: ff2f7c9b65860000 R08: 0000000000000000 R09: ff56179a82883bc0
> [  339.941289] R10: ff56179a82883bb8 R11: ffffffff917de4e8 R12: ff2f7c9b445fa4e0
> [  339.948421] R13: 0000000000000002 R14: ff2f7c9b44e58148 R15: ff2f7c9b44e5d000
> [  339.955552] FS:  0000000000000000(0000) GS:ff2f7c9eaf600000(0000)
> knlGS:0000000000000000
> [  339.963640] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  339.969385] CR2: 00007f7577713838 CR3: 0000000300a20003 CR4: 0000000000771ef0
> [  339.976519] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [  339.983651] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [  339.990782] PKRU: 55555554
> [  339.993494] Call Trace:
> [  339.995949]  <TASK>
> [  339.998054]  ? __warn+0x7f/0x130
> [  340.001286]  ? pci_disable_device+0xf4/0x100
> [  340.005560]  ? report_bug+0x18a/0x1a0
> [  340.009227]  ? handle_bug+0x3c/0x70
> [  340.012719]  ? exc_invalid_op+0x14/0x70
> [  340.016559]  ? asm_exc_invalid_op+0x16/0x20
> [  340.020745]  ? pci_disable_device+0xf4/0x100
> [  340.025017]  ? __pfx_report_frozen_detected+0x10/0x10
> [  340.030069]  tg3_io_error_detected+0x1f5/0x2b0 [tg3]
> [  340.035044]  ? __pfx_report_frozen_detected+0x10/0x10
> [  340.040098]  report_error_detected+0xc7/0x1c0
> [  340.044456]  ? __pfx_report_frozen_detected+0x10/0x10
> [  340.049509]  __pci_walk_bus+0x6b/0xb0
> [  340.053176]  ? __pfx_aer_root_reset+0x10/0x10
> [  340.057535]  pcie_do_recovery+0x2b4/0x3c0
> [  340.061548]  aer_recover_work_func+0x106/0x110
> [  340.065992]  process_one_work+0x193/0x3d0
> [  340.070005]  worker_thread+0x2fc/0x410
> [  340.073758]  ? __pfx_worker_thread+0x10/0x10
> [  340.078032]  kthread+0xdc/0x110
> [  340.081179]  ? __pfx_kthread+0x10/0x10
> [  340.084930]  ret_from_fork+0x2d/0x50
> [  340.088510]  ? __pfx_kthread+0x10/0x10
> [  340.092263]  ret_from_fork_asm+0x1a/0x30
> [  340.096190]  </TASK>
> [  340.098380] ---[ end trace 0000000000000000 ]---
> [  340.103083] reboot: Restarting system
> [-- MARK -- Tue Mar 19 14:05:00 2024]
> ```
> 
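
For reference, the AER status words in the quoted log can be decoded by bit
position. This is a minimal sketch, assuming the standard bit names of the
PCIe AER Uncorrectable Error Status register. The table is a partial,
illustrative subset, and the helper name decode_uncor is hypothetical, not
something from the kernel or this thread.

```python
# Partial map of PCIe AER Uncorrectable Error Status bits (illustrative
# subset, assumed from the PCIe AER capability layout).
UNCOR_BITS = {
    4: "Data Link Protocol Error",
    5: "Surprise Down",
    12: "Poisoned TLP",
    13: "Flow Control Protocol Error",
    14: "Completion Timeout",
    15: "Completer Abort",
    16: "Unexpected Completion",
    17: "Receiver Overflow",
    18: "Malformed TLP",
    19: "ECRC Error",
    20: "Unsupported Request",
}


def decode_uncor(value):
    """Return (bit, name) pairs for every bit set in a 32-bit status word."""
    return [
        (bit, UNCOR_BITS.get(bit, "unknown"))
        for bit in range(32)
        if value & (1 << bit)
    ]


status = 0x00100000  # aer_uncor_status / aer_status from the quoted log
mask = 0x00010000    # aer_uncor_mask / aer_mask from the quoted log

print("status:", decode_uncor(status))
print("mask:  ", decode_uncor(mask))
```

Decoding the status word this way gives bit 20, which matches the
"[20] UnsupReq" line that the kernel itself printed for tg3 0000:04:00.0.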



