Hi,
(Please CC me)
I'm testing the Ceph filesystem in an OmniPath network using ipoib and
I'm able to cause an oops every time I run a fio benchmark with more
than one writer for some seconds. Here is the oops of the 6.4.11 kernel
from netconsole:
[ 2116.528509] BUG: kernel NULL pointer dereference, address: 0000000000000010
[ 2116.536343] #PF: supervisor read access in kernel mode
[ 2116.542106] #PF: error_code(0x0000) - not-present page
[ 2116.547853] PGD 0 P4D 0
[ 2116.550699] Oops: 0000 [#1] PREEMPT SMP PTI
[ 2116.555380] CPU: 4 PID: 42 Comm: ksoftirqd/4 Not tainted 6.4.11 #1-NixOS
[ 2116.562889] Hardware name: Intel Corporation S2600WT2R/S2600WT2R, BIOS SE5C610.86B.01.01.0016.033120161139 03/31/2016
[ 2116.574768] RIP: 0010:napi_schedule_prep+0x9/0x50
[ 2116.580050] Code: 68 54 0c 94 e8 58 3e cf ff 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 <48> 8b 4f 10 f6 c1 04 75 29 48 89 ca 48 89 c8 83 e2 01 48 01 d2 48
[ 2116.601069] RSP: 0018:ffffabe5c65f0eb8 EFLAGS: 00010046
[ 2116.606923] RAX: ffffffffc14f1ab0 RBX: 0000000000000000 RCX: 0000000000000001
[ 2116.614916] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[ 2116.622905] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[ 2116.630897] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000617
[ 2116.638887] R13: ffff9164955396b0 R14: 0000000000000016 R15: ffff916498d09a00
[ 2116.646878] FS: 0000000000000000(0000) GS:ffff9173bfb00000(0000) knlGS:0000000000000000
[ 2116.655940] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2116.662375] CR2: 0000000000000010 CR3: 0000000a8ee20002 CR4: 00000000003706e0
[ 2116.670366] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 2116.678356] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 2116.686346] Call Trace:
[ 2116.689089] <IRQ>
[ 2116.691350] ? __die+0x23/0x70
[ 2116.694782] ? page_fault_oops+0x17d/0x4b0
[ 2116.700050] ? ip_protocol_deliver_rcu+0x32/0x170
[ 2116.705968] ? exc_page_fault+0x6d/0x150
[ 2116.711007] ? asm_exc_page_fault+0x26/0x30
[ 2116.716336] ? __pfx_hfi1_ipoib_sdma_complete+0x10/0x10 [hfi1]
[ 2116.723646] ? napi_schedule_prep+0x9/0x50
[ 2116.728875] hfi1_ipoib_sdma_complete+0x38/0x90 [hfi1]
[ 2116.735353] sdma_make_progress+0x178/0x460 [hfi1]
[ 2116.741459] ? __pfx_hfi1_ipoib_sdma_complete+0x10/0x10 [hfi1]
[ 2116.748712] sdma_engine_interrupt+0x72/0x100 [hfi1]
[ 2116.755030] sdma_interrupt+0x36/0x110 [hfi1]
[ 2116.760632] __handle_irq_event_percpu+0x4d/0x1a0
[ 2116.766538] handle_irq_event+0x3e/0x80
[ 2116.771462] handle_edge_irq+0x9d/0x280
[ 2116.776380] __common_interrupt+0x46/0xc0
[ 2116.781495] common_interrupt+0x81/0xa0
[ 2116.786418] </IRQ>
[ 2116.789403] <TASK>
[ 2116.792382] asm_common_interrupt+0x26/0x40
[ 2116.797708] RIP: 0010:skb_segment+0x86b/0xf00
[ 2116.803222] Code: 24 44 8b 74 24 60 49 89 cc 48 8b 4c 24 28 e9 8b 00 00 00 48 8b 11 48 8b 79 08 49 89 14 24 48 89 d0 49 89 7c 24 08 48 8b 50 08 <f6> c2 01 0f 85 c9 03 00 00 0f 1f 44 00 00 f0 ff 40 34 41 8b 44 24
[ 2116.825561] RSP: 0018:ffffabe5c65dbb90 EFLAGS: 00000213
[ 2116.832097] RAX: ffffd6a144ae8c00 RBX: ffff9164af715c00 RCX: ffff9164db525400
[ 2116.840773] RDX: 0000000000000000 RSI: ffff91648734f0e8 RDI: 0000000000008000
[ 2116.849444] RBP: ffffabe5c65dbc60 R08: 0000000000005dac R09: 0000000000006574
[ 2116.858127] R10: 25dd4e99d6e1ffe7 R11: 0000000000000003 R12: ffff916487cb7980
[ 2116.866801] R13: 0000000000005df8 R14: 0000000000000001 R15: 0000000000000000
[ 2116.875493] ? __pfx_csum_partial_ext+0x10/0x10
[ 2116.881263] ? __pfx_csum_block_add_ext+0x10/0x10
[ 2116.887289] tcp_gso_segment+0xec/0x4e0
[ 2116.892247] ? __pfx_tcp_wfree+0x10/0x10
[ 2116.897283] inet_gso_segment+0x159/0x3d0
[ 2116.902393] ? hfi1_ipoib_send+0x246/0x560 [hfi1]
[ 2116.908364] skb_mac_gso_segment+0xa4/0x110
[ 2116.914180] __skb_gso_segment+0xb7/0x170
[ 2116.919271] ? netif_skb_features+0x151/0x2e0
[ 2116.924746] validate_xmit_skb+0x16c/0x340
[ 2116.929930] validate_xmit_skb_list+0x4e/0x70
[ 2116.935392] sch_direct_xmit+0x18a/0x380
[ 2116.940372] __qdisc_run+0x149/0x5a0
[ 2116.944952] net_tx_action+0x1df/0x2a0
[ 2116.949714] __do_softirq+0xca/0x2ae
[ 2116.954278] ? __pfx_smpboot_thread_fn+0x10/0x10
[ 2116.960005] run_ksoftirqd+0x2c/0x40
[ 2116.964575] smpboot_thread_fn+0xdc/0x1d0
[ 2116.969622] kthread+0xe8/0x120
[ 2116.973702] ? __pfx_kthread+0x10/0x10
[ 2116.978465] ret_from_fork+0x2c/0x50
[ 2116.983033] </TASK>
[ 2116.986029] Modules linked in: netconsole ipmi_si nfsv3 nfs_acl nfs lockd grace netfs fscache msr sb_edac edac_core intel_rapl_msr intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common hfi1 x86_pkg_temp_thermal intel_powerclamp coretemp crc32_pclmul polyval_clmulni polyval_generic gf128mul ghash_clmulni_intel sha512_ssse3 sha512_generic aesni_intel mgag200 libaes drm_shmem_helper crypto_simd cryptd igb drm_kms_helper rdmavt rapl iTCO_wdt mei_me intel_cstate intel_pmc_bxt ptp syscopyarea ib_uverbs pps_core watchdog sysfillrect mxm_wmi sunrpc intel_uncore sysimgblt mei i2c_i801 i2c_algo_bit ioatdma i2c_smbus lpc_ich evdev dca input_leds joydev led_class mousedev mac_hid wmi tiny_power_button acpi_power_meter acpi_pad button xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ipt_rpfilter xt_pkttype xt_LOG nf_log_syslog xt_tcpudp nft_compat sch_fq_codel nf_tables libcrc32c nfnetlink atkbd libps2 serio vivaldi_fmap loop cpufreq_powersave tun tap macvlan bridge stp llc kvm irqbypass ib_ipoib ib_cm
[ 2116.986177] ib_umad ib_core ipmi_watchdog ipmi_devintf ipmi_msghandler fuse drm efi_pstore backlight configfs dmi_sysfs ip_tables x_tables autofs4 ext4 crc32c_generic crc16 mbcache jbd2 hid_generic usbhid hid sd_mod ahci xhci_pci xhci_pci_renesas libahci firmware_class ehci_pci xhci_hcd libata ehci_hcd nvme nvme_core usbcore scsi_mod t10_pi crc32c_intel crc64_rocksoft crc64 crc_t10dif crct10dif_generic crct10dif_pclmul crct10dif_common usb_common scsi_common rtc_cmos dm_mod dax [last unloaded: ipmi_si]
[ 2117.145385] CR2: 0000000000000010
[ 2117.149915] ---[ end trace 0000000000000000 ]---
[ 2117.215956] RIP: 0010:napi_schedule_prep+0x9/0x50
[ 2117.222128] Code: 68 54 0c 94 e8 58 3e cf ff 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 <48> 8b 4f 10 f6 c1 04 75 29 48 89 ca 48 89 c8 83 e2 01 48 01 d2 48
[ 2117.244851] RSP: 0018:ffffabe5c65f0eb8 EFLAGS: 00010046
[ 2117.251528] RAX: ffffffffc14f1ab0 RBX: 0000000000000000 RCX: 0000000000000001
[ 2117.260351] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[ 2117.269151] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[ 2117.277962] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000617
[ 2117.286754] R13: ffff9164955396b0 R14: 0000000000000016 R15: ffff916498d09a00
[ 2117.295538] FS: 0000000000000000(0000) GS:ffff9173bfb00000(0000) knlGS:0000000000000000
[ 2117.305396] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2117.312654] CR2: 0000000000000010 CR3: 0000000a8ee20002 CR4: 00000000003706e0
[ 2117.321457] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 2117.330257] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 2117.339079] Kernel panic - not syncing: Fatal exception in interrupt
[ 2117.347081] Kernel Offset: 0x12200000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 2117.420699] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---
The OmniPath info from lspci -v:
05:00.0 Fabric controller: Intel Corporation Omni-Path HFI Silicon 100
Series [discrete] (rev 11)
Subsystem: Intel Corporation Omni-Path HFI Silicon 100 Series [discrete]
Flags: bus master, fast devsel, latency 0, IRQ 205, NUMA node 0
Memory at 90000000 (64-bit, non-prefetchable) [size=64M]
Expansion ROM at <ignored> [disabled]
Capabilities: [40] Power Management version 3
Capabilities: [70] Express Endpoint, MSI 00
Capabilities: [b0] MSI-X: Enable+ Count=256 Masked-
Capabilities: [100] Advanced Error Reporting
Capabilities: [148] Secondary PCI Express
Capabilities: [178] Transaction Processing Hints
Kernel driver in use: hfi1
Kernel modules: hfi1
And here is dmesg | grep hfi :
[ 11.299758] hfi1 0000:05:00.0: hfi1_0: Eager buffer size 8388608
[ 11.299841] hfi1 0000:05:00.0: hfi1_0: UC base1: 00000000526a78a9 for 1200000
[ 11.299845] hfi1 0000:05:00.0: hfi1_0: RcvArray count: 65536
[ 11.299862] hfi1 0000:05:00.0: hfi1_0: UC base2: 00000000088b79df for d80000
[ 11.299868] hfi1 0000:05:00.0: hfi1_0: WC piobase: 000000001c60bc9b for 2000000
[ 11.299874] hfi1 0000:05:00.0: hfi1_0: WC RcvArray: 00000000064844ea for 80000
[ 11.299884] hfi1 0000:05:00.0: hfi1_0: Implementation: RTL silicon, revision 0x0
[ 11.299887] hfi1 0000:05:00.0: hfi1_0: GUID 117501017982d2
[ 11.300105] hfi1 0000:05:00.0: hfi1_0: Resetting CSRs with FLR
[ 11.402144] hfi1 0000:05:00.0: hfi1_0: PCIe,5000MHz,x16
[ 11.414874] hfi1 0000:05:00.0: hfi1_0: do_pcie_gen3_transition: downloading firmware
[ 11.414877] hfi1 0000:05:00.0: hfi1_0: Downloading SBus firmware
[ 11.438112] hfi1 0000:05:00.0: hfi1_0: Setting PCIe SerDes broadcast
[ 11.438120] hfi1 0000:05:00.0: hfi1_0: Downloading PCIe firmware
[ 11.475144] hfi1 0000:05:00.0: hfi1_0: do_pcie_gen3_transition: setting PCIe registers
[ 11.475161] hfi1 0000:05:00.0: hfi1_0: do_pcie_gen3_transition: using EQ Pset 2
[ 11.475162] hfi1 0000:05:00.0: hfi1_0: do_pcie_gen3_transition: doing pcie post steps
[ 11.475202] hfi1 0000:05:00.0: hfi1_0: do_pcie_gen3_transition: clearing ASPM
[ 11.475205] hfi1 0000:05:00.0: hfi1_0: do_pcie_gen3_transition: setting parent target link speed
[ 11.475207] hfi1 0000:05:00.0: hfi1_0: do_pcie_gen3_transition: ..old link control2: 0x3
[ 11.475209] hfi1 0000:05:00.0: hfi1_0: do_pcie_gen3_transition: ..target speed is OK
[ 11.475210] hfi1 0000:05:00.0: hfi1_0: do_pcie_gen3_transition: setting target link speed
[ 11.475212] hfi1 0000:05:00.0: hfi1_0: do_pcie_gen3_transition: ..old link control2: 0x2
[ 11.475213] hfi1 0000:05:00.0: hfi1_0: do_pcie_gen3_transition: ..new link control2: 0x3
[ 11.475215] hfi1 0000:05:00.0: hfi1_0: do_pcie_gen3_transition: arming gasket logic
[ 11.475217] hfi1 0000:05:00.0: hfi1_0: do_pcie_gen3_transition: calling trigger_sbr
[ 11.619094] hfi1 0000:05:00.0: hfi1_0: do_pcie_gen3_transition: calling restore_pci_variables
[ 11.619112] hfi1 0000:05:00.0: hfi1_0: do_pcie_gen3_transition: gasket block status: 0x1
[ 11.619117] hfi1 0000:05:00.0: hfi1_0: do_pcie_gen3_transition: per-lane errors: 0x0
[ 11.619123] hfi1 0000:05:00.0: hfi1_0: do_pcie_gen3_transition: new speed and width: PCIe,8000MHz,x16
[ 11.619129] hfi1 0000:05:00.0: hfi1_0: do_pcie_gen3_transition: done
[ 11.653960] hfi1 0000:05:00.0: hfi1_0: Setting partition keys
[ 11.653970] hfi1 0000:05:00.0: hfi1_0: parse_platform_config:File length out of bounds, using alternative format
[ 11.653974] hfi1 0000:05:00.0: hfi1_0: parse_platform_config:File claims to be smaller than read size, continuing
[ 11.653984] hfi1 0000:05:00.0: hfi1_0: Board description not found
[ 11.653990] hfi1 0000:05:00.0: hfi1_0: allocating rx size 2560
[ 11.653999] hfi1 0000:05:00.0: hfi1_0: rcv contexts: chip 160, used 27 (kernel 3, netdev 8, user 16)
[ 11.654005] hfi1 0000:05:00.0: hfi1_0: RcvArray groups 303, ctxts extra 11
[ 11.654011] hfi1 0000:05:00.0: hfi1_0: unused send context blocks: 9
[ 11.654014] hfi1 0000:05:00.0: hfi1_0: send contexts: chip 160, used 44 (kernel 16, ack 3, user 24, vl15 1)
[ 11.654337] hfi1 0000:05:00.0: hfi1_0: Using send context 16(143) for VL15
[ 11.654425] hfi1 0000:05:00.0: hfi1_0: SDMA mod_num_sdma: 0
[ 11.654429] hfi1 0000:05:00.0: hfi1_0: SDMA chip_sdma_engines: 16
[ 11.654432] hfi1 0000:05:00.0: hfi1_0: SDMA chip_sdma_mem_size: 401408
[ 11.654436] hfi1 0000:05:00.0: hfi1_0: SDMA engines 16 descq_cnt 2048
[ 11.654709] hfi1 0000:05:00.0: hfi1_0: SDMA num_sdma: 16
[ 11.655634] hfi1 0000:05:00.0: hfi1_0: 28 MSI-X interrupts allocated
[ 11.655679] hfi1 0000:05:00.0: hfi1_0: IRQ: 216, type GENERAL -> cpu: 0
[ 11.655718] hfi1 0000:05:00.0: hfi1_0: IRQ: 217, type SDMA engine 0 -> cpu: 3
[ 11.655748] hfi1 0000:05:00.0: hfi1_0: IRQ: 218, type SDMA engine 1 -> cpu: 4
[ 11.655775] hfi1 0000:05:00.0: hfi1_0: IRQ: 219, type SDMA engine 2 -> cpu: 5
[ 11.655802] hfi1 0000:05:00.0: hfi1_0: IRQ: 220, type SDMA engine 3 -> cpu: 6
[ 11.655837] hfi1 0000:05:00.0: hfi1_0: IRQ: 221, type SDMA engine 4 -> cpu: 7
[ 11.655865] hfi1 0000:05:00.0: hfi1_0: IRQ: 222, type SDMA engine 5 -> cpu: 3
[ 11.655893] hfi1 0000:05:00.0: hfi1_0: IRQ: 223, type SDMA engine 6 -> cpu: 4
[ 11.655918] hfi1 0000:05:00.0: hfi1_0: IRQ: 224, type SDMA engine 7 -> cpu: 5
[ 11.655941] hfi1 0000:05:00.0: hfi1_0: IRQ: 225, type SDMA engine 8 -> cpu: 6
[ 11.655975] hfi1 0000:05:00.0: hfi1_0: IRQ: 226, type SDMA engine 9 -> cpu: 7
[ 11.656001] hfi1 0000:05:00.0: hfi1_0: IRQ: 227, type SDMA engine 10 -> cpu: 3
[ 11.656027] hfi1 0000:05:00.0: hfi1_0: IRQ: 228, type SDMA engine 11 -> cpu: 4
[ 11.656059] hfi1 0000:05:00.0: hfi1_0: IRQ: 229, type SDMA engine 12 -> cpu: 5
[ 11.656095] hfi1 0000:05:00.0: hfi1_0: IRQ: 230, type SDMA engine 13 -> cpu: 6
[ 11.656132] hfi1 0000:05:00.0: hfi1_0: IRQ: 231, type SDMA engine 14 -> cpu: 7
[ 11.656158] hfi1 0000:05:00.0: hfi1_0: IRQ: 232, type SDMA engine 15 -> cpu: 3
[ 11.656281] hfi1 0000:05:00.0: hfi1_0: IRQ: 233, type RCVCTXT ctxt 0 -> cpu: 0
[ 11.656379] hfi1 0000:05:00.0: hfi1_0: IRQ: 234, type RCVCTXT ctxt 1 -> cpu: 1
[ 11.656483] hfi1 0000:05:00.0: hfi1_0: IRQ: 235, type RCVCTXT ctxt 2 -> cpu: 2
[ 11.656498] hfi1 0000:05:00.0: hfi1_0: Downloading fabric firmware
[ 11.815124] hfi1 0000:05:00.0: hfi1_0: 8051 firmware version 1.27.0
[ 11.827334] hfi1 0000:05:00.0: hfi1_0: SBus Master firmware version 0x10130001
[ 12.019816] hfi1 0000:05:00.0: hfi1_0: PCIe SerDes firmware version 0x4755
[ 12.067297] hfi1 0000:05:00.0: hfi1_0: Fabric SerDes firmware version 0x1055
[ 12.067310] hfi1 0000:05:00.0: hfi1_0: Initializing thermal sensor
[ 14.707102] hfi1 0000:05:00.0: hfi1_0: wait_for_qsfp_init: No IntN detected, reset complete
[ 14.840952] hfi1 0000:05:00.0: hfi1_0: set_link_state: current OFFLINE, new POLL
[ 14.840964] hfi1 0000:05:00.0: hfi1_0: Downloading fabric firmware
[ 15.045304] hfi1 0000:05:00.0: hfi1_0: physical state changed to PHYS_POLL (0x2), phy 0x20
[ 15.045427] hfi1 0000:05:00.0: hfi1_0: Reserving QPNs from 0x800000 to 0x81ffff for non-verbs use
[ 15.056181] hfi1 0000:05:00.0: hfi1_0: created netdev context 3
[ 15.058702] hfi1 0000:05:00.0: hfi1_0: Setting rcv queue 0 napi to context 3
[ 15.058766] hfi1 0000:05:00.0: hfi1_0: IRQ: 236, type NETDEVCTXT ctxt 3 -> cpu: 4
[ 15.058791] hfi1 0000:05:00.0: hfi1_0: created netdev context 4
[ 15.061776] hfi1 0000:05:00.0: hfi1_0: Setting rcv queue 1 napi to context 4
[ 15.061808] hfi1 0000:05:00.0: hfi1_0: IRQ: 237, type NETDEVCTXT ctxt 4 -> cpu: 5
[ 15.061826] hfi1 0000:05:00.0: hfi1_0: created netdev context 5
[ 15.064140] hfi1 0000:05:00.0: hfi1_0: Setting rcv queue 2 napi to context 5
[ 15.064163] hfi1 0000:05:00.0: hfi1_0: IRQ: 238, type NETDEVCTXT ctxt 5 -> cpu: 6
[ 15.064179] hfi1 0000:05:00.0: hfi1_0: created netdev context 6
[ 15.066314] hfi1 0000:05:00.0: hfi1_0: Setting rcv queue 3 napi to context 6
[ 15.066333] hfi1 0000:05:00.0: hfi1_0: IRQ: 239, type NETDEVCTXT ctxt 6 -> cpu: 7
[ 15.066352] hfi1 0000:05:00.0: hfi1_0: created netdev context 7
[ 15.068990] hfi1 0000:05:00.0: hfi1_0: Setting rcv queue 4 napi to context 7
[ 15.069018] hfi1 0000:05:00.0: hfi1_0: IRQ: 240, type NETDEVCTXT ctxt 7 -> cpu: 3
[ 15.069035] hfi1 0000:05:00.0: hfi1_0: created netdev context 8
[ 15.071180] hfi1 0000:05:00.0: hfi1_0: Setting rcv queue 5 napi to context 8
[ 15.071198] hfi1 0000:05:00.0: hfi1_0: IRQ: 241, type NETDEVCTXT ctxt 8 -> cpu: 4
[ 15.071214] hfi1 0000:05:00.0: hfi1_0: created netdev context 9
[ 15.073353] hfi1 0000:05:00.0: hfi1_0: Setting rcv queue 6 napi to context 9
[ 15.073371] hfi1 0000:05:00.0: hfi1_0: IRQ: 242, type NETDEVCTXT ctxt 9 -> cpu: 5
[ 15.073387] hfi1 0000:05:00.0: hfi1_0: created netdev context 10
[ 15.075538] hfi1 0000:05:00.0: hfi1_0: Setting rcv queue 7 napi to context 10
[ 15.075566] hfi1 0000:05:00.0: hfi1_0: IRQ: 243, type NETDEVCTXT ctxt 10 -> cpu: 6
[ 15.078195] hfi1 0000:05:00.0: hfi1_0: Registration with rdmavt done.
[ 15.093030] hfi1 0000:05:00.0 ibp5s0: renamed from ib0
[ 19.722924] hfi1 0000:05:00.0: hfi1_0: set_link_state: current POLL, new VERIFY_CAP
[ 19.722934] hfi1 0000:05:00.0: hfi1_0: physical state changed to PHYS_TRAINING (0x4), phy 0x46
[ 19.722951] hfi1 0000:05:00.0: hfi1_0: Fabric active lanes (width): tx 0xf (4), rx 0xf (4)
[ 19.722956] hfi1 0000:05:00.0: hfi1_0: Peer PHY: power management 0x0, continuous updates 0x1
[ 19.722960] hfi1 0000:05:00.0: hfi1_0: Peer Fabric: vAU 3, Z 1, vCU 0, vl15 credits 0x44, CRC sizes 0x3
[ 19.722965] hfi1 0000:05:00.0: hfi1_0: Peer Link Width: tx rate 0x2, widths 0x8
[ 19.722969] hfi1 0000:05:00.0: hfi1_0: Peer Device ID: 0xabc0, Revision 0x10
[ 19.722974] hfi1 0000:05:00.0: hfi1_0: Final LCB CRC mode: 1
[ 19.722979] hfi1 0000:05:00.0: hfi1_0: set_link_state: current VERIFY_CAP, new GOING_UP
[ 21.960278] hfi1 0000:05:00.0: hfi1_0: 8051: Link up
[ 21.960340] hfi1 0000:05:00.0: hfi1_0: set_link_state: current GOING_UP, new INIT (LINKUP)
[ 21.960350] hfi1 0000:05:00.0: hfi1_0: physical state changed to PHYS_LINKUP (0x5), phy 0x50
[ 21.960359] hfi1 0000:05:00.0: hfi1_0: Neighbor Guid 117501020d8fd4, Type 1, Port Num 33
[ 21.960866] hfi1 0000:05:00.0: hfi1_0: Setting partition keys
[ 21.960879] hfi1 0000:05:00.0: hfi1_0: Fabric active lanes (width): tx 0xf (4), rx 0xf (4)
[ 21.960892] hfi1 0000:05:00.0: hfi1_0: logical state changed to PORT_INIT (0x2)
[ 22.971450] hfi1 0000:05:00.0: hfi1_0: port 1: got a lid: 0x4
[ 22.971467] hfi1 0000:05:00.0: hfi1_0: MTU change on vl 1 from 10240 to 0
[ 22.971472] hfi1 0000:05:00.0: hfi1_0: MTU change on vl 2 from 10240 to 0
[ 22.971476] hfi1 0000:05:00.0: hfi1_0: MTU change on vl 3 from 10240 to 0
[ 22.971479] hfi1 0000:05:00.0: hfi1_0: MTU change on vl 4 from 10240 to 0
[ 22.971482] hfi1 0000:05:00.0: hfi1_0: MTU change on vl 5 from 10240 to 0
[ 22.971486] hfi1 0000:05:00.0: hfi1_0: MTU change on vl 6 from 10240 to 0
[ 22.971489] hfi1 0000:05:00.0: hfi1_0: MTU change on vl 7 from 10240 to 0
[ 22.993233] hfi1 0000:05:00.0: hfi1_0: set_link_state: current INIT, new ARMED
[ 22.993246] hfi1 0000:05:00.0: hfi1_0: logical state changed to PORT_ARMED (0x3)
[ 22.993251] hfi1 0000:05:00.0: hfi1_0: send_idle_message: sending idle message 0x103
[ 22.993273] hfi1 0000:05:00.0: hfi1_0: read_idle_message: read idle message 0x103
[ 22.993281] hfi1 0000:05:00.0: hfi1_0: handle_sma_message: SMA message 0x1
[ 22.993292] hfi1 0000:05:00.0: hfi1_0: read_idle_message: read idle message 0x103
[ 22.993295] hfi1 0000:05:00.0: hfi1_0: handle_sma_message: SMA message 0x1
[ 22.993604] hfi1 0000:05:00.0: hfi1_0: set_link_state: current ARMED, new ACTIVE
[ 22.993625] hfi1 0000:05:00.0: hfi1_0: logical state changed to PORT_ACTIVE (0x4)
[ 22.993633] hfi1 0000:05:00.0: hfi1_0: send_idle_message: sending idle message 0x203
[ 23.014640] hfi1 0000:05:00.0: hfi1_0: read_idle_message: read idle message 0x203
[ 23.014651] hfi1 0000:05:00.0: hfi1_0: handle_sma_message: SMA message 0x2
Best,
Rodrigo.