Hi all,

I am pretty sure that this is a kernel issue related to CentOS Stream and probably the Dell PowerEdge C6420, but I want to let you know about it in case someone upgrades CentOS Stream to the latest kernel, 4.18.0-358.el8.x86_64, and hits the same problem.

Yesterday I was investigating a strange issue: after upgrading the OS (CentOS Stream), the nodes were rebooting every 2 hours. Looking at the cephadm logs, we found errors of this type:

cephadm [ERR] Failed to execute command: /usr/bin/python3 /var/lib/ceph/c6e89d30-de52-11eb-a76f-bc97e1e57d70/cephadm.f46dc95b01feeedb28941a48e2f1d0abb51139ca828de11150ea7122a8e3549c gather-facts

I was able to reproduce the issue just by running the Python program with gather-facts on one of the upgraded nodes:

[root@c27-35 ~]# /usr/bin/python3 debug.py gather-facts
> /root/debug.py(7004)command_gather_facts()
-> host = HostFacts(ctx)
(Pdb) s
--Call--
> /root/debug.py(6498)__init__()
-> def __init__(self, ctx: CephadmContext):
....
(Pdb) n
> /root/debug.py(6509)__init__()
-> self.arch: str = platform.processor()
(Pdb) n
> /root/debug.py(6510)__init__()
-> self.kernel: str = platform.release()
(Pdb) n
--Return--
> /root/debug.py(6510)__init__()->None
-> self.kernel: str = platform.release()
(Pdb)
> /root/debug.py(7005)command_gather_facts()
-> print(host.dump())
(Pdb)
client_loop: send disconnect: Broken pipe

When it reaches host.dump(), the server hangs with the following kernel panic:

[ 572.332036] BUG: unable to handle kernel paging request at 0000559e8860e740
[ 572.415388] PGD 1a62022067 P4D 1a62022067 PUD 1a62023067 PMD 1868b7a067 PTE 80000018e6ae8867
[ 572.516372] Oops: 0003 [#1] SMP NOPTI
[ 572.560156] CPU: 41 PID: 8408 Comm: sysctl Kdump: loaded Tainted: G I --------- - - 4.18.0-358.el8.x86_64 #1
[ 572.693381] Hardware name: Dell Inc. PowerEdge C6420/0YTVTT, BIOS 2.12.2 07/14/2021
[ 572.785007] RIP: 0010:memcpy_erms+0x6/0x10
[ 572.833991] Code: 90 90 90 90 eb 1e 0f 1f 00 48 89 f8 48 89 d1 48 c1 e9 03 83 e2 07 f3 48 a5 89 d1 f3 a4 c3 66 0f 1f 44 00 00 48 89 f8 48 89 d1 <f3> a4 c3 0f 1f 80 00 00 00 00 48 89 f8 48 83 fa 20 72 7e 40 38 fe
[ 573.058737] RSP: 0018:ffffb2fd0d1ebe28 EFLAGS: 00010297
[ 573.121242] RAX: 0000559e8860e740 RBX: 0000000000000002 RCX: 0000000000000002
[ 573.206628] RDX: 0000000000000002 RSI: ffffb2fd0d1ebe37 RDI: 0000559e8860e740
[ 573.292013] RBP: ffffb2fd0d1ebf08 R08: 0000000000000000 R09: 0000000000000000
[ 573.377397] R10: ffffb2fd0d1ebe80 R11: ffffb2fd0d1ebe38 R12: ffffb2fd0d1ebe80
[ 573.462781] R13: 0000559e8860e740 R14: 0000000000000002 R15: ffffffffc14d6e00
[ 573.548168] FS: 00007fba08251940(0000) GS:ffff8b46df700000(0000) knlGS:0000000000000000
[ 573.644993] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 573.713737] CR2: 0000559e8860e740 CR3: 000000184f3e4002 CR4: 00000000007706e0
[ 573.799122] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 573.884507] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 573.969892] PKRU: 55555554
[ 574.002235] Call Trace:
[ 574.031462] svcrdma_counter_handler+0xc1/0x110 [rpcrdma]
[ 574.096045] proc_sys_call_handler+0x1a5/0x1c0
[ 574.149191] vfs_read+0x91/0x140
[ 574.187773] ksys_read+0x4f/0xb0
[ 574.226358] do_syscall_64+0x5b/0x1a0
[ 574.270142] entry_SYSCALL_64_after_hwframe+0x65/0xca
[ 574.330566] RIP: 0033:0x7fba0761b555
[ 574.373312] Code: fe ff ff 50 48 8d 3d 22 c9 06 00 e8 25 ed 01 00 0f 1f 44 00 00 f3 0f 1e fa 48 8d 05 45 40 2a 00 8b 00 85 c0 75 0f 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 53 c3 66 90 41 54 49 89 d4 55 48 89 f5 53 89
[ 574.598058] RSP: 002b:00007ffdf1b482d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[ 574.688641] RAX: ffffffffffffffda RBX: 0000559e8860e190 RCX: 00007fba0761b555
[ 574.774027] RDX: 0000000000002000 RSI: 0000559e8860e740 RDI: 0000000000000006
[ 574.859412] RBP: 0000000000000d68 R08: 0000559e88610740 R09: 0000000000000003
[ 574.944798] R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000002000
[ 575.030181] R13: 0000559e88610750 R14: 0000000000000000 R15: 0000000000000000
[ 575.115568] Modules linked in: joydev sch_fq binfmt_misc overlay 8021q garp mrp stp llc rpcrdma intel_rapl_msr intel_rapl_common sunrpc rdma_ucm ib_srpt ib_isert isst_if_common iscsi_target_mod target_core_mod ib_iser libiscsi scsi_transport_iscsi bonding skx_edac rdma_cm ib_umad nfit ib_ipoib iw_cm libnvdimm x86_pkg_temp_thermal intel_powerclamp ib_cm coretemp kvm_intel kvm dell_smbios irqbypass iTCO_wdt mlx5_ib crct10dif_pclmul crc32_pclmul iTCO_vendor_support dell_wmi_descriptor wmi_bmof ib_uverbs dcdbas ghash_clmulni_intel rapl mei_me intel_cstate i2c_i801 lpc_ich pcspkr ib_core mei intel_uncore wmi ipmi_ssif acpi_power_meter vfat fat ip_vs ext4 mbcache jbd2 sd_mod t10_pi sg mgag200 drm_kms_helper mlx5_core syscopyarea sysfillrect sysimgblt fb_sys_fops ahci mlxfw libahci pci_hyperv_intf drm tls megaraid_sas libata psample i2c_algo_bit openvswitch nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 libcrc32c crc32c_intel nf_defrag_ipv4 ipmi_si ipmi_devintf ipmi_msghandler fuse
[ 576.150586] CR2: 0000559e8860e740

This is the affected version:

[ 0.000000] Linux version 4.18.0-358.el8.x86_64 (mockbuild@xxxxxxxxxxxxxxxxxxxxxxxx) (gcc version 8.5.0 20210514 (Red Hat 8.5.0-7) (GCC)) #1 SMP Mon Jan 10 13:11:20 UTC 2022
[ 0.000000] Command line: elfcorehdr=0x38000000 BOOT_IMAGE=(hd0,gpt2)/boot/vmlinuz-4.18.0-358.el8.x86_64 ro resume=UUID=0de0c20d-9d0a-447e-981c-898b54be9f48 console=ttyS0 irqpoll nr_cpus=1 reset_devices cgroup_disable=memory mce=off numa=off udev.children-max=2 panic=10 rootflags=nofail acpi_no_memhotplug transparent_hugepage=never nokaslr novmcoredd hest_disable disable_cpu_apicid=0 iTCO_wdt.pretimeout=0 trace_buf_size=1

Everything went back to normal after booting the previous kernel version (no need to downgrade any other package).
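
In case it helps anyone check a node before it starts rebooting, here is a minimal sketch of a pre-check. It was not part of the debugging session above. It compares the running kernel release with the affected one and looks for the rpcrdma module, which is where svcrdma_counter_handler sits in the call trace. Treating "affected release plus rpcrdma loaded" as the risky combination is an assumption drawn only from that trace, not a confirmed root cause. The names AFFECTED_RELEASE, SUSPECT_MODULE, module_loaded and at_risk are made up for illustration. The script only reads /proc/modules and deliberately avoids reading any sysctl value, since the trace suggests that a sysctl read is what triggers the oops.

```python
import platform
from pathlib import Path

# Kernel release reported as affected in this thread.
AFFECTED_RELEASE = "4.18.0-358.el8.x86_64"
# Module that owns svcrdma_counter_handler in the call trace (assumed relevant).
SUSPECT_MODULE = "rpcrdma"


def module_loaded(name: str, proc_modules_text: str) -> bool:
    """Return True if `name` is the first field of any /proc/modules-style line."""
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields and fields[0] == name:
            return True
    return False


def at_risk(release: str, proc_modules_text: str) -> bool:
    """Assumption: a node is at risk if it runs the affected release
    and has the suspect module loaded."""
    return release == AFFECTED_RELEASE and module_loaded(
        SUSPECT_MODULE, proc_modules_text
    )


if __name__ == "__main__":
    modules_file = Path("/proc/modules")
    # Reading /proc/modules does not go through the sysctl handlers.
    modules_text = modules_file.read_text() if modules_file.exists() else ""
    release = platform.release()
    if at_risk(release, modules_text):
        print(f"{release}: affected kernel with {SUSPECT_MODULE} loaded")
    else:
        print(f"{release}: does not match the reported combination")
```

A negative result only means the node does not match the combination seen here; it does not prove the node is safe.
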
I also verified that the 348 version of the kernel (4.18.0-348.2.1.el8_5.x86_64) works fine, so we are staying on that one for the moment.

I hope this is useful to others who might experience the same problem.

Kind regards,
Javier