libceph in kernel stack trace prior to ceph client's crash

Hi list,

Some of our Ceph clients reboot intermittently. We always see this stack dump in dmesg before the hosts reboot:

 Jan 10 06:52:33 xxxxxhostnamexxxxx kernel: [38386170.332063] ------------[ cut here ]------------
 Jan 10 06:52:33 xxxxxhostnamexxxxx kernel: [38386170.332067] WARNING: CPU: 11 PID: 229190 at net/ceph/osd_client.c:497 request_reinit+0x140/0x180 [libceph]
 May 10 06:52:33 s-jn4vh63.sys.az1.cust.ash.wd kernel: [38386170.332067] Modules linked in: joydev rbd libceph dns_resolver dell_rbu udp_diag unix_diag af_packet_diag netlink_diag nfsv3 nfs_acl nfs lockd grace fscache tcp_diag inet_diag uas usb_storage binfmt_misc nf_conntrack_netlink ip6table_mangle ip6table_raw xt_NFLOG xt_u32 nf_conntrack_ipv6 nf_defrag_ipv6 xt_LOG nf_conntrack_tftp nf_conntrack_ftp iptable_raw iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat iptable_mangle xt_set xt_multiport xt_conntrack nf_conntrack ip_set_hash_netport ip_set_hash_ipport ip_set_hash_net ip_set_hash_ip nfnetlink_log ip_set nfnetlink ip6table_filter ip6_tables iptable_filter mpt3sas mpt2sas raid_class scsi_transport_sas mptctl mptbase drbg ansi_cprng dm_crypt loop bonding sunrpc vfat fat dm_mod skx_edac intel_powerclamp coretemp intel_rapl iosf_mbi kvm dell_smbios iTCO_wdt iTCO_vendor_support dell_wmi_descriptor irqbypass crc32_pclmul ghash_clmulni_intel dcdbas aesni_intel lrw gf128mul glue_helper ablk_helper cryptd pcspkr sg mgag200 i2c_algo_bit ttm drm_kms_helper syscopyarea sysfillrect ixgbe sysimgblt fb_sys_fops drm ptp pps_core mdio dca drm_panel_orientation_quirks i2c_i801 mei_me lpc_ich mei wmi ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter acpi_pad ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic ahci libahci libata crct10dif_pclmul crct10dif_common crc32c_intel megaraid_sas nfit libnvdimm [last unloaded: dell_rbu]
 Jan 10 06:52:33 xxxxxhostnamexxxxx kernel: [38386170.332094] CPU: 11 PID: 229190 Comm: kworker/11:1 Tainted: G W ------------ 3.10.0-1127.10.1.el7.x86_64 #1
 Jan 10 06:52:33 xxxxxhostnamexxxxx kernel: [38386170.332095] Hardware name: Dell Inc. PowerEdge R740xd/XXXX0, BIOS 2.8.2 08/27/2020
 Jan 10 06:52:33 xxxxxhostnamexxxxx kernel: [38386170.332098] Workqueue: events handle_timeout [libceph]
 Jan 10 06:52:33 xxxxxhostnamexxxxx kernel: [38386170.332099] Call Trace:
 Jan 10 06:52:33 xxxxxhostnamexxxxx kernel: [38386170.332100] [<ffffffffb957ffa5>] dump_stack+0x19/0x1b
 Jan 10 06:52:33 xxxxxhostnamexxxxx kernel: [38386170.332102] [<ffffffffb8e9bd18>] __warn+0xd8/0x100
 Jan 10 06:52:33 xxxxxhostnamexxxxx kernel: [38386170.332103] [<ffffffffb8e9be5d>] warn_slowpath_null+0x1d/0x20
 Jan 10 06:52:33 xxxxxhostnamexxxxx kernel: [38386170.332106] [<ffffffffc0bec600>] request_reinit+0x140/0x180 [libceph]
 Jan 10 06:52:33 xxxxxhostnamexxxxx kernel: [38386170.332110] [<ffffffffc0bf357a>] handle_timeout+0x3aa/0x770 [libceph]
 Jan 10 06:52:33 xxxxxhostnamexxxxx kernel: [38386170.332111] [<ffffffffb8ebe6bf>] process_one_work+0x17f/0x440
 Jan 10 06:52:33 xxxxxhostnamexxxxx kernel: [38386170.332113] [<ffffffffb8ebf7d6>] worker_thread+0x126/0x3c0
 Jan 10 06:52:33 xxxxxhostnamexxxxx kernel: [38386170.332114] [<ffffffffb8ebf6b0>] ? manage_workers.isra.26+0x2a0/0x2a0
 Jan 10 06:52:33 xxxxxhostnamexxxxx kernel: [38386170.332115] [<ffffffffb8ec6691>] kthread+0xd1/0xe0
 Jan 10 06:52:33 xxxxxhostnamexxxxx kernel: [38386170.332117] [<ffffffffb8ec65c0>] ? insert_kthread_work+0x40/0x40
 Jan 10 06:52:33 xxxxxhostnamexxxxx kernel: [38386170.332118] [<ffffffffb9592d1d>] ret_from_fork_nospec_begin+0x7/0x21
 Jan 10 06:52:33 xxxxxhostnamexxxxx kernel: [38386170.332119] [<ffffffffb8ec65c0>] ? insert_kthread_work+0x40/0x40
 Jan 10 06:52:33 xxxxxhostnamexxxxx kernel: [38386170.332120] ---[ end trace 5aeee0f10a265d18 ]---


[root@xxxxxhostnamexxxxx ~]# ceph --version
ceph version 14.2.22 (ca74598065096e6fcbd8433c8779a2be0c889351) nautilus (stable)
[root@xxxxxhostnamexxxxx ~]# uname -a
Linux xxxxxhostnamexxxxx 3.10.0-1160.25.1.el7.x86_64 #1 SMP Wed Apr 28 21:49:45 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

(Apologies for the formatting)

Any suggestions on how we should go about troubleshooting this?

Thank you in advance.

Jkr
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



