Hi everyone,

I would like to ask for help with client kernel crashes that occur when accessing CephFS. We have been struggling with this for over a month now, with more than 100 crashes on 7 hosts during that time.

Our cluster runs Ceph 18.2.1. Our clients run CentOS Stream. On CentOS Stream 9, the problem started with kernel 5.14.0-425.el9; 5.14.0-419.el9 is the last version without problems. It also occurred on CentOS Stream 8, starting with 4.18.0-546.el8 (4.18.0-544.el8 being the last good one).

When the problem hits, the client kernel crashes, forcing a reboot of the machine. It appears to be triggered by a certain level of I/O on the CephFS mount. Everything works fine when we roll back to the last good kernel version.

The exact call trace in vmcore-dmesg.txt differs between occurrences. Here are two typical examples:

```
[ 8641.382499] list_del corruption. next->prev should be ffff88bd0a4d4c80, but was ffff88bcefdfd280
[ 8641.382521] ------------[ cut here ]------------
[ 8641.382521] kernel BUG at lib/list_debug.c:54!
[ 8641.382528] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
[ 8641.382591] CPU: 2 PID: 83929 Comm: kworker/2:0 Kdump: loaded Not tainted 5.14.0-432.el9.x86_64 #1
[ 8641.382610] Hardware name: oVirt RHEL/RHEL-AV, BIOS edk2-20230524-4.el9_3 05/24/2023
[ 8641.382624] Workqueue: ceph-cap ceph_cap_unlink_work [ceph]
[ 8641.382662] RIP: 0010:__list_del_entry_valid.cold+0x1d/0x47
[ 8641.382681] Code: c7 c7 78 42 d8 b1 e8 f9 87 fe ff 0f 0b 48 89 fe 48 c7 c7 08 43 d8 b1 e8 e8 87 fe ff 0f 0b 48 c7 c7 b8 43 d8 b1 e8 da 87 fe ff <0f> 0b 48 89 f2 48 89 fe 48 c7 c7 78 43 d8 b1 e8 c6 87 fe ff 0f 0b
[ 8641.382711] RSP: 0018:ffff95a000d6be60 EFLAGS: 00010246
[ 8641.382722] RAX: 0000000000000054 RBX: ffff88bced76dc00 RCX: 0000000000000000
[ 8641.382734] RDX: 0000000000000000 RSI: ffff88c02eea0840 RDI: ffff88c02eea0840
[ 8641.382746] RBP: ffff88bd0a4d4c80 R08: 80000000ffff8434 R09: 0000000000ffff10
[ 8641.382758] R10: 000000000000000f R11: 000000000000000f R12: ffff88c02eeb2800
[ 8641.382779] R13: ffff88bcc4610258 R14: ffff88bcc46101b8 R15: ffff88bcc46101c8
[ 8641.382793] FS: 0000000000000000(0000) GS:ffff88c02ee80000(0000) knlGS:0000000000000000
[ 8641.382809] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 8641.382819] CR2: 00007f35cee8a000 CR3: 0000000105708004 CR4: 00000000007706e0
[ 8641.382832] PKRU: 55555554
[ 8641.382838] Call Trace:
[ 8641.382844] <TASK>
[ 8641.382850] ? show_trace_log_lvl+0x1c4/0x2df
[ 8641.382860] ? show_trace_log_lvl+0x1c4/0x2df
[ 8641.382870] ? ceph_cap_unlink_work+0x3f/0x140 [ceph]
[ 8641.382893] ? __die_body.cold+0x8/0xd
[ 8641.382902] ? die+0x2b/0x50
[ 8641.382911] ? do_trap+0xce/0x120
[ 8641.382919] ? __list_del_entry_valid.cold+0x1d/0x47
[ 8641.382930] ? do_error_trap+0x65/0x80
[ 8641.382938] ? __list_del_entry_valid.cold+0x1d/0x47
[ 8641.382948] ? exc_invalid_op+0x4e/0x70
[ 8641.382958] ? __list_del_entry_valid.cold+0x1d/0x47
[ 8641.382975] ? asm_exc_invalid_op+0x16/0x20
[ 8641.382988] ? __list_del_entry_valid.cold+0x1d/0x47
[ 8641.382998] ceph_cap_unlink_work+0x3f/0x140 [ceph]
[ 8641.383021] process_one_work+0x1e2/0x3b0
[ 8641.383032] ? __pfx_worker_thread+0x10/0x10
[ 8641.383043] worker_thread+0x50/0x3a0
[ 8641.383051] ? __pfx_worker_thread+0x10/0x10
[ 8641.383061] kthread+0xdd/0x100
[ 8641.383069] ? __pfx_kthread+0x10/0x10
[ 8641.383078] ret_from_fork+0x29/0x50
[ 8641.383090] </TASK>
[ 8641.383095] Modules linked in: tls ceph libceph dns_resolver fscache netfs nft_counter ipt_REJECT xt_owner xt_conntrack nft_compat nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 rfkill ip_set nf_tables libcrc32c nfnetlink vfat fat intel_rapl_msr intel_rapl_common intel_uncore_frequency_common isst_if_common nfit virtio_gpu iTCO_wdt iTCO_vendor_support libnvdimm lpc_ich virtio_dma_buf drm_shmem_helper drm_kms_helper i2c_i801 rapl syscopyarea sysfillrect sysimgblt virtio_balloon fb_sys_fops i2c_smbus pcspkr joydev fuse drm ext4 mbcache jbd2 sr_mod cdrom sd_mod ahci t10_pi sg libahci crct10dif_pclmul crc32_pclmul crc32c_intel libata ghash_clmulni_intel virtio_net virtio_console virtio_scsi net_failover failover serio_raw
```

```
[ 3538.365469] list_del corruption. next->prev should be ffff8d2b75997c80, but was ffff8d2afcfaae80
[ 3538.365488] ------------[ cut here ]------------
[ 3538.365488] kernel BUG at lib/list_debug.c:54!
[ 3538.365493] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
[ 3538.365553] CPU: 0 PID: 910 Comm: php-fpm Kdump: loaded Not tainted 5.14.0-432.el9.x86_64 #1
[ 3538.365569] Hardware name: oVirt RHEL/RHEL-AV, BIOS edk2-20230524-4.el9_3 05/24/2023
[ 3538.365582] RIP: 0010:__list_del_entry_valid.cold+0x1d/0x47
[ 3538.365612] Code: c7 c7 78 42 38 8e e8 f9 87 fe ff 0f 0b 48 89 fe 48 c7 c7 08 43 38 8e e8 e8 87 fe ff 0f 0b 48 c7 c7 b8 43 38 8e e8 da 87 fe ff <0f> 0b 48 89 f2 48 89 fe 48 c7 c7 78 43 38 8e e8 c6 87 fe ff 0f 0b
[ 3538.365641] RSP: 0018:ffffae870073fda0 EFLAGS: 00010246
[ 3538.365652] RAX: 0000000000000054 RBX: ffff8d2b75997800 RCX: 0000000000000000
[ 3538.365668] RDX: 0000000000000000 RSI: ffff8d2e2ee20840 RDI: ffff8d2e2ee20840
[ 3538.365681] RBP: ffff8d2b75997ab8 R08: 80000000ffff842f R09: 0000000000ffff10
[ 3538.365693] R10: 000000000000000f R11: 000000000000000f R12: 00000000ffffc032
[ 3538.365705] R13: ffff8d2b75997c80 R14: ffff8d2ac480b800 R15: ffff8d2ac480b9c8
[ 3538.365717] FS: 00007f9be42097c0(0000) GS:ffff8d2e2ee00000(0000) knlGS:0000000000000000
[ 3538.365733] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3538.365744] CR2: 00007fa97dc5398c CR3: 0000000104248004 CR4: 00000000007706f0
[ 3538.365756] PKRU: 55555554
[ 3538.365761] Call Trace:
[ 3538.365768] <TASK>
[ 3538.365774] ? show_trace_log_lvl+0x1c4/0x2df
[ 3538.365785] ? show_trace_log_lvl+0x1c4/0x2df
[ 3538.365796] ? ceph_drop_caps_for_unlink+0xb8/0x170 [ceph]
[ 3538.365828] ? __die_body.cold+0x8/0xd
[ 3538.365836] ? die+0x2b/0x50
[ 3538.365845] ? do_trap+0xce/0x120
[ 3538.365853] ? __list_del_entry_valid.cold+0x1d/0x47
[ 3538.365863] ? do_error_trap+0x65/0x80
[ 3538.365871] ? __list_del_entry_valid.cold+0x1d/0x47
[ 3538.365881] ? exc_invalid_op+0x4e/0x70
[ 3538.365891] ? __list_del_entry_valid.cold+0x1d/0x47
[ 3538.365901] ? asm_exc_invalid_op+0x16/0x20
[ 3538.365912] ? __list_del_entry_valid.cold+0x1d/0x47
[ 3538.365923] ceph_drop_caps_for_unlink+0xb8/0x170 [ceph]
[ 3538.365947] ceph_unlink+0xed/0x450 [ceph]
[ 3538.365970] vfs_unlink+0x114/0x290
[ 3538.365980] do_unlinkat+0x1af/0x2e0
[ 3538.365990] __x64_sys_unlink+0x3e/0x60
[ 3538.365999] do_syscall_64+0x59/0x90
[ 3538.366008] ? syscall_exit_to_user_mode+0x22/0x40
[ 3538.366018] ? do_syscall_64+0x69/0x90
[ 3538.366027] ? do_syscall_64+0x69/0x90
[ 3538.366035] entry_SYSCALL_64_after_hwframe+0x72/0xdc
[ 3538.366046] RIP: 0033:0x7f9be40ff27b
[ 3538.366069] Code: f0 ff ff 73 01 c3 48 8b 0d a2 ab 0f 00 f7 d8 64 89 01 48 83 c8 ff c3 0f 1f 84 00 00 00 00 00 f3 0f 1e fa b8 57 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 75 ab 0f 00 f7 d8 64 89 01 48
[ 3538.367031] RSP: 002b:00007ffd8640de58 EFLAGS: 00000246 ORIG_RAX: 0000000000000057
[ 3538.367576] RAX: ffffffffffffffda RBX: 0000000000000008 RCX: 00007f9be40ff27b
[ 3538.368116] RDX: 0000000000000007 RSI: 0000000000000001 RDI: 00007f9bdd4af698
[ 3538.368646] RBP: 00007f9bdd4af698 R08: 00000000ffffffc9 R09: 0000000000000038
[ 3538.369156] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
[ 3538.369671] R13: 00007f9bdd4af698 R14: 0000000000000001 R15: 00007f9be3c15290
[ 3538.370182] </TASK>
[ 3538.370682] Modules linked in: ceph libceph dns_resolver fscache netfs nft_counter ipt_REJECT xt_owner xt_conntrack nft_compat nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 rfkill ip_set nf_tables libcrc32c nfnetlink vfat fat intel_rapl_msr intel_rapl_common intel_uncore_frequency_common virtio_gpu virtio_dma_buf drm_shmem_helper isst_if_common drm_kms_helper nfit syscopyarea sysfillrect sysimgblt fb_sys_fops libnvdimm i2c_i801 iTCO_wdt iTCO_vendor_support lpc_ich i2c_smbus virtio_balloon rapl joydev pcspkr drm fuse ext4 mbcache jbd2 sr_mod cdrom sg ahci libahci crct10dif_pclmul crc32_pclmul crc32c_intel libata ghash_clmulni_intel virtio_net virtio_blk virtio_console net_failover virtio_scsi failover serio_raw
```

I checked the changelogs of the kernel versions and spotted these three backported commits:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=dbc347ef7f0c53aa4a5383238a804d7ebbb0b5ca
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=902d6d013f75b68f31d208c6f3ff9cdca82648a7
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=07045648c07c5632e0dfd5ce084d3cd0cec0258a

The first one introduces changes that look related. Has anybody else experienced this, or does anyone know anything about it?

Thanks and best regards,
Marc
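Both traces above fail in the kernel's linked-list debug check (`__list_del_entry_valid` in `lib/list_debug.c`), which refuses to unlink an entry whose neighbours no longer point back at it. The C sketch below is a simplified userspace model of that check, not the ceph or kernel code: `struct list_head` is reduced to two pointers with no poisoning or locking, and the helpers `raw_unlink`, `list_del_checked`, and `demo_stale_entry` are hypothetical names. It shows one way the "next->prev should be X, but was Y" message can arise in principle: one path unlinks an entry while the entry's own pointers are left stale, and a second path that still believes the entry is queued then tries to delete it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Simplified userspace model of the kernel's circular doubly linked list
 * (assumption: no pointer poisoning, no locking, no CONFIG_DEBUG_LIST). */
struct list_head {
    struct list_head *next, *prev;
};

static void list_init(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

/* Insert entry just before head, i.e. at the tail of the list. */
static void list_add_tail(struct list_head *entry, struct list_head *head)
{
    struct list_head *prev = head->prev;

    entry->next = head;
    entry->prev = prev;
    prev->next = entry;
    head->prev = entry;
}

/* Hypothetical helper: unlink entry with no checks and leave its own
 * next/prev pointers untouched, so they go stale. */
static void raw_unlink(struct list_head *entry)
{
    entry->prev->next = entry->next;
    entry->next->prev = entry->prev;
}

/* Hypothetical checked delete modelled on the idea behind
 * __list_del_entry_valid(): both neighbours must still point back at
 * entry, otherwise report and refuse (the traces show the kernel hitting
 * BUG() at this point instead of returning). */
static bool list_del_checked(struct list_head *entry)
{
    struct list_head *prev = entry->prev;
    struct list_head *next = entry->next;

    if (next->prev != entry) {
        fprintf(stderr,
                "list_del corruption. next->prev should be %p, but was %p\n",
                (void *)entry, (void *)next->prev);
        return false;
    }
    if (prev->next != entry) {
        fprintf(stderr,
                "list_del corruption. prev->next should be %p, but was %p\n",
                (void *)entry, (void *)prev->next);
        return false;
    }
    raw_unlink(entry);
    list_init(entry); /* leave entry self-linked, like list_del_init() */
    return true;
}

/* Hypothetical demo: a first path unlinks a without resetting it, then a
 * second path that still thinks a is queued tries a checked delete. */
static bool demo_stale_entry(void)
{
    struct list_head head, a, b;

    list_init(&head);
    list_add_tail(&a, &head);
    list_add_tail(&b, &head);
    assert(head.next == &a && a.next == &b && b.next == &head);

    raw_unlink(&a);              /* head <-> b, but a still points at b */
    return list_del_checked(&a); /* b.prev is &head, not &a: check fires */
}
```

Calling `demo_stale_entry()` prints a `list_del corruption. next->prev should be ..., but was ...` line and returns `false`, because the first unlink already redirected `b.prev` to `&head` while `a.next` still points at `b`. Checking both neighbours before writing is what lets the corruption be caught at the second delete instead of silently damaging the list further.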
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe, send an email to ceph-users-leave@xxxxxxx