Re: Client kernel crashes on cephfs access

Hi Marc,

Thanks for reporting this. I have written a patch to fix it and will send it out once testing is done.

- Xiubo

On 4/8/24 16:01, Marc Ruhmann wrote:
Hi everyone,

I would like to ask for help regarding client kernel crashes that happen
on cephfs access. We have been struggling with this for over a month now
with over 100 crashes on 7 hosts during that time.

Our cluster runs Ceph version 18.2.1. Our clients run CentOS Stream.

On CentOS Stream 9 the problem started with kernel version
5.14.0-425.el9. Version 5.14.0-419.el9 is the last one without problems.
It also occurred on CentOS Stream 8, starting with version
4.18.0-546.el8 (4.18.0-544.el8 being the last good one).

The problem presents itself by the client kernel crashing, forcing a
reboot of the machine. Apparently it is triggered by a certain level of
IO on the cephfs mount. It works perfectly fine when we roll back to the
last good kernel version.

The exact call trace in vmcore-dmesg.txt differs between occurrences.
Here are two typical examples:

```
[ 8641.382499] list_del corruption. next->prev should be ffff88bd0a4d4c80, but was ffff88bcefdfd280
[ 8641.382521] ------------[ cut here ]------------
[ 8641.382521] kernel BUG at lib/list_debug.c:54!
[ 8641.382528] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
[ 8641.382591] CPU: 2 PID: 83929 Comm: kworker/2:0 Kdump: loaded Not tainted 5.14.0-432.el9.x86_64 #1
[ 8641.382610] Hardware name: oVirt RHEL/RHEL-AV, BIOS edk2-20230524-4.el9_3 05/24/2023
[ 8641.382624] Workqueue: ceph-cap ceph_cap_unlink_work [ceph]
[ 8641.382662] RIP: 0010:__list_del_entry_valid.cold+0x1d/0x47
[ 8641.382681] Code: c7 c7 78 42 d8 b1 e8 f9 87 fe ff 0f 0b 48 89 fe 48 c7 c7 08 43 d8 b1 e8 e8 87 fe ff 0f 0b 48 c7 c7 b8 43 d8 b1 e8 da 87 fe ff <0f> 0b 48 89 f2 48 89 fe 48 c7 c7 78 43 d8 b1 e8 c6 87 fe ff 0f 0b
[ 8641.382711] RSP: 0018:ffff95a000d6be60 EFLAGS: 00010246
[ 8641.382722] RAX: 0000000000000054 RBX: ffff88bced76dc00 RCX: 0000000000000000
[ 8641.382734] RDX: 0000000000000000 RSI: ffff88c02eea0840 RDI: ffff88c02eea0840
[ 8641.382746] RBP: ffff88bd0a4d4c80 R08: 80000000ffff8434 R09: 0000000000ffff10
[ 8641.382758] R10: 000000000000000f R11: 000000000000000f R12: ffff88c02eeb2800
[ 8641.382779] R13: ffff88bcc4610258 R14: ffff88bcc46101b8 R15: ffff88bcc46101c8
[ 8641.382793] FS:  0000000000000000(0000) GS:ffff88c02ee80000(0000) knlGS:0000000000000000
[ 8641.382809] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 8641.382819] CR2: 00007f35cee8a000 CR3: 0000000105708004 CR4: 00000000007706e0
[ 8641.382832] PKRU: 55555554
[ 8641.382838] Call Trace:
[ 8641.382844]  <TASK>
[ 8641.382850]  ? show_trace_log_lvl+0x1c4/0x2df
[ 8641.382860]  ? show_trace_log_lvl+0x1c4/0x2df
[ 8641.382870]  ? ceph_cap_unlink_work+0x3f/0x140 [ceph]
[ 8641.382893]  ? __die_body.cold+0x8/0xd
[ 8641.382902]  ? die+0x2b/0x50
[ 8641.382911]  ? do_trap+0xce/0x120
[ 8641.382919]  ? __list_del_entry_valid.cold+0x1d/0x47
[ 8641.382930]  ? do_error_trap+0x65/0x80
[ 8641.382938]  ? __list_del_entry_valid.cold+0x1d/0x47
[ 8641.382948]  ? exc_invalid_op+0x4e/0x70
[ 8641.382958]  ? __list_del_entry_valid.cold+0x1d/0x47
[ 8641.382975]  ? asm_exc_invalid_op+0x16/0x20
[ 8641.382988]  ? __list_del_entry_valid.cold+0x1d/0x47
[ 8641.382998]  ceph_cap_unlink_work+0x3f/0x140 [ceph]
[ 8641.383021]  process_one_work+0x1e2/0x3b0
[ 8641.383032]  ? __pfx_worker_thread+0x10/0x10
[ 8641.383043]  worker_thread+0x50/0x3a0
[ 8641.383051]  ? __pfx_worker_thread+0x10/0x10
[ 8641.383061]  kthread+0xdd/0x100
[ 8641.383069]  ? __pfx_kthread+0x10/0x10
[ 8641.383078]  ret_from_fork+0x29/0x50
[ 8641.383090]  </TASK>
[ 8641.383095] Modules linked in: tls ceph libceph dns_resolver fscache netfs nft_counter ipt_REJECT xt_owner xt_conntrack nft_compat nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 rfkill ip_set nf_tables libcrc32c nfnetlink vfat fat intel_rapl_msr intel_rapl_common intel_uncore_frequency_common isst_if_common nfit virtio_gpu iTCO_wdt iTCO_vendor_support libnvdimm lpc_ich virtio_dma_buf drm_shmem_helper drm_kms_helper i2c_i801 rapl syscopyarea sysfillrect sysimgblt virtio_balloon fb_sys_fops i2c_smbus pcspkr joydev fuse drm ext4 mbcache jbd2 sr_mod cdrom sd_mod ahci t10_pi sg libahci crct10dif_pclmul crc32_pclmul crc32c_intel libata ghash_clmulni_intel virtio_net virtio_console virtio_scsi net_failover failover serio_raw
```

```
[ 3538.365469] list_del corruption. next->prev should be ffff8d2b75997c80, but was ffff8d2afcfaae80
[ 3538.365488] ------------[ cut here ]------------
[ 3538.365488] kernel BUG at lib/list_debug.c:54!
[ 3538.365493] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
[ 3538.365553] CPU: 0 PID: 910 Comm: php-fpm Kdump: loaded Not tainted 5.14.0-432.el9.x86_64 #1
[ 3538.365569] Hardware name: oVirt RHEL/RHEL-AV, BIOS edk2-20230524-4.el9_3 05/24/2023
[ 3538.365582] RIP: 0010:__list_del_entry_valid.cold+0x1d/0x47
[ 3538.365612] Code: c7 c7 78 42 38 8e e8 f9 87 fe ff 0f 0b 48 89 fe 48 c7 c7 08 43 38 8e e8 e8 87 fe ff 0f 0b 48 c7 c7 b8 43 38 8e e8 da 87 fe ff <0f> 0b 48 89 f2 48 89 fe 48 c7 c7 78 43 38 8e e8 c6 87 fe ff 0f 0b
[ 3538.365641] RSP: 0018:ffffae870073fda0 EFLAGS: 00010246
[ 3538.365652] RAX: 0000000000000054 RBX: ffff8d2b75997800 RCX: 0000000000000000
[ 3538.365668] RDX: 0000000000000000 RSI: ffff8d2e2ee20840 RDI: ffff8d2e2ee20840
[ 3538.365681] RBP: ffff8d2b75997ab8 R08: 80000000ffff842f R09: 0000000000ffff10
[ 3538.365693] R10: 000000000000000f R11: 000000000000000f R12: 00000000ffffc032
[ 3538.365705] R13: ffff8d2b75997c80 R14: ffff8d2ac480b800 R15: ffff8d2ac480b9c8
[ 3538.365717] FS:  00007f9be42097c0(0000) GS:ffff8d2e2ee00000(0000) knlGS:0000000000000000
[ 3538.365733] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3538.365744] CR2: 00007fa97dc5398c CR3: 0000000104248004 CR4: 00000000007706f0
[ 3538.365756] PKRU: 55555554
[ 3538.365761] Call Trace:
[ 3538.365768]  <TASK>
[ 3538.365774]  ? show_trace_log_lvl+0x1c4/0x2df
[ 3538.365785]  ? show_trace_log_lvl+0x1c4/0x2df
[ 3538.365796]  ? ceph_drop_caps_for_unlink+0xb8/0x170 [ceph]
[ 3538.365828]  ? __die_body.cold+0x8/0xd
[ 3538.365836]  ? die+0x2b/0x50
[ 3538.365845]  ? do_trap+0xce/0x120
[ 3538.365853]  ? __list_del_entry_valid.cold+0x1d/0x47
[ 3538.365863]  ? do_error_trap+0x65/0x80
[ 3538.365871]  ? __list_del_entry_valid.cold+0x1d/0x47
[ 3538.365881]  ? exc_invalid_op+0x4e/0x70
[ 3538.365891]  ? __list_del_entry_valid.cold+0x1d/0x47
[ 3538.365901]  ? asm_exc_invalid_op+0x16/0x20
[ 3538.365912]  ? __list_del_entry_valid.cold+0x1d/0x47
[ 3538.365923]  ceph_drop_caps_for_unlink+0xb8/0x170 [ceph]
[ 3538.365947]  ceph_unlink+0xed/0x450 [ceph]
[ 3538.365970]  vfs_unlink+0x114/0x290
[ 3538.365980]  do_unlinkat+0x1af/0x2e0
[ 3538.365990]  __x64_sys_unlink+0x3e/0x60
[ 3538.365999]  do_syscall_64+0x59/0x90
[ 3538.366008]  ? syscall_exit_to_user_mode+0x22/0x40
[ 3538.366018]  ? do_syscall_64+0x69/0x90
[ 3538.366027]  ? do_syscall_64+0x69/0x90
[ 3538.366035]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
[ 3538.366046] RIP: 0033:0x7f9be40ff27b
[ 3538.366069] Code: f0 ff ff 73 01 c3 48 8b 0d a2 ab 0f 00 f7 d8 64 89 01 48 83 c8 ff c3 0f 1f 84 00 00 00 00 00 f3 0f 1e fa b8 57 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 75 ab 0f 00 f7 d8 64 89 01 48
[ 3538.367031] RSP: 002b:00007ffd8640de58 EFLAGS: 00000246 ORIG_RAX: 0000000000000057
[ 3538.367576] RAX: ffffffffffffffda RBX: 0000000000000008 RCX: 00007f9be40ff27b
[ 3538.368116] RDX: 0000000000000007 RSI: 0000000000000001 RDI: 00007f9bdd4af698
[ 3538.368646] RBP: 00007f9bdd4af698 R08: 00000000ffffffc9 R09: 0000000000000038
[ 3538.369156] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
[ 3538.369671] R13: 00007f9bdd4af698 R14: 0000000000000001 R15: 00007f9be3c15290
[ 3538.370182]  </TASK>
[ 3538.370682] Modules linked in: ceph libceph dns_resolver fscache netfs nft_counter ipt_REJECT xt_owner xt_conntrack nft_compat nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 rfkill ip_set nf_tables libcrc32c nfnetlink vfat fat intel_rapl_msr intel_rapl_common intel_uncore_frequency_common virtio_gpu virtio_dma_buf drm_shmem_helper isst_if_common drm_kms_helper nfit syscopyarea sysfillrect sysimgblt fb_sys_fops libnvdimm i2c_i801 iTCO_wdt iTCO_vendor_support lpc_ich i2c_smbus virtio_balloon rapl joydev pcspkr drm fuse ext4 mbcache jbd2 sr_mod cdrom sg ahci libahci crct10dif_pclmul crc32_pclmul crc32c_intel libata ghash_clmulni_intel virtio_net virtio_blk virtio_console net_failover virtio_scsi failover serio_raw
```
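
For readers unfamiliar with this BUG: the "list_del corruption" message comes from the kernel's debug check that runs before an entry is removed from a doubly linked list. The check verifies that both neighbours still point back at the entry being removed. The following is a simplified user-space sketch of that idea, not the kernel's actual code. The helpers `list_del_checked` and `demo_double_add` are hypothetical names used for illustration, and the "add to a second list without unlinking first" scenario is only one assumed way to produce such a state. It is not a claim about the root cause in the ceph module.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Minimal doubly linked list node, modelled on the kernel's struct list_head. */
struct list_head {
    struct list_head *next, *prev;
};

/* An empty list (or an unlinked entry) points at itself. */
static void list_init(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

/* Insert entry directly after head. */
static void list_add(struct list_head *entry, struct list_head *head)
{
    entry->next = head->next;
    entry->prev = head;
    head->next->prev = entry;
    head->next = entry;
}

/* Return true only if both neighbours still point back at entry. */
static bool list_del_entry_valid(const struct list_head *entry)
{
    if (entry->prev->next != entry) {
        printf("list_del corruption. prev->next should be %p, but was %p\n",
               (void *)entry, (void *)entry->prev->next);
        return false;
    }
    if (entry->next->prev != entry) {
        printf("list_del corruption. next->prev should be %p, but was %p\n",
               (void *)entry, (void *)entry->next->prev);
        return false;
    }
    return true;
}

/* Hypothetical helper: unlink entry only if the validity check passes. */
static bool list_del_checked(struct list_head *entry)
{
    if (!list_del_entry_valid(entry))
        return false;
    entry->next->prev = entry->prev;
    entry->prev->next = entry->next;
    list_init(entry);
    return true;
}

/*
 * Hypothetical scenario: b is added to a second list without being
 * unlinked from the first, so a's successor no longer points back at a.
 * Returns the result of trying to delete a (false: corruption detected).
 */
bool demo_double_add(void)
{
    struct list_head head, other, a, b;

    list_init(&head);
    list_init(&other);
    list_add(&b, &head);
    list_add(&a, &head);              /* head -> a -> b -> head */
    assert(list_del_entry_valid(&a)); /* list is consistent so far */

    list_add(&b, &other);             /* bug: b was never removed from head */
    return list_del_checked(&a);
}
```

Calling `demo_double_add()` prints a "next->prev should be ..., but was ..." line of the same form as in the traces above and returns `false`. In the kernel, the same failed check ends in a BUG instead of a return value, which is why the machine crashes.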

I checked the changelogs of the kernel versions and spotted these three
commits that were backported:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=dbc347ef7f0c53aa4a5383238a804d7ebbb0b5ca
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=902d6d013f75b68f31d208c6f3ff9cdca82648a7
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=07045648c07c5632e0dfd5ce084d3cd0cec0258a

The first one adds changes that look related.

Has anybody experienced this as well, or does anyone know something about this?

Thanks and best regards,

Marc

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx