Hi,

we had a problem with a Ceph OSD stuck in snaptrim. We set the OSD down temporarily in the hope of resolving the issue. This ended up kernel panicking two of our Kubernetes workers that use this Ceph cluster for RBD and CephFS, both directly and via the CSI driver.

The more notable lines after "osd25 down" and "osd25 up" were the following. Especially the "VFS: Busy inodes" message and the mention of generic_shutdown_super, which seems to be dealing with inodes, look suspicious to me as the culprit:

2024-09-16T16:18:37.687175+02:00 HOSTNAME kernel: [6913636.547576] VFS: Busy inodes after unmount of ceph (ceph)
2024-09-16T16:18:37.687191+02:00 HOSTNAME kernel: [6913636.547588] ------------[ cut here ]------------
2024-09-16T16:18:37.687196+02:00 HOSTNAME kernel: [6913636.548482] kernel BUG at fs/super.c:503!
[…snip…]
2024-09-16T16:18:37.731920+02:00 HOSTNAME kernel: [6913636.562214] ? generic_shutdown_super.cold+0x1a/0x1c

The log from "libceph (…): osd25 down" up to the reboot is attached, including some null bytes that snuck in somehow.

Has anybody run into this before, or does anyone have an idea how to prevent it in the future? We are currently running Reef (18.2.2, updating soon).

Best,
Christian

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx