Hi,

you might be suffering from the same bug we ran into:

https://tracker.ceph.com/issues/53729
https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/thread/KG35GRTN4ZIDWPLJZ5OQOKERUIQT5WQ6/#K45MJ63J37IN2HNAQXVOOT3J6NTXIHCA

In short, there is a bug that prevents PG log items from being removed. You
need to upgrade to Pacific to get the fix. There is also a very easy way to
check whether you MIGHT be affected:

https://tracker.ceph.com/issues/53729#note-65

On Thu, 30 Mar 2023 at 17:02, <petersun@xxxxxxxxxxxx> wrote:
> We experienced a Ceph failure that left the system unresponsive, with no
> IOPS or throughput, caused by a problematic OSD process on one node. This
> resulted in slow operations and no IOPS for all other OSDs in the cluster.
> The incident timeline was as follows:
>
> An alert was triggered for an OSD problem.
> 6 out of 12 OSDs on the node went down.
> A soft restart was attempted, but a smartmontools process got stuck while
> the server was shutting down.
> A hard restart was performed and service resumed as usual.
>
> Our Ceph cluster has 19 nodes and 218 OSDs and runs version 15.2.17
> Octopus (stable).
>
> Questions:
> 1. What is Ceph's detection mechanism? Why couldn't Ceph detect the faulty
> node and automatically stop using its resources?
> 2. Did we miss any patches or bug fixes?
> 3. Do you have any suggestions for improvements so that similar issues can
> be detected and avoided more quickly in the future?

--
The "UTF-8 problems" self-help group will meet in the large hall this time,
as an exception.
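
Regarding the "easy check" linked above: below is a minimal sketch of how one
might look at the osd_pglog mempool usage per OSD as a first hint. This is my
own rough illustration, not the exact procedure from tracker note-65 (see the
note for the authoritative check). It assumes it runs on an OSD host with
access to the admin sockets and that "ceph daemon osd.<id> dump_mempools"
returns the usual mempool -> by_pool -> osd_pglog layout; the socket paths
used for discovery are an assumption as well.

#!/usr/bin/env python3
"""Rough sketch: print osd_pglog mempool usage for the OSDs on this host."""
import glob
import json
import re
import subprocess


def pglog_usage(osd_id: str) -> tuple[int, int]:
    """Return (items, bytes) of the osd_pglog mempool for one local OSD."""
    out = subprocess.check_output(
        ["ceph", "daemon", f"osd.{osd_id}", "dump_mempools"], text=True
    )
    # Assumed JSON layout: {"mempool": {"by_pool": {"osd_pglog": {...}}}}
    pools = json.loads(out).get("mempool", {}).get("by_pool", {})
    pglog = pools.get("osd_pglog", {})
    return pglog.get("items", 0), pglog.get("bytes", 0)


def local_osd_ids() -> list[str]:
    """Guess local OSD ids from admin socket files under /var/run/ceph.

    The recursive pattern is meant to also catch containerized layouts
    like /var/run/ceph/<fsid>/ceph-osd.N.asok (assumption).
    """
    ids = []
    for sock in glob.glob("/var/run/ceph/**/*osd.*.asok", recursive=True):
        m = re.search(r"osd\.(\d+)\.asok$", sock)
        if m:
            ids.append(m.group(1))
    return sorted(set(ids), key=int)


if __name__ == "__main__":
    for osd in local_osd_ids():
        items, nbytes = pglog_usage(osd)
        print(f"osd.{osd}: osd_pglog items={items} bytes={nbytes / 2**20:.1f} MiB")

If osd_pglog is taking up many gigabytes on the affected OSDs, that would at
least point in the direction of this bug; the definitive check is the one
described in the tracker note.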