Do you have any network congestion or packet loss on the replication network? Are you sharing NICs between the public and replication networks? That is another metric worth looking into.

________________________________
From: J-P Methot <jp.methot@xxxxxxxxxxxxxxxxx>
Sent: 18 January 2023 12:42
To: ceph-users <ceph-users@xxxxxxx>
Subject: Flapping OSDs on pacific 16.2.10

Hi,

We have a full-SSD production cluster running Pacific 16.2.10, deployed with cephadm, that is experiencing OSD flapping issues. Essentially, random OSDs get kicked out of the cluster and then automatically brought back in a few times a day.

As an example, let's take the case of osd.184:

- It flapped 9 times between January 15th and 17th, with the following log message each time:

  2023-01-15T16:33:19.903+0000 prepare_failure osd.184 from osd.49 is reporting failure:1

- On January 17th, it complained about slow ops and spammed its logs with the following line:

  heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f346aa64700' had timed out after 15.000000954s

The storage node itself still has over 30 GB of RAM available as cache, the drives only seldom peak at 100% usage and never for more than a few seconds, and CPU usage is constantly around 5%.

Considering there are no other error messages in any of the regular logs, including the systemd logs, why would this OSD not reply to heartbeats?

--
Jean-Philippe Méthot
Senior OpenStack system administrator
Administrateur système OpenStack sénior
PlanetHoster inc.

Danny Webb
Principal OpenStack Engineer
Danny.Webb@xxxxxxxxxxxxxxx
www.thg.com

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
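
To check for the kind of packet loss suggested above, a few quick data points can be gathered on each storage node. A minimal sketch, assuming the replication (cluster) NIC is named ens1f0 and that 10.0.1.12 is a peer OSD host's replication address (both are placeholders, substitute your own), and that the dump_osd_network admin-socket command (available since Nautilus) is run where osd.184's admin socket is reachable:

    # NIC-level error and drop counters on the replication interface
    ip -s link show ens1f0
    ethtool -S ens1f0 | grep -Ei 'drop|err|miss'

    # Rough packet-loss measurement toward a peer OSD host over the replication network
    ping -c 500 -i 0.05 10.0.1.12

    # Heartbeat ping times recorded by the OSD itself; a threshold of 0 dumps all entries
    ceph daemon osd.184 dump_osd_network 0

Errors or steadily rising drop counters on a shared public/replication NIC would explain missed heartbeats even when disk, RAM and CPU all look healthy, which is why the shared-NIC question above is worth answering first.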