Hello,

we are currently experiencing problems with a cluster used for storing RBD backups.

Config:
* 8 nodes, each with 6 HDD OSDs and 1 SSD used for blockdb and WAL
* k=4 m=2 EC
* dual 25GbE NIC
* v14.2.8

ceph health detail shows the following messages:

HEALTH_WARN BlueFS spillover detected on 1 OSD(s); 45 pgs not deep-scrubbed in time; snap trim queue for 2 pg(s) >= 32768 (mon_osd_snap_trim_queue_warn_on); 1 slow ops, oldest one blocked for 18629 sec, mon.cloud10-1517 has slow ops
BLUEFS_SPILLOVER BlueFS spillover detected on 1 OSD(s)
    osd.0 spilled over 68 MiB metadata from 'db' device (35 GiB used of 185 GiB) to slow device
PG_NOT_DEEP_SCRUBBED 45 pgs not deep-scrubbed in time
    pg 18.3f5 not deep-scrubbed since 2020-09-03 21:58:28.316958
    pg 18.3ed not deep-scrubbed since 2020-09-01 15:11:54.335935
    [--- cut ---]
PG_SLOW_SNAP_TRIMMING snap trim queue for 2 pg(s) >= 32768 (mon_osd_snap_trim_queue_warn_on)
    snap trim queue for pg 18.2c5 at 41630
    snap trim queue for pg 18.d6 at 44079
    longest queue on pg 18.d6 at 44079
    try decreasing "osd snap trim sleep" and/or increasing "osd pg max concurrent snap trims".
SLOW_OPS 1 slow ops, oldest one blocked for 18629 sec, mon.cloud10-1517 has slow ops

We've made some observations on that cluster:
* The BlueFS spillover goes away with "ceph tell osd.0 compact" but comes back eventually
* The blockdb/WAL SSD is highly utilized, while the HDDs are not
* When one OSD fails, there is a cascade failure taking down many other OSDs across all nodes.
  Most of the time, the cluster comes back after setting the nodown flag and restarting all failed OSDs one by one
* Sometimes, especially during maintenance, "Long heartbeat ping times on front/back interface seen, longest is 1390.076 msec" messages pop up
* The cluster performance deteriorates sharply when upgrading from 14.2.8 to 14.2.11 or later, so we've rolled back to 14.2.8

Of these problems, the OSD cascade failure is the most important; it has caused lengthy downtimes over the past few weeks.

Do you have any ideas on how to combat these problems?

Thank you,
Paul

-- 
Kind regards
Paul Kramme
Your Profihost Team

-------------------------------
Profihost AG
Expo Plaza 1
30539 Hannover
Germany

Tel.: +49 (511) 5151 8181 | Fax.: +49 (511) 5151 8282
URL: http://www.profihost.com | E-Mail: info@xxxxxxxxxxxxx

Registered office: Hannover, VAT ID DE813460827
Commercial register: Amtsgericht Hannover, register no. HRB 202350
Executive board: Cristoph Bluhm, Stefan Priebe, Marc Zocher, Dr. Claus Boyens, Daniel Hagemeier
Supervisory board: Gabriele Pulvermüller (chair)
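
For reference, the per-PG snap trim queue lengths quoted in the health output above can be pulled out of the text with a short script, which may help when tracking whether the queues shrink after tuning. This is only a minimal sketch: it assumes the "ceph health detail" text format exactly as pasted in this message, and the names parse_snap_trim_queues and over_threshold are hypothetical helpers, not part of any Ceph tooling. The default threshold of 32768 is the mon_osd_snap_trim_queue_warn_on value shown in the warning.

```python
import re
from typing import Dict, List

# Matches per-PG lines such as "snap trim queue for pg 18.d6 at 44079".
# The summary line ("... for 2 pg(s) >= 32768") and the "longest queue on pg"
# line deliberately do not match this pattern.
PG_QUEUE_RE = re.compile(r"snap trim queue for pg (\S+) at (\d+)")


def parse_snap_trim_queues(health_detail: str) -> Dict[str, int]:
    """Return {pg_id: queue_length} for every per-PG snap trim line."""
    return {pg: int(length) for pg, length in PG_QUEUE_RE.findall(health_detail)}


def over_threshold(queues: Dict[str, int], threshold: int = 32768) -> List[str]:
    """Return PG ids at or above the threshold, longest queue first."""
    hits = [pg for pg, length in queues.items() if length >= threshold]
    return sorted(hits, key=lambda pg: queues[pg], reverse=True)


# Excerpt copied from the health output in this message.
SAMPLE = """\
PG_SLOW_SNAP_TRIMMING snap trim queue for 2 pg(s) >= 32768 (mon_osd_snap_trim_queue_warn_on)
    snap trim queue for pg 18.2c5 at 41630
    snap trim queue for pg 18.d6 at 44079
    longest queue on pg 18.d6 at 44079
"""

if __name__ == "__main__":
    queues = parse_snap_trim_queues(SAMPLE)
    for pg in over_threshold(queues):
        print(pg, queues[pg])
```

Feeding it the saved output of "ceph health detail" at intervals would give a rough trend per PG; it does not change any cluster setting.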