Sounds like your OSDs were down, but not marked out. Recovery will only
occur once they are actually marked out. The default
mon_osd_down_out_interval is 10 minutes (600 seconds). You can mark them
out explicitly with:

    ceph osd out <id>

On Mon, Dec 5, 2022 at 2:20 PM Sean Matheny <sean.matheny@xxxxxxxxxxx> wrote:

> Hi all,
>
> New Quincy cluster here that I'm just running some benchmarks against:
>
> ceph version 17.2.3 (dff484dfc9e19a9819f375586300b3b79d80034d) quincy (stable)
> 11 nodes of 24x 18TB HDD OSDs, 2x 2.9TB SSD OSDs
>
> I'm seeing a delay of almost exactly 10 minutes from when I remove an
> OSD/node from the cluster until actual recovery IO begins. This is much
> different behaviour than what I was used to previously in Nautilus, where
> recovery IO would commence within seconds. Downed OSDs are reflected in
> ceph health within a few seconds (as expected), and affected PGs show as
> undersized a few seconds later (as expected). I guess this 10-minute delay
> may even be a feature -- it would prevent rebalancing if a node were
> accidentally rebooted before setting recovery flags, for example. Just
> thought it was worth asking in case it's a bug or something to look
> deeper into.
>
> I've read through the OSD config and all of my recovery tunables look ok,
> for example:
> https://docs.ceph.com/en/latest/rados/configuration/osd-config-ref/
>
> [ceph: root@ /]# ceph config get osd osd_recovery_delay_start
> 0.000000
> [ceph: root@ /]# ceph config get osd osd_recovery_sleep
> 0.000000
> [ceph: root@ /]# ceph config get osd osd_recovery_sleep_hdd
> 0.100000
> [ceph: root@ /]# ceph config get osd osd_recovery_sleep_ssd
> 0.000000
> [ceph: root@ /]# ceph config get osd osd_recovery_sleep_hybrid
> 0.025000
>
> Thanks in advance.
>
> Ngā mihi,
>
> Sean Matheny
> HPC Cloud Platform DevOps Lead
> New Zealand eScience Infrastructure (NeSI)
>
> e: sean.matheny@xxxxxxxxxxx

--
Tyler Brekke
Senior Engineer I
tbrekke@xxxxxxxxxxxxxxxx

We're Hiring! <https://do.co/careers> | @digitalocean
<https://twitter.com/digitalocean> | YouTube <https://www.youtube.com/digitalocean>

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
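
For reference, a minimal sketch of the commands in play above, assuming a
cephadm shell (or any host with an admin keyring); the OSD id 12 and the
120-second value are placeholders, not values taken from this cluster:

    # Show how long the monitors wait before marking a down OSD out
    # (the default of 600 seconds is the 10-minute delay described above).
    ceph config get mon mon_osd_down_out_interval

    # Mark a down OSD out immediately so recovery starts without waiting.
    ceph osd out 12

    # Optionally shorten the automatic down-to-out window cluster-wide.
    ceph config set mon mon_osd_down_out_interval 120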