I think you are experiencing the mon_osd_down_out_interval:
https://docs.ceph.com/en/latest/rados/configuration/mon-osd-interaction/#confval-mon_osd_down_out_interval

Ceph waits 10 minutes before marking a down OSD as out, for the reasons
you mention, but this would have been the case in Nautilus as well.

Respectfully,

*Wes Dillingham*
wes@xxxxxxxxxxxxxxxxx
LinkedIn <http://www.linkedin.com/in/wesleydillingham>


On Mon, Dec 5, 2022 at 5:20 PM Sean Matheny <sean.matheny@xxxxxxxxxxx>
wrote:

> Hi all,
>
> New Quincy cluster here that I'm just running through some benchmarks
> against:
>
> ceph version 17.2.3 (dff484dfc9e19a9819f375586300b3b79d80034d) quincy
> (stable)
> 11 nodes of 24x 18TB HDD OSDs, 2x 2.9TB SSD OSDs
>
> I'm seeing a delay of almost exactly 10 minutes from when I remove an
> OSD/node from the cluster until actual recovery IO begins. This is much
> different behaviour than what I'm used to in Nautilus, where recovery
> IO would commence within seconds. Downed OSDs are reflected in ceph
> health within a few seconds (as expected), and affected PGs show as
> undersized a few seconds later (as expected). I guess this 10-minute
> delay may even be a feature -- accidentally rebooting a node before
> setting recovery flags would prevent rebalancing, for example. Just
> thought it was worth asking in case it's a bug or something to look
> deeper into.
>
> I've read through the OSD config and all of my recovery tuneables look
> ok, for example:
> https://docs.ceph.com/en/latest/rados/configuration/osd-config-ref/
>
> [ceph: root@ /]# ceph config get osd osd_recovery_delay_start
> 0.000000
> [ceph: root@ /]# ceph config get osd osd_recovery_sleep
> 0.000000
> [ceph: root@ /]# ceph config get osd osd_recovery_sleep_hdd
> 0.100000
> [ceph: root@ /]# ceph config get osd osd_recovery_sleep_ssd
> 0.000000
> [ceph: root@ /]# ceph config get osd osd_recovery_sleep_hybrid
> 0.025000
>
> Thanks in advance.
>
> Ngā mihi,
>
> Sean Matheny
> HPC Cloud Platform DevOps Lead
> New Zealand eScience Infrastructure (NeSI)
>
> e: sean.matheny@xxxxxxxxxxx
>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
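
For anyone finding this thread later: a quick way to confirm whether
mon_osd_down_out_interval is what you are hitting, and to keep the
monitors from marking OSDs out during planned maintenance. These are
standard Ceph CLI commands; the 600-second value shown is simply the
documented default (10 minutes), so adjust it to your own needs:

    # Check the current down->out interval (in seconds) on the monitors
    ceph config get mon mon_osd_down_out_interval

    # Temporarily prevent OSDs from being marked out at all,
    # e.g. before rebooting a node for maintenance
    ceph osd set noout
    # ... perform the maintenance, then re-enable normal behaviour ...
    ceph osd unset noout

    # Or change the interval itself, e.g. set it (back) to the
    # default of 600 seconds
    ceph config set mon mon_osd_down_out_interval 600

Setting noout is the usual approach for short maintenance windows,
since it avoids both unnecessary rebalancing and the need to change
the interval cluster-wide.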