Re: Odd 10-minute delay before recovery IO begins

The 10-minute delay is the default grace period Ceph waits before marking a down OSD as "out" and beginning to heal the data. See "mon_osd_down_out_interval" – the default is 600 seconds.
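If that interval is the cause, you can confirm and adjust it from any node with admin access. A minimal sketch (the option and flag names are from the Ceph documentation; the 60-second value is just an example, not a recommendation):

```shell
# Check the current down->out interval (default 600 s = 10 min)
ceph config get mon mon_osd_down_out_interval

# Shorten it so recovery starts sooner after an OSD stays down
ceph config set mon mon_osd_down_out_interval 60

# Or, before planned maintenance, prevent rebalancing entirely:
ceph osd set noout
# ... reboot the node ...
ceph osd unset noout
```

Note that lowering the interval trades a faster start of recovery for more rebalancing churn on short outages, so the 10-minute default is deliberate.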

From: Sean Matheny <sean.matheny@xxxxxxxxxxx>
Date: Monday, December 5, 2022 at 5:20 PM
To: ceph-users@xxxxxxx <ceph-users@xxxxxxx>
Cc: Blair Bethwaite <blair.bethwaite@xxxxxxxxxxx>, piotr@xxxxxxxxxxxx <piotr@xxxxxxxxxxxx>, Michal Nasiadka <michal@xxxxxxxxxxxx>
Subject: [EXTERNAL]  Odd 10-minute delay before recovery IO begins
Hi all,

New Quincy cluster here that I'm just running through some benchmarks against:

ceph version 17.2.3 (dff484dfc9e19a9819f375586300b3b79d80034d) quincy (stable)
11 nodes of 24x 18TB HDD OSDs, 2x 2.9TB SSD OSDs

I'm seeing a delay of almost exactly 10 minutes from when I remove an OSD/node from the cluster until actual recovery IO begins. This is much different behaviour than what I was used to previously in Nautilus, where recovery IO would commence within seconds. Downed OSDs are reflected in ceph health within a few seconds (as expected), and affected PGs show as undersized a few seconds later (as expected). I guess this 10-minute delay may even be a feature -- it would prevent rebalancing if you accidentally rebooted a node before setting recovery flags, for example. Just thought it was worth asking in case it's a bug or something to look deeper into.

I've read through the OSD config and all of my recovery tuneables look ok, for example:
https://docs.ceph.com/en/latest/rados/configuration/osd-config-ref/ 

[ceph: root@ /]# ceph config get osd osd_recovery_delay_start
0.000000
[ceph: root@ /]# ceph config get osd osd_recovery_sleep
0.000000
[ceph: root@ /]# ceph config get osd osd_recovery_sleep_hdd
0.100000
[ceph: root@ /]# ceph config get osd osd_recovery_sleep_ssd
0.000000
[ceph: root@ /]# ceph config get osd osd_recovery_sleep_hybrid
0.025000

Thanks in advance.

Ngā mihi,

Sean Matheny
HPC Cloud Platform DevOps Lead
New Zealand eScience Infrastructure (NeSI)

e: sean.matheny@xxxxxxxxxxx



_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



