Re: Odd 10-minute delay before recovery IO begins

Sounds like your OSDs were down, but not marked out. Recovery only begins
once they are actually marked out, and the default
mon_osd_down_out_interval is 10 minutes (600 seconds).
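
If you want recovery to start sooner after a failure, you can check and
lower that interval. A minimal sketch, assuming a cephadm shell and that
the value hasn't been overridden elsewhere (600 seconds is the default):

[ceph: root@ /]# ceph config get mon mon_osd_down_out_interval
600
[ceph: root@ /]# ceph config set mon mon_osd_down_out_interval 300

The option is consumed by the monitors, so it should take effect at
runtime without restarting OSDs.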

You can also mark them out explicitly with ceph osd out <id>.
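
For example, to push a single down OSD out immediately (a sketch using a
hypothetical OSD id of 12):

[ceph: root@ /]# ceph osd out 12
marked out osd.12.

Backfill/recovery should begin within a few seconds of the OSD being
marked out.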

On Mon, Dec 5, 2022 at 2:20 PM Sean Matheny <sean.matheny@xxxxxxxxxxx>
wrote:

> Hi all,
>
> New Quincy cluster here that I'm just running some benchmarks against:
>
> ceph version 17.2.3 (dff484dfc9e19a9819f375586300b3b79d80034d) quincy
> (stable)
> 11 nodes of 24x 18TB HDD OSDs, 2x 2.9TB SSD OSDs
>
> I'm seeing a delay of almost exactly 10 minutes when I remove an OSD/node
> from the cluster until actual recovery IO begins. This is much different
> behaviour than what I was used to in Nautilus, where recovery IO
> would commence within seconds. Downed OSDs are reflected in ceph health
> within a few seconds (as expected), and affected PGs show as undersized a
> few seconds later (as expected). I guess this 10-minute delay may even be a
> feature-- it would prevent unnecessary rebalancing if a node were accidentally
> rebooted before setting recovery flags, for example. Just thought it was worth
> asking in case
> it's a bug or something to look deeper into.
>
> I've read through the OSD config and all of my recovery tuneables look ok,
> for example:
> https://docs.ceph.com/en/latest/rados/configuration/osd-config-ref/
>
> [ceph: root@ /]# ceph config get osd osd_recovery_delay_start
> 0.000000
> [ceph: root@ /]# ceph config get osd osd_recovery_sleep
> 0.000000
> [ceph: root@ /]# ceph config get osd osd_recovery_sleep_hdd
> 0.100000
> [ceph: root@ /]# ceph config get osd osd_recovery_sleep_ssd
> 0.000000
> [ceph: root@ /]# ceph config get osd osd_recovery_sleep_hybrid
> 0.025000
>
> Thanks in advance.
>
> Ngā mihi,
>
> Sean Matheny
> HPC Cloud Platform DevOps Lead
> New Zealand eScience Infrastructure (NeSI)
>
> e: sean.matheny@xxxxxxxxxxx
>


-- 
Tyler Brekke
Senior Engineer I
tbrekke@xxxxxxxxxxxxxxxx
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



