Re: The last 15 'degraded' items take as many hours as the first 15K?

"Anthony D'Atri" <anthony.datri@xxxxxxxxx> · Wed, 11 May 2022 15:07:50 -0700

Small objects recover faster than large ones.

More importantly, early in the process many OSDs / PGs are recovering in parallel. Toward the end there's a long tail where parallelism is limited by osd_max_backfills: if, say, the remaining PGs to recover all live on a single OSD, they will execute serially.
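To illustrate the long-tail effect, here is a toy Python model. It is not Ceph's actual scheduler. It assumes each PG's recovery holds one slot on every OSD in a hypothetical acting set for a fixed duration, and that each OSD allows at most `max_backfills` concurrent recoveries, loosely mirroring osd_max_backfills. `PG`, `simulate_recovery`, the OSD ids, and the durations are all made up for illustration; real recovery is also shaped by other settings such as osd_recovery_max_active, recovery priorities, and the op scheduler.

```python
import heapq
from dataclasses import dataclass


@dataclass
class PG:
    name: str
    osds: tuple      # hypothetical OSD ids; each must give this PG one slot
    duration: float  # hypothetical time to recover this PG


def simulate_recovery(pgs, max_backfills=1):
    """Toy discrete-event model (assumption, not Ceph code).

    Returns (total_time, peak_parallel_recoveries).
    """
    in_use = {}   # OSD id -> slots currently busy
    pending = list(pgs)
    running = []  # min-heap of (finish_time, seq, pg)
    now, seq, peak = 0.0, 0, 0

    while pending or running:
        # Start every waiting PG whose OSDs all have a free slot.
        waiting = []
        for pg in pending:
            if all(in_use.get(o, 0) < max_backfills for o in pg.osds):
                for o in pg.osds:
                    in_use[o] = in_use.get(o, 0) + 1
                heapq.heappush(running, (now + pg.duration, seq, pg))
                seq += 1
            else:
                waiting.append(pg)
        pending = waiting
        peak = max(peak, len(running))

        # Jump to the next completion and release that PG's slots.
        now, _, done = heapq.heappop(running)
        for o in done.osds:
            in_use[o] -= 1
    return now, peak
```

Eight equal PGs spread over distinct OSDs finish together, while the same eight PGs all sharing OSD 0 run one after another:

```python
spread = [PG(f"1.{i}", (2 * i, 2 * i + 1), 1.0) for i in range(8)]
tail = [PG(f"2.{i}", (0, i + 1), 1.0) for i in range(8)]
print(simulate_recovery(spread))  # (1.0, 8)
print(simulate_recovery(tail))    # (8.0, 1)
```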

> 
> Might someone explain why the count of degraded items can drop by thousands, sometimes tens of thousands, in the same number of hours it takes to go from 10 to 0? For example, when an OSD, or a host with a few OSDs, goes offline for a while and reboots.
> 
> The cluster has been sitting at exactly one degraded object out of millions for longer than it took to write this post.
> 
> Seems the fewer the number of degraded objects, the less interested the cluster is in fixing it!
> 
> HC
> 
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
