Re: The last 15 'degraded' items take as many hours as the first 15K?


 



On 5/12/22 02:05, Janne Johansson wrote:
On Thu, 12 May 2022 at 00:03, Harry G. Coin <hgcoin@xxxxxxxxx> wrote:
Might someone explain why the count of degraded items can drop by
thousands, sometimes tens of thousands, in the same number of hours it
takes to go from 10 to 0?  For example, when an OSD or a host with a few
OSDs goes offline for a while and then reboots.

Sitting at one complete and entire degraded object out of millions for
longer than it took to write this post.

It seems the fewer degraded objects remain, the less interested the
cluster is in fixing them!
If (which is likely) different PGs take different amounts of time and
I/O to recover, based on their size, the amount of metadata attached to
them and so on, then the PGs you see early on as part of the "35 PGs
are backfilling" set contain both slow and fast ones, with the fast
ones being replaced by others as they finish. When all the easy work is
done, only the slow ones remain, making it look as if the cluster waited
until the end and then "didn't want to work as hard on those as on the
first ones", when in fact the sum of work was always going to take a
long time. (We had SMR drives on gig-eth boxes; when one of those
crashed it took .. aaaages to fix.) The easy parts pass by very fast
thanks to the parallelism in the repairs, leaving you staring at the
hard parts, but they were never equal to begin with.
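
One way to see which PGs are the stragglers is to list only the PGs
still being worked on and then query one of them directly (assuming a
reasonably recent release; the exact states and columns vary, and
<pgid> is just a placeholder for whatever shows up in the listing):

    $ ceph pg ls recovering       # PGs currently recovering, with object/degraded counts
    $ ceph pg ls backfilling      # PGs currently backfilling
    $ ceph pg <pgid> query        # per-PG detail, including recovery_state and its enter_time

The object counts in that listing usually make it clear that the
holdouts are the large or metadata-heavy PGs, not the cluster losing
interest.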

Thanks Janne and all for the insights!  The reason I half-jokingly suggested the cluster 'lost interest' in those last few fixes is that the recovery statistics included in ceph -s reported near-zero activity for so long.  After a long while those last few did get fixed, but if the cluster was moving metadata around to handle the 'holdout repairs', that traffic never appeared in the stats.  Those last few objects/PGs seemingly got repaired 'by magic that didn't include moving data counted in the ceph -s stats'.
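
For what it's worth, a couple of places worth watching besides the summary rate line in ceph -s, in case those last few repairs register there at all (command names as in recent releases; output details vary):

    $ ceph osd pool stats     # per-pool client and recovery I/O rates
    $ ceph -w                 # streaming cluster log, including PG state changes

Recovery of a handful of small objects, or repairs that are mostly metadata, can easily round down to zero in the cluster-wide rates, so the per-pool view and the PG state transitions in the cluster log are sometimes the only visible sign that anything is still happening.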




_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



