Re: The last 15 'degraded' items take as many hours as the first 15K?


 



On 5/12/22 18:02, Harry G. Coin wrote:


Thanks Janne and all for the insights! The reason I half-jokingly suggested the cluster 'lost interest' in those last few fixes is that the recovery statistics in ceph -s reported near-zero activity for so long. After a long while those last few were fixed, but if the cluster was moving metadata around to fix the holdout repairs, that traffic wasn't in the stats. Those last few objects/PGs seemingly got repaired by magic that didn't involve moving any data counted in the ceph -s stats.

It's probably OMAP data (lots of key-value pairs) that takes a long time to replicate. We have PGs with over 4 million objects that hold only OMAP data, and those can take up to 45 minutes to recover while generating very little network throughput (those are NVMe OSDs). You can check this with "watch -n 3 ceph pg ls remapped" and see how long each backfill takes, and whether the PG shows a lot of OMAP_BYTES and OMAP_KEYS but no "BYTES".
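
If you would rather not eyeball the watch output, the same check can be roughly scripted. This is only a sketch: it assumes that "ceph pg ls remapped --format json" returns either a bare list of PG entries or an object with a "pg_stats" list, and that each entry carries a "stat_sum" with "num_bytes", "num_omap_keys" and "num_omap_bytes". Those field names are my assumption about the JSON layout and may differ between releases, so compare them against your own output first. The function names are made up for illustration.

```python
import json
import subprocess


def omap_only_pgs(pg_dump, min_omap_keys=1):
    """Return (pgid, omap_keys, omap_bytes) for PGs with OMAP data but zero object bytes."""
    # Assumption: the JSON is either {"pg_stats": [...]} or a bare list of PG entries.
    stats = pg_dump.get("pg_stats", []) if isinstance(pg_dump, dict) else pg_dump
    hits = []
    for pg in stats:
        s = pg.get("stat_sum", {})  # assumed field names, verify on your cluster
        keys = s.get("num_omap_keys", 0)
        if s.get("num_bytes", 0) == 0 and keys >= min_omap_keys:
            hits.append((pg["pgid"], keys, s.get("num_omap_bytes", 0)))
    # Most keys first: these are the likeliest slow, low-throughput backfills.
    return sorted(hits, key=lambda h: h[1], reverse=True)


def fetch_remapped_pgs():
    """Run the ceph CLI and parse its JSON output (needs a working client keyring)."""
    out = subprocess.run(
        ["ceph", "pg", "ls", "remapped", "--format", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)
```

Calling omap_only_pgs(fetch_remapped_pgs()) on an admin node should then list the remapped PGs that consist purely of OMAP data, which are the ones that would recover slowly without showing up as data traffic.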

Gr. Stefan



