Your answer helped me understand the mechanism better. I'm using erasure
coding, and the backfilling step took quite a long time :( If it had just
been a lot of PG peering, I think that would be reasonable, but I was
curious why there was so much backfill_wait instead of peering, e.g.:

    pg 9.5a is stuck undersized for 39h, current state
    active+undersized+degraded+remapped+backfill_wait

Please let me know if you have any tips for increasing backfill
performance or preventing unnecessary backfill. Thank you for your answer.

Joshua Baergen wrote:
> Hi Jaemin,
>
> It is normal for PGs to become degraded during a host reboot, since a
> copy of the data was taken offline and needs to be resynchronized
> after the host comes back. Normally this is quick, as the recovery
> mechanism only needs to modify those objects that have changed while
> the host is down.
>
> However, if you have backfills ongoing and reboot a host that contains
> OSDs involved in those backfills, then those backfills become
> degraded, and you will need to wait for them to complete for
> degradation to clear. Do you know if you had backfills at the time the
> host was rebooted? If so, the way to avoid this is to wait for
> backfill to complete before taking any OSDs/hosts down for
> maintenance.
>
> Josh
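P.S. For the archives, here is my current understanding of the knobs that
throttle backfill, as a sketch rather than a tested recipe. It assumes a
release where "ceph config set" is available, and the numeric values are
only examples to start from:

    # See how many PGs are waiting vs. actively backfilling
    ceph pg dump pgs_brief | grep -c backfill_wait
    ceph pg dump pgs_brief | grep -c backfilling

    # Allow more concurrent backfills and recovery ops per OSD
    ceph config set osd osd_max_backfills 2
    ceph config set osd osd_recovery_max_active 4

    # On HDD clusters, shrink the sleep between recovery ops
    ceph config set osd osd_recovery_sleep_hdd 0.05

    # On Quincy or later the mClock scheduler governs recovery, and
    # switching its profile is the supported way to favor recovery
    ceph config set osd osd_mclock_profile high_recovery_ops

As I understand it, backfill_wait is just the osd_max_backfills
reservation at work: each OSD grants only a few backfill slots, so the
remapped PGs queue behind one another rather than peering again.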
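P.P.S. And the pre-maintenance sequence I take away from your advice,
again as an unverified sketch: confirm backfill has finished, then keep
the reboot itself from triggering new data movement.

    # Everything should be active+clean before taking the host down
    ceph pg stat

    # Keep the down OSDs from being marked out while the host reboots,
    # so CRUSH does not remap their PGs and start fresh backfills
    ceph osd set noout

    # ... reboot the host, wait for its OSDs to rejoin ...

    ceph osd unset noout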