On 4/06/2013 9:16 AM, Chen, Xiaoxi wrote:
> my 0.02, you really don't need to wait for health_ok between your
> recovery steps, just go ahead. Every time a new map is generated and
> broadcast, the old map and in-progress recovery will be canceled

thanks Xiaoxi, that is helpful to know.

It seems to me that there might be a failure mode (or race condition?) here though, as the cluster is now struggling to recover: the replacement OSD pushed it into backfill_toofull. The failure sequence might be:

1. From HEALTH_OK, crash an OSD
2. Wait for recovery
3. Remove the OSD using the usual procedure
4. Wait for recovery
5. Add the OSD back using the usual procedure
6. Wait for recovery
7. Cluster is unable to recover due to toofull conditions

Perhaps this is a needed test case: round-trip a cluster through a known failure/recovery scenario and check that it ends up healthy. Note this is a simply configured test cluster with CephFS in the mix and about 2.5 million files.

Something else I noticed: I restarted the cluster (and set the leveldb compact option, since I'd run out of space on the root filesystems), and now I see it is again making progress on the backfill. It seems odd that the cluster pauses but a restart clears the pause; is that by design?
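For what it's worth, here is a rough sketch of what that round-trip test might look like (bash; osd.3 is just a placeholder id, and the OSD re-creation step is elided since it depends on how the OSD was deployed):

    #!/bin/bash
    # Round-trip a healthy cluster through OSD failure, removal and re-addition,
    # then check whether it settles back to HEALTH_OK instead of backfill_toofull.

    wait_clean() {
        # poll until the cluster reports HEALTH_OK again
        until ceph health | grep -q HEALTH_OK; do
            sleep 30
        done
    }

    ceph health                      # 1. start from HEALTH_OK
    sudo killall -9 ceph-osd         #    "crash" the OSD on this host
    wait_clean                       # 2. wait for recovery

    ceph osd out 3                   # 3. remove osd.3 using the usual procedure
    ceph osd crush remove osd.3
    ceph auth del osd.3
    ceph osd rm 3
    wait_clean                       # 4. wait for recovery

    # 5. add the OSD back using the usual procedure (prepare/activate steps elided)
    wait_clean                       # 6. wait for recovery

    # 7. the cluster should be clean again, not stuck toofull
    ceph health detail | grep -i toofull && echo "round-trip failed: toofull"

The wait_clean polling between steps is deliberate here, since waiting for health_ok between each step is exactly the sequence that got this cluster wedged.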