On 4/06/2013 9:16 AM, Chen, Xiaoxi wrote:
> my 0.02, you really don't need to wait for health_ok between your
> recovery steps, just go ahead. Every time a new map is generated and
> broadcast, the old map and in-progress recovery will be canceled

thanks Xiaoxi, that is helpful to know.

It seems to me that there might be a failure mode (or race condition?) here though, as the cluster is now struggling to recover: the replacement OSD pushed it into backfill_toofull. The failure sequence might be:

1. From HEALTH_OK, crash an OSD
2. Wait for recovery
3. Remove the OSD using the usual procedure
4. Wait for recovery
5. Add the OSD back using the usual procedure
6. Wait for recovery
7. Cluster is unable to recover due to toofull conditions

Perhaps this is a needed test case: round-trip a cluster through a known failure/recovery scenario and check that it ends up healthy. Note this is a simply configured test cluster with CephFS in the mix and about 2.5 million files.

Something else I noticed: I restarted the cluster (and set the leveldb compact option, since I'd run out of space on the root filesystems), and now I see it is again making progress on the backfill. It seems odd that the cluster pauses but a restart clears the pause; is that by design?
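For what it's worth, here is a rough sketch of what that round-trip test might look like (bash; osd.3 is just a placeholder id, and the OSD re-creation step is elided since it depends on how the OSD was deployed):

    #!/bin/bash
    # Round-trip a healthy cluster through OSD failure, removal and re-addition,
    # then check whether it settles back to HEALTH_OK instead of backfill_toofull.

    wait_clean() {
        # poll until the cluster reports HEALTH_OK again
        until ceph health | grep -q HEALTH_OK; do
            sleep 30
        done
    }

    ceph health                      # 1. start from HEALTH_OK
    sudo killall -9 ceph-osd         #    "crash" the OSD on this host
    wait_clean                       # 2. wait for recovery

    ceph osd out 3                   # 3. remove osd.3 using the usual procedure
    ceph osd crush remove osd.3
    ceph auth del osd.3
    ceph osd rm 3
    wait_clean                       # 4. wait for recovery

    # 5. add the OSD back using the usual procedure (prepare/activate steps elided)
    wait_clean                       # 6. wait for recovery

    # 7. the cluster should be clean again, not stuck toofull
    ceph health detail | grep -i toofull && echo "round-trip failed: toofull"

The wait_clean polling between steps is deliberate here, since waiting for health_ok between each step is exactly the sequence that got this cluster wedged.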