On Tue, 4 Jun 2013, Nigel Williams wrote:
> On 4/06/2013 9:16 AM, Chen, Xiaoxi wrote:
> > My $0.02: you really don't need to wait for health_ok between your
> > recovery steps, just go ahead. Every time a new map is generated and
> > broadcast, the old map and the in-progress recovery are canceled.
>
> Thanks Xiaoxi, that is helpful to know.
>
> It seems to me that there might be a failure mode (or race condition?)
> here though, as the cluster is now struggling to recover: the
> replacement OSD caused the cluster to go into backfill_toofull.
>
> The failure sequence might be:
>
> 1. From HEALTH_OK, crash an OSD
> 2. Wait for recovery
> 3. Remove the OSD using the usual procedure
> 4. Wait for recovery
> 5. Add the OSD back using the usual procedure
> 6. Wait for recovery
> 7. Cluster is unable to recover due to toofull conditions
>
> Perhaps a test case is needed that round-trips a cluster through a
> known failure/recovery scenario like this.
>
> Note this is a simplistically configured test cluster with CephFS in
> the mix and about 2.5 million files.
>
> Something else I noticed: I restarted the cluster (and set the leveldb
> compact option, since I'd run out of space on the root filesystems) and
> now I see it is again making progress on the backfill. It seems odd
> that the cluster pauses but a restart clears the pause; is that by
> design?

Does the monitor data directory share a disk with an OSD? If so, that
makes sense: compaction freed enough space to drop below the threshold...

sage
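
For reference, the remove/re-add round trip in steps 3-5 above looks
roughly like the following on a cluster of that era. This is only a
sketch, not taken from the thread: it assumes the failed OSD is osd.3
on a host called node1 and that the manual removal/addition procedure
from the docs is being followed; exact commands vary by version and
distro.

    # step 1-2: mark the failed OSD out and let recovery finish
    ceph osd out osd.3

    # step 3: remove it from the cluster
    /etc/init.d/ceph stop osd.3
    ceph osd crush remove osd.3
    ceph auth del osd.3
    ceph osd rm 3

    # step 5: add it back -- allocate an id, initialise the data dir,
    # register its key, put it in the CRUSH map, start the daemon
    ceph osd create
    ceph-osd -i 3 --mkfs --mkkey
    ceph auth add osd.3 osd 'allow *' mon 'allow rwx' \
        -i /var/lib/ceph/osd/ceph-3/keyring
    ceph osd crush add osd.3 1.0 host=node1
    /etc/init.d/ceph start osd.3

It is the backfill triggered by that last step that pushed the other
OSDs over the threshold in Nigel's test.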
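
A rough way to confirm that the pause really is the full-ratio
threshold, again as a sketch rather than anything from the thread: the
option names are the ones from that era, the 0.90 value is only an
example, and the wildcard injectargs form may need adjusting on older
releases.

    # see which PGs are stuck and how full each OSD is
    ceph health detail
    ceph pg dump | grep backfill_toofull

    # the thresholds involved (defaults of that era):
    #   mon osd nearfull ratio   = 0.85
    #   mon osd full ratio       = 0.95
    #   osd backfill full ratio  = 0.85   <- backfill_toofull trips on this

    # nudging the backfill ratio up on a test cluster can let a stuck
    # backfill proceed, at the cost of free-space headroom
    ceph tell osd.* injectargs '--osd-backfill-full-ratio 0.90'

    # the leveldb compaction Nigel enabled is presumably this ceph.conf
    # option, which matters when the mon store shares a disk with an OSD:
    # [mon]
    #     mon compact on start = true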