Re: please help explain about failover

ceph@xxxxxxxxxxxxxx · Mon, 15 Aug 2016 11:33:21 +0200



Look at
http://docs.ceph.com/docs/master/rados/configuration/mon-osd-interaction/,
there are a couple of settings about "should I consider that OSD down?"

Once an OSD has been down long enough to be marked out (see
"mon osd down out interval" on that page), the cluster starts rebalancing
to heal itself (basically, missing objects are copied to healthy OSDs)
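
To make the timing concrete for an outage like the 30 minutes in the question, here is a minimal sketch in Python. It is not Ceph code: the function name is made up for illustration, and the 600 second and 3600 second intervals are only assumed values for "mon osd down out interval", so check what your own cluster is configured with.

```python
def marked_out_during_outage(downtime_s: float, down_out_interval_s: float) -> bool:
    """Return True if an OSD stays down long enough to be marked out.

    Simplified model: an OSD that is down for at least the down/out
    interval gets marked out, and only then is its data re-replicated
    elsewhere.
    """
    return downtime_s >= down_out_interval_s


# Hypothetical numbers: a 30 minute outage, compared against two
# assumed values of the down/out interval (in seconds).
outage_s = 30 * 60
print(marked_out_during_outage(outage_s, 600))   # True: rebalancing would start
print(marked_out_during_outage(outage_s, 3600))  # False: OSDs come back first
```

In other words, whether a 30 minute node outage triggers a full re-replication depends on how that interval is set.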

Then, maybe, the broken OSD will come back to life.
Here again, the cluster will rebalance: it will recreate the missing objects
on that OSD.
It will also find some "extra" objects, which will be deleted.

In the end, you will always have a healthy cluster, unless:
- you're running out of space (a near-full cluster that cannot absorb an
OSD failure)
- too many OSDs died at the same time, making self-healing ineffective:
you will have "some" objects missing (if all copies were on the missing
OSDs, there is no way to recreate them)
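
For the "running out of space" case, a rough back-of-the-envelope check can be sketched in Python. This is only an estimate under loud assumptions: all OSDs have the same size, data is spread evenly, and all data from the failed OSDs is re-replicated onto the survivors. The function name, the 80% starting utilization and the 0.85 near-full threshold are illustrative assumptions, not values read from a real cluster.

```python
def utilization_after_failure(total_osds: int, failed_osds: int,
                              utilization: float) -> float:
    """Estimate per-OSD utilization once the failed OSDs' data is re-replicated.

    Assumes equally sized OSDs and an even data distribution, so the same
    amount of data ends up spread over fewer OSDs.
    """
    surviving = total_osds - failed_osds
    if surviving <= 0:
        raise ValueError("no surviving OSDs left to hold the data")
    return utilization * total_osds / surviving


# Cluster from the question: 9 nodes with 12 OSDs each, one node lost.
nodes, osds_per_node = 9, 12
total = nodes * osds_per_node                      # 108 OSDs
after = utilization_after_failure(total, osds_per_node, 0.80)

NEARFULL = 0.85  # assumed near-full threshold, check your own settings
print(f"{after:.2f}")        # 0.90
print(after > NEARFULL)      # True: this cluster could not absorb the loss
```

With these assumed numbers, a cluster that is 80% full before losing one node would land at about 90% afterwards, which is the situation described in the first bullet.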


On 15/08/2016 11:18, kpeng@xxxxxxxxxx wrote:
> hello,
> 
> sorry I am new to ceph.
> I have a question: we have a cluster of 9 nodes, each with 12 hard
> disks, one OSD per disk. If one node goes down for, say, 30 minutes,
> will all the replicas it holds be re-replicated to other OSDs during
> that period? And when the node starts up again, how does Ceph handle
> those replicas?
> 
> 
> thanks.
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
