Re: Crushmap Design Question

On 01/09/2013 08:59 AM, Wido den Hollander wrote:
> Hi,
> 
> On 01/09/2013 01:53 AM, Chen, Xiaoxi wrote:
>> Hi,
>> 	Setting the rep size to 3 only makes the data triple-replicated; that means that when you "fail" all OSDs in 2 out of 3 DCs, the data is still accessible.
>> 	But the Monitor is another story: a monitor cluster of 2N+1 nodes requires at least N+1 nodes alive, and indeed this is why your Ceph failed.
>> 	It looks to me like this constraint makes it hard to design a deployment that is robust against a DC outage. But I am hoping for input from the community on how to make the Monitor cluster reliable.
>>
> 
>  From what I understand, he didn't kill the second mon, still leaving 2
> out of 3 mons running.
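To make the 2N+1 point above concrete: a monitor cluster of 2N+1 members
needs a majority, i.e. at least N+1 of them, to form a quorum. A minimal
sketch of that arithmetic (plain Python for illustration, not Ceph code):

    def mons_needed_for_quorum(total_mons):
        """Majority of the monitor cluster: for 2N+1 monitors this is N+1."""
        return total_mons // 2 + 1

    def has_quorum(total_mons, alive_mons):
        """True when enough monitors are still up to hold an election."""
        return alive_mons >= mons_needed_for_quorum(total_mons)

    # Shawn's setup: 3 monitors, one per datacenter.
    assert mons_needed_for_quorum(3) == 2
    assert has_quorum(3, alive_mons=2)      # one mon down: quorum holds
    assert not has_quorum(3, alive_mons=1)  # two mons down: no quorum, cluster stalls

So with 3 monitors you can lose exactly one and keep a working cluster,
which matches Wido's observation above and Joao's point about quorum
further down.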

Indeed. A good hint that this is the case is this bit of Shawn's message:

>> When I fail a datacenter (including 1 of 3 mons) I eventually get:
>> 2013-01-08 13:58:54.020477 mon.0 [INF] pgmap v2712139: 7104 pgs: 7104 active+degraded; 60264 MB data, 137 GB used, 13570 GB / 14146 GB avail; 16362/49086 degraded (33.333%)
>>
>> At this point everything is still ok.  But when I fail the 2nd datacenter (still leaving 2 out of 3 mons running) I get:
>> 2013-01-08 14:01:25.600056 mon.0 [INF] pgmap v2712189: 7104 pgs: 7104 incomplete; 60264 MB data, 137 GB used, 13570 GB / 14146 GB avail
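As an aside, the degraded count in the first pgmap line above (16362/49086,
i.e. 33.333%) is exactly what you would expect with a replication size of 3
and one of three datacenters down, assuming the CRUSH rule places one
replica per datacenter: every object loses one of its three copies. A quick
back-of-the-envelope check (plain Python; the counts come straight from the
quoted log):

    # 49086 replica instances in total = 16362 objects * 3 copies (size = 3).
    total_copies = 49086
    objects = total_copies // 3        # 16362 objects
    copies_lost = objects * 1          # one copy per object lives in the failed DC

    degraded_fraction = copies_lost / total_copies
    print(f"{degraded_fraction:.3%}")  # -> 33.333%, matching the pgmap output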

If you still manage to get these pgmap messages, it means your monitors are
still handling and answering requests, and that only happens when you
have a quorum :)

  -Joao