Re: One host failure brings down the whole cluster

On 3/31/15 11:27, Kai KH Huang wrote:
1) But Ceph says "...You can run a cluster with 1 monitor..." (http://ceph.com/docs/master/rados/operations/add-or-rm-mons/), so I assume it should work. And split brain is not my current concern.
The point is that you must have a majority of monitors up:
* With one monitor, you need that one monitor running.
* With two monitors, you need both running, because if one goes down you no longer have a majority.
* With three monitors, you need at least two up, because if one goes down you still have a majority.
* With four, at least three.
* With five, at least three.
* etc.
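
If you want to check this from the cluster side, the standard quorum status command lists the current members and the elected leader (note it only answers while a quorum exists; without one it hangs like any other cluster command):

    ceph quorum_status --format json-pretty    # look at "quorum_names" and the leader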



2) I've written objects to Ceph; now I just want to get them back.

Anyway, I tried to reduce the number of monitors to 1, but after removing one with the steps below, the cluster just cannot start up any more:

1. [root~] service ceph -a stop mon.serverB
2. [root~] ceph mon remove serverB    # hangs here forever
3. # Remove the monitor entry from ceph.conf
4. Restart the ceph service
It's a grey area for me, but I think the monitor removal failed because you didn't have a quorum for the operation to succeed. I think you'll need to modify the monmap manually and remove the second monitor from it, something like the sketch below.
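
A rough outline, following the "removing monitors from an unhealthy cluster" procedure in the Ceph docs (the mon IDs and the /tmp path here are examples matching your setup, so double-check them before running):

    # Stop the surviving monitor first; monmap surgery needs it down
    service ceph stop mon.serverA
    # Extract the current monmap from the surviving monitor
    ceph-mon -i serverA --extract-monmap /tmp/monmap
    # Remove the dead monitor from the map
    monmaptool /tmp/monmap --rm serverB
    # Inject the edited map back and restart; serverA is now a quorum of one
    ceph-mon -i serverA --inject-monmap /tmp/monmap
    service ceph start mon.serverA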


[root@serverA~]# systemctl status ceph.service -l
ceph.service - LSB: Start Ceph distributed file system daemons at boot time
    Loaded: loaded (/etc/rc.d/init.d/ceph)
    Active: failed (Result: timeout) since Tue 2015-03-31 15:46:25 CST; 3min 15s ago
   Process: 2937 ExecStop=/etc/rc.d/init.d/ceph stop (code=exited, status=0/SUCCESS)
   Process: 3670 ExecStart=/etc/rc.d/init.d/ceph start (code=killed, signal=TERM)

Mar 31 15:44:26 serverA ceph[3670]: === osd.6 ===
Mar 31 15:44:56 serverA ceph[3670]: failed: 'timeout 30 /usr/bin/ceph -c /etc/ceph/ceph.conf --name=osd.6 --keyring=/var/lib/ceph/osd/ceph-6/keyring osd crush create-or-move -- 6 3.64 host=serverA root=default'
Mar 31 15:44:56 serverA ceph[3670]: === osd.7 ===
Mar 31 15:45:26 serverA ceph[3670]: failed: 'timeout 30 /usr/bin/ceph -c /etc/ceph/ceph.conf --name=osd.7 --keyring=/var/lib/ceph/osd/ceph-7/keyring osd crush create-or-move -- 7 3.64 host=serverA root=default'
Mar 31 15:45:26 serverA ceph[3670]: === osd.8 ===
Mar 31 15:45:57 serverA ceph[3670]: failed: 'timeout 30 /usr/bin/ceph -c /etc/ceph/ceph.conf --name=osd.8 --keyring=/var/lib/ceph/osd/ceph-8/keyring osd crush create-or-move -- 8 3.64 host=serverA root=default'
Mar 31 15:45:57 serverA ceph[3670]: === osd.9 ===
Mar 31 15:46:25 serverA systemd[1]: ceph.service operation timed out. Terminating.
Mar 31 15:46:25 serverA systemd[1]: Failed to start LSB: Start Ceph distributed file system daemons at boot time.
Mar 31 15:46:25 serverA systemd[1]: Unit ceph.service entered failed state.

/var/log/ceph/ceph.log says:
2015-03-31 15:55:57.648800 mon.0 10.???.78:6789/0 1048 : cluster [INF] osd.21 10.???.78:6855/25598 failed (39 reports from 9 peers after 20.118062 >= grace 20.000000)
2015-03-31 15:55:57.931889 mon.0 10.???.78:6789/0 1055 : cluster [INF] osd.15 10.????.78:6825/23894 failed (39 reports from 9 peers after 20.401379 >= grace 20.000000)

Obviously serverB is down, but it should not stop serverA from functioning, right?
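
A quick way to confirm what serverA's monitor is actually doing while the regular commands hang: the local admin socket still answers even without a quorum (the socket path below is the default; adjust it if yours differs):

    ceph daemon mon.serverA mon_status    # a "state" of "probing" or "electing" means no quorum
    # same query, addressing the socket directly:
    ceph --admin-daemon /var/run/ceph/ceph-mon.serverA.asok mon_status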
________________________________________
From: Gregory Farnum [greg@xxxxxxxxxxx]
Sent: Tuesday, March 31, 2015 11:53 AM
To: Lindsay Mathieson; Kai KH Huang
Cc: ceph-users@xxxxxxxxxxxxxx
Subject: Re: One host failure brings down the whole cluster

On Mon, Mar 30, 2015 at 8:02 PM, Lindsay Mathieson
<lindsay.mathieson@xxxxxxxxx> wrote:
On Tue, 31 Mar 2015 02:42:27 AM Kai KH Huang wrote:
Hi, all
     I have a two-node Ceph cluster, and both nodes run a monitor and OSDs. When
they're both up, the OSDs are all up and in, and everything is fine... almost:


Two things.

1 -  You *really* need a min of three monitors. Ceph cannot form a quorum with
just two monitors and you run a risk of split brain.
You can form quorums with an even number of monitors, and Ceph does so
— there's no risk of split brain.

The problem with 2 monitors is that a quorum is always 2 — which is
exactly what you're seeing right now. You can't run with only one
monitor up (assuming you have a non-zero number of them).

2 - You also probably have a min size of two set (the default). This means
that you need a minimum of two copies of each data object for writes to work.
So with just two nodes, if one goes down you can't write to the other.
Also this.
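
For example, to check and change this per pool (assuming a pool named "rbd"; substitute your own pool names, which "ceph osd lspools" will list):

    ceph osd pool get rbd size        # number of replicas the pool keeps
    ceph osd pool get rbd min_size    # replicas required before I/O is allowed
    ceph osd pool set rbd min_size 1  # permit I/O with a single surviving replica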


So:
- Install an extra monitor node - it doesn't have to be powerful; we just use an
Intel Celeron NUC for that.

- Reduce your minimum size to 1 (one).
Yep.
-Greg
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com





