Hi,

all tests were made with kill -9, killing the active mds (and sometimes
other processes). I waited a couple of minutes between each test to make
sure that the cluster reached a stable state.
(btw: how can I check this programmatically? see the sketch below the log script.)

#  KILLED                   result
1. mds @ beta               OK
2. mds @ alpha              OK
3. mds+osd @ beta           FAILED
      switch ok {0=alpha=up:active}, but FS not readable
      FS permanently frozen
      rebooted the whole cluster
4. mds+mon @ alpha          OK (32 sec)
      rebooted the whole cluster
5. mds+osd @ beta           OK (25 sec)
      rebooted the whole cluster
6. mds+osd @ beta           OK (24 sec)
7. mds+osd @ alpha          OK (30 sec)
8. mds+mon+osd @ beta       OK (27 sec)
9. power unplug @ alpha     FAILED
      stuck in {0=beta=up:replay} for a long time
      finally it switched to {0=alpha=up:active}, but FS not readable
      FS permanently frozen, even when bringing up alpha...

I uploaded the test logs here:
http://www.4shared.com/file/5nXMw_sM/cephlogs_mds_test.html?

If you need any other configuration options changed, let me know.

The logs were created with:

# LOGDIR is set to the output directory beforehand
mkdir -p $LOGDIR

# collect the daemon logs and the cluster status in the background
tail -f /var/log/ceph/mds.*.log >$LOGDIR/mds.log &
p1=$!
tail -f /var/log/ceph/mon.*.log >$LOGDIR/mon.log &
p2=$!
tail -f /var/log/ceph/osd.*.log >$LOGDIR/osd.log &
p3=$!
ceph -w >$LOGDIR/ceph.log &
p4=$!

# wait for Enter, then stop the collectors
read line
kill $p1 $p2 $p3 $p4

# anonymize ip addresses
for f in $LOGDIR/*.log; do
    sed -r -i 's/[0-9]+\.[0-9]+\.[0-9]+\.([0-9]+)/xxx.xxx.xxx.\1/g' $f
done
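
One way to check for a stable state from a script, a sketch only: it assumes
`ceph health` is available in this release and that it reports HEALTH_OK once
recovery has settled, which I have not verified against 0.39:

# wait until the cluster reports itself healthy before starting the next test
until ceph health 2>/dev/null | grep -q HEALTH_OK; do
    sleep 5
done
echo "cluster looks stable"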

--
Karoly Horvath
rhswdev@xxxxxxxxx


On Mon, Dec 19, 2011 at 10:50 PM, Gregory Farnum
<gregory.farnum@xxxxxxxxxxxxx> wrote:
> On Sun, Dec 18, 2011 at 11:26 AM, Karoly Horvath <rhswdev@xxxxxxxxx> wrote:
>> Hi Guys,
>>
>> two questions:
>>
>> first one is short:
>> The documentation states for all the daemons that they have to be an
>> odd number to work correctly.
>> But what happens if one of the nodes is down? Then, by definition
>> there will be an even number of daemons.
>> Can the system tolerate this failure? If not, do I have to automate
>> the process of quickly bringing up a new node to achieve HA?
>>
>> the second one:
>> I have a simple configuration:
>>
>> ceph version 0.39 (commit:321ecdaba2ceeddb0789d8f4b7180a8ea5785d83)
>> xxx.xxx.xxx.31 alpha (mds, mon, osd)
>> xxx.xxx.xxx.33 beta  (mds, mon, osd)
>> xxx.xxx.xxx.35 gamma (mon, osd)
>> ceph FS is mounted with listing the two mds-es.
>> I set 'data' and 'metadata' to 2, then tested with 3.
>>
>> I've read the documentation and it suggests this should be enough to
>> achieve High Availability.
>> The data is replicated on all the osd-s (3), there is at least 1 mds
>> up all the time... yet:
>>
>> Each time I remove the power plug from the primary mds node's host,
>> the system goes down and I cannot do a simple `ls`.
>> I can replicate this problem and send you any logfiles or ceph -w
>> outputs you need. Let me know what you need.
>> Here is an example session: http://pastebin.com/R4MgdhUy
>>
>> I once saw the standby mds to wake up and then the FS worked but that
>> was after 20 minutes, which is way too long for a HA scenario.
>>
>> There is hardly any data on the FS at the moment (400MB, lol..), and
>> hardly any writes...
>>
>> I'm willing to sacrifice (a lot of) performance to achieve high availability.
>> Let me know if there are configuration settings to achieve this.
>
> TV's right. Specifically regarding the MDS:
> As TV said, the MDS should time out and get replaced within 30 seconds
> (this is controlled by the "mds beacon grace" setting). It is a
> failover procedure, not shared masters or something, but from my
> experience it takes on the order of 10 seconds to complete once
> failure is detected. 20 minutes is completely wrong. If we're not
> already talking elsewhere, then I'd like it if you could enable MDS
> logging and reproduce this and post the logs somewhere so I can check
> out what's going on.
> -Greg
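
The 24-32 second times in the table above are roughly in line with that:
the beacon grace to detect the failure, plus the ~10 seconds Greg mentions
for the takeover itself. A rough way to time the failover from a script, a
sketch only: it assumes `ceph mds stat` exists in this release and prints
the usual '{0=alpha=up:active}' style map (possibly with a "laggy or
crashed" note after detection), and that the mds daemon here is still
called cmds rather than ceph-mds:

#!/bin/sh
# rough failover timer (sketch; run on the host of the active mds)
before=$(ceph mds stat 2>/dev/null)
start=$(date +%s)

# same kill -9 as in the tests above
kill -9 $(pidof cmds)

# wait for the monitors to notice the failure (the mds map changes)...
until [ "$(ceph mds stat 2>/dev/null)" != "$before" ]; do sleep 1; done

# ...then wait until an mds is reported active again, ignoring an entry
# that is still shown as active but flagged laggy/crashed
until ceph mds stat 2>/dev/null | grep 'up:active' | grep -qv laggy; do
    sleep 1
done

echo "failover took $(( $(date +%s) - start )) seconds"

For the power-unplug case the kill line obviously does not apply; only the
two wait loops are relevant there.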