Hi,

all tests were made with kill -9, killing the active mds (and sometimes
other processes). I waited a couple of minutes between each test to make
sure that the cluster reached a stable state.
(btw: how can I check this programmatically? see the sketch below the log script.)

#  KILLED                   result
1. mds @ beta               OK
2. mds @ alpha              OK
3. mds+osd @ beta           FAILED
      switch ok {0=alpha=up:active}, but FS not readable
      FS permanently frozen
      rebooted the whole cluster
4. mds+mon @ alpha          OK (32 sec)
      rebooted the whole cluster
5. mds+osd @ beta           OK (25 sec)
      rebooted the whole cluster
6. mds+osd @ beta           OK (24 sec)
7. mds+osd @ alpha          OK (30 sec)
8. mds+mon+osd @ beta       OK (27 sec)
9. power unplug @ alpha     FAILED
      stuck in {0=beta=up:replay} for a long time
      finally it switched to {0=alpha=up:active}, but FS not readable
      FS permanently frozen, even when bringing up alpha...

I uploaded the test logs here:
http://www.4shared.com/file/5nXMw_sM/cephlogs_mds_test.html?

If you need any other configuration options changed, let me know.

The logs were created with:

# LOGDIR is set to the output directory beforehand
mkdir -p $LOGDIR

# collect the daemon logs and the cluster status in the background
tail -f /var/log/ceph/mds.*.log >$LOGDIR/mds.log &
p1=$!
tail -f /var/log/ceph/mon.*.log >$LOGDIR/mon.log &
p2=$!
tail -f /var/log/ceph/osd.*.log >$LOGDIR/osd.log &
p3=$!
ceph -w >$LOGDIR/ceph.log &
p4=$!

# wait for Enter, then stop the collectors
read line
kill $p1 $p2 $p3 $p4

# anonymize ip addresses
for f in $LOGDIR/*.log; do
    sed -r -i 's/[0-9]+\.[0-9]+\.[0-9]+\.([0-9]+)/xxx.xxx.xxx.\1/g' $f
done
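
One way to check for a stable state from a script, a sketch only: it assumes
`ceph health` is available in this release and that it reports HEALTH_OK once
recovery has settled, which I have not verified against 0.39:

# wait until the cluster reports itself healthy before starting the next test
until ceph health 2>/dev/null | grep -q HEALTH_OK; do
    sleep 5
done
echo "cluster looks stable"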

--
Karoly Horvath
rhswdev@xxxxxxxxx


On Mon, Dec 19, 2011 at 10:50 PM, Gregory Farnum
<gregory.farnum@xxxxxxxxxxxxx> wrote:
> On Sun, Dec 18, 2011 at 11:26 AM, Karoly Horvath <rhswdev@xxxxxxxxx> wrote:
>> Hi Guys,
>>
>> two questions:
>>
>> first one is short:
>> The documentation states for all the daemons that they have to be an
>> odd number to work correctly.
>> But what happens if one of the nodes is down? Then, by definition
>> there will be an even number of daemons.
>> Can the system tolerate this failure? If not, do I have to automate
>> the process of quickly bringing up a new node to achieve HA?
>>
>> the second one:
>> I have a simple configuration:
>>
>> ceph version 0.39 (commit:321ecdaba2ceeddb0789d8f4b7180a8ea5785d83)
>> xxx.xxx.xxx.31 alpha (mds, mon, osd)
>> xxx.xxx.xxx.33 beta  (mds, mon, osd)
>> xxx.xxx.xxx.35 gamma (mon, osd)
>> ceph FS is mounted with listing the two mds-es.
>> I set 'data' and 'metadata' to 2, then tested with 3.
>>
>> I've read the documentation and it suggests this should be enough to
>> achieve High Availability.
>> The data is replicated on all the osd-s (3), there is at least 1 mds
>> up all the time... yet:
>>
>> Each time I remove the power plug from the primary mds node's host,
>> the system goes down and I cannot do a simple `ls`.
>> I can replicate this problem and send you any logfiles or ceph -w
>> outputs you need. Let me know what you need.
>> Here is an example session: http://pastebin.com/R4MgdhUy
>>
>> I once saw the standby mds to wake up and then the FS worked but that
>> was after 20 minutes, which is way too long for a HA scenario.
>>
>> There is hardly any data on the FS at the moment (400MB, lol..), and
>> hardly any writes...
>>
>> I'm willing to sacrifice (a lot of) performance to achieve high availability.
>> Let me know if there are configuration settings to achieve this.
>
> TV's right. Specifically regarding the MDS:
> As TV said, the MDS should time out and get replaced within 30 seconds
> (this is controlled by the "mds beacon grace" setting). It is a
> failover procedure, not shared masters or something, but from my
> experience it takes on the order of 10 seconds to complete once
> failure is detected. 20 minutes is completely wrong. If we're not
> already talking elsewhere, then I'd like it if you could enable MDS
> logging and reproduce this and post the logs somewhere so I can check
> out what's going on.
> -Greg
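
The 24-32 second times in the table above are roughly in line with that:
the beacon grace to detect the failure, plus the ~10 seconds Greg mentions
for the takeover itself. A rough way to time the failover from a script, a
sketch only: it assumes `ceph mds stat` exists in this release and prints
the usual '{0=alpha=up:active}' style map (possibly with a "laggy or
crashed" note after detection), and that the mds daemon here is still
called cmds rather than ceph-mds:

#!/bin/sh
# rough failover timer (sketch; run on the host of the active mds)
before=$(ceph mds stat 2>/dev/null)
start=$(date +%s)

# same kill -9 as in the tests above
kill -9 $(pidof cmds)

# wait for the monitors to notice the failure (the mds map changes)...
until [ "$(ceph mds stat 2>/dev/null)" != "$before" ]; do sleep 1; done

# ...then wait until an mds is reported active again, ignoring an entry
# that is still shown as active but flagged laggy/crashed
until ceph mds stat 2>/dev/null | grep 'up:active' | grep -qv laggy; do
    sleep 1
done

echo "failover took $(( $(date +%s) - start )) seconds"

For the power-unplug case the kill line obviously does not apply; only the
two wait loops are relevant there.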