Re: how can I achieve HA with ceph?

Gregory Farnum <gregory.farnum@xxxxxxxxxxxxx> · Tue, 20 Dec 2011 14:50:09 -0800



On Tue, Dec 20, 2011 at 10:07 AM, Karoly Horvath <rhswdev@xxxxxxxxx> wrote:
> Hi,
> all tests were made with kill -9, killing the active mds (and sometimes
> other processes). I waited a couple of minutes between each test to
> make sure that the cluster reached a stable state. (btw: how can I
> check this programmatically?)
You can run "ceph health", which has only a few different values you
can look for. :)
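
If you want to script that wait, a rough sketch along these lines might do. It assumes the `ceph` binary is on the PATH and that the status shows up in the output as a token beginning with HEALTH_ (HEALTH_OK, HEALTH_WARN or HEALTH_ERR). The function names, timeout and polling interval are made up for illustration and are not part of Ceph.

```python
import subprocess
import time


def parse_health(output):
    """Return the first HEALTH_* token found in `ceph health` output, or ''."""
    for token in output.split():
        if token.startswith("HEALTH_"):
            return token
    return ""


def run_ceph_health():
    """Run `ceph health` and return its stdout as text (assumes ceph is on PATH)."""
    result = subprocess.run(
        ["ceph", "health"], capture_output=True, text=True, check=True
    )
    return result.stdout


def wait_until_healthy(timeout=300.0, interval=5.0,
                       run=run_ceph_health, sleep=time.sleep):
    """Poll the cluster until it reports HEALTH_OK.

    Returns True once HEALTH_OK is seen, or False if `timeout` seconds
    pass first. `run` and `sleep` are injectable so the loop can be
    exercised without a real cluster.
    """
    deadline = time.monotonic() + timeout
    while True:
        if parse_health(run()) == "HEALTH_OK":
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval)


if __name__ == "__main__":
    # Hypothetical use between two kill -9 tests.
    ok = wait_until_healthy()
    print("cluster healthy" if ok else "timed out waiting for HEALTH_OK")
```

Calling wait_until_healthy() between tests would replace the fixed "couple of minutes" wait with an explicit check, though HEALTH_OK alone does not prove the filesystem is readable from a client.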

> #  KILLED           result1. mds @ beta       OK2. mds @ alpha
> OK3. mds+osd @ beta   FAILED                    switch ok
> {0=alpha=up:active}, but FS not readable                    FS
> permanently freezed                    rebooted the whole cluster4.
> mds+mon @ alpha  OK (32 sec)                    rebooted the whole
> cluster5. mds+osd @ beta   OK (25 sec)                    rebooted the
> whole cluster6. mds+osd @ beta   OK (24 sec)7. mds+osd @ alpha  OK (30
> sec)8. mds+mon+osd @ beta  OK (27 sec)9. power unplug @ alpha FAILED
>                  stuck in {0=beta=up:replay} for a long time
>          finally it's switching to {0=alpha=up:active}, but FS not
> readable                    FS permanently freezed, even when bringing
> up alpha...
Your formatting got pretty mangled here, and I'm still not sure what's
going on. Did you restart all the daemons between each kill attempt?
(for instance, it looks like '1' is to kill mds.beta; '2' is to kill
mds.alpha, and then '3' is to kill mds.beta — but you already did
that)

> I uploaded test results here:
> http://www.4shared.com/file/5nXMw_sM/cephlogs_mds_test.html?
> If you need any other configuration options changed, let me know
Sorry, I should have been clearer when I said turn on mds logging. Add
"debug mds = 20" and "debug ms = 1" lines to your ceph.conf MDS
sections. This will spit out a lot more information about what's going
on internally, which will help us diagnose this. :)
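
For illustration, the result might look something like this, assuming a single shared [mds] section rather than per-daemon sections such as [mds.alpha]. Any other settings you already have in that section stay as they are.

```ini
[mds]
    ; verbose MDS logging, for diagnosing the failover problem
    debug mds = 20
    ; log messenger (network message) activity
    debug ms = 1
```

The daemons only read ceph.conf at startup, so the MDSes need a restart before the extra logging shows up.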