monitor quorum

Hi,

I have a Ceph cluster running 0.80.1 (Firefly) on Ubuntu 14.04, currently
with 3 monitors and 4 OSD nodes.

Everything had been running great until today, when I ran into an issue
with the monitors.
I moved mon03 to a different switchport, so it would have temporarily lost
connectivity.
Since then, the cluster has been reporting that mon as down, although it's
definitely up.
I've tried restarting the mon services on all three mons, but that hasn't
made a difference.
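
(For reference, the restarts were done with the stock upstart jobs on
14.04, roughly like this on each node, with the id matching that node's
mon name:

# restart ceph-mon id=ceph-mon-03
)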
I definitely, 100% do not have any clock skew on any of the mons. This has
been triple-checked, as the Ceph docs suggest clock skew can cause this
kind of issue.
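
(For the record, the clock check was done by comparing the mons' wall
clocks directly, roughly like this, assuming passwordless SSH between the
nodes:

# for h in ceph-mon-01 ceph-mon-02 ceph-mon-03; do ssh $h date +%s.%N; done

plus ntpq -p on each mon to confirm they are all syncing.)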

Here is what ceph -s and ceph health detail are reporting, along with
mon_status from each monitor:


# ceph -s ; ceph health detail
    cluster XXX
     health HEALTH_WARN 1 mons down, quorum 0,1 ceph-mon-01,ceph-mon-02
     monmap e2: 3 mons at {ceph-mon-01=10.1.1.64:6789/0,ceph-mon-02=10.1.1.65:6789/0,ceph-mon-03=10.1.1.66:6789/0}, election epoch 932, quorum 0,1 ceph-mon-01,ceph-mon-02
     osdmap e49213: 80 osds: 80 up, 80 in
      pgmap v18242952: 4864 pgs, 5 pools, 69910 GB data, 17638 kobjects
            197 TB used, 95904 GB / 290 TB avail
                   8 active+clean+scrubbing+deep
                4856 active+clean
  client io 6893 kB/s rd, 5657 kB/s wr, 2090 op/s
HEALTH_WARN 1 mons down, quorum 0,1 ceph-mon-01,ceph-mon-02
mon.ceph-mon-03 (rank 2) addr 10.1.1.66:6789/0 is down (out of quorum)
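
Each mon_status below was taken from the local admin socket on the
respective node, e.g. on ceph-mon-01:

# ceph --admin-daemon /var/run/ceph/ceph-mon.ceph-mon-01.asok mon_status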


{ "name": "ceph-mon-01",
  "rank": 0,
  "state": "leader",
  "election_epoch": 932,
  "quorum": [
        0,
        1],
  "outside_quorum": [],
  "extra_probe_peers": [],
  "sync_provider": [],
  "monmap": { "epoch": 2,
      "fsid": "XXX",
      "modified": "0.000000",
      "created": "0.000000",
      "mons": [
            { "rank": 0,
              "name": "ceph-mon-01",
              "addr": "10.1.1.64:6789\/0"},
            { "rank": 1,
              "name": "ceph-mon-02",
              "addr": "10.1.1.65:6789\/0"},
            { "rank": 2,
              "name": "ceph-mon-03",
              "addr": "10.1.1.66:6789\/0"}]}}


{ "name": "ceph-mon-02",
  "rank": 1,
  "state": "peon",
  "election_epoch": 932,
  "quorum": [
        0,
        1],
  "outside_quorum": [],
  "extra_probe_peers": [],
  "sync_provider": [],
  "monmap": { "epoch": 2,
      "fsid": "XXX",
      "modified": "0.000000",
      "created": "0.000000",
      "mons": [
            { "rank": 0,
              "name": "ceph-mon-01",
              "addr": "10.1.1.64:6789\/0"},
            { "rank": 1,
              "name": "ceph-mon-02",
              "addr": "10.1.1.65:6789\/0"},
            { "rank": 2,
              "name": "ceph-mon-03",
              "addr": "10.1.1.66:6789\/0"}]}}


{ "name": "ceph-mon-03",
  "rank": 2,
  "state": "electing",
  "election_epoch": 931,
  "quorum": [],
  "outside_quorum": [],
  "extra_probe_peers": [],
  "sync_provider": [],
  "monmap": { "epoch": 2,
      "fsid": "XXX",
      "modified": "0.000000",
      "created": "0.000000",
      "mons": [
            { "rank": 0,
              "name": "ceph-mon-01",
              "addr": "10.1.1.64:6789\/0"},
            { "rank": 1,
              "name": "ceph-mon-02",
              "addr": "10.1.1.65:6789\/0"},
            { "rank": 2,
              "name": "ceph-mon-03",
              "addr": "10.1.1.66:6789\/0"}]}}


Any help or advice is appreciated.

Regards

James