monitor quorum

I decided to remove mon-03 and re-create it.  I copied the keyring and
monmap from one of the other monitors, but the cluster is still reporting
it as down (out of quorum).

mon03 is no longer in the electing state; it is now in the probing state.

mon-03:~# ceph --admin-daemon /run/ceph/ceph-mon.ceph-mon-03.asok mon_status
{ "name": "ceph-mon-03",
  "rank": 2,
  "state": "probing",
  "election_epoch": 975,
  "quorum": [],
  "outside_quorum": [
        "ceph-mon-03"],
  "extra_probe_peers": [],
  "sync_provider": [],
  "monmap": { "epoch": 5,
      "fsid": "74069c87-b361-4bb8-8ce8-6ae9deb8a9bd",
      "modified": "2014-09-19 08:15:38.050896",
      "created": "0.000000",
      "mons": [
            { "rank": 0,
              "name": "ceph-mon-01",
              "addr": "10.1.1.64:6789\/0"},
            { "rank": 1,
              "name": "ceph-mon-02",
              "addr": "10.1.1.65:6789\/0"},
            { "rank": 2,
              "name": "ceph-mon-03",
              "addr": "10.1.1.66:6789\/0"}]}}

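For comparing this output with the earlier mon_status dumps further down the thread, the fields that matter can be pulled out with a few lines of Python. This is only a minimal sketch: it assumes the JSON has already been captured as a string, and the function name summarise_mon_status is made up for illustration (it is not part of any ceph tooling).

```python
import json


def summarise_mon_status(raw: str) -> dict:
    """Pull the quorum-related fields out of a mon_status JSON dump."""
    status = json.loads(raw)
    return {
        "name": status["name"],
        "state": status["state"],
        # A monitor is in quorum when its own rank is listed in "quorum".
        "in_quorum": status["rank"] in status["quorum"],
        "monmap_epoch": status["monmap"]["epoch"],
        "election_epoch": status["election_epoch"],
    }


# Trimmed copy of the output above (the "mons" list is left out for brevity).
sample = """
{ "name": "ceph-mon-03", "rank": 2, "state": "probing",
  "election_epoch": 975, "quorum": [],
  "outside_quorum": ["ceph-mon-03"],
  "monmap": { "epoch": 5 } }
"""

summary = summarise_mon_status(sample)
print(summary)
```

Run against each monitor's dump, this makes it easy to see at a glance which monitor is outside the quorum and whether the monmap epochs agree.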

I'm really struggling to know what to do next, since even removing this
monitor and re-creating it hasn't fixed the problem.
I now suspect that the problem may lie with the other two monitors.

As stated before, any help will really be appreciated :)

J

On 18 September 2014 10:24, James Eckersall <james.eckersall at gmail.com>
wrote:

> Is anyone able to offer any advice on how to fix this?
> I've tried re-injecting the monmap into mon03 as that was mentioned in the
> mon troubleshooting docs, but that has not helped at all.  mon03 is still
> stuck in the same electing state :(
>
> I've increased the debug level on mon03 and it is reporting the following,
> repeatedly:
>
> 2014-09-18 10:22:12.788061 7f30f9818700  5 mon.ceph-mon-03@2(electing).elector(947)
> start -- can i be leader?
> 2014-09-18 10:22:12.788105 7f30f9818700  1 mon.ceph-mon-03@2(electing).elector(947)
> init, last seen epoch 947
> 2014-09-18 10:22:12.788111 7f30f9818700  1 -- 10.1.1.66:6789/0 --> mon.0
> 10.1.1.64:6789/0 -- election(XXX propose 947) v5 -- ?+0 0x7f3104568dc0
> 2014-09-18 10:22:12.788129 7f30f9818700  1 -- 10.1.1.66:6789/0 --> mon.1
> 10.1.1.65:6789/0 -- election(XXX propose 947) v5 -- ?+0 0x7f3104568b00
> 2014-09-18 10:22:14.470715 7f30f7f14700  1 -- 10.1.1.66:6789/0 >> :/0
> pipe(0x7f31020a5c00 sd=13 :6789 s=0 pgs=0 cs=0 l=0 c=0x7f31036be7e0).accept
> sd=13 10.1.1.10:50568/0
> 2014-09-18 10:22:14.470926 7f30f7f14700 10 mon.ceph-mon-03@2(electing) e3
> ms_verify_authorizer 10.1.1.10:0/1007970 client protocol 0
> 2014-09-18 10:22:14.471281 7f30f9017700  1 -- 10.1.1.66:6789/0 <==
> client.? 10.1.1.10:0/1007970 1 ==== auth(proto 0 30 bytes epoch 0) v1
> ==== 60+0+0 (673663173 0 0) 0x7f310282d600 con 0x7f31036be7e0
> 2014-09-18 10:22:14.471296 7f30f9017700  5 mon.ceph-mon-03@2(electing) e3
> waitlisting message auth(proto 0 30 bytes epoch 0) v1
> 2014-09-18 10:22:14.866689 7f30f9818700  5 mon.ceph-mon-03@2(electing) e3
> waitlisting message auth(proto 0 30 bytes epoch 0) v1
>
> 2014-09-18 10:22:17.470417 7f30f9017700 10 mon.ceph-mon-03@2(electing) e3
> ms_handle_reset 0x7f31036be7e0 10.1.1.10:0/1007970
> 2014-09-18 10:22:17.788184 7f30f9818700  5 mon.ceph-mon-03@2(electing).elector(947)
> election timer expired
>
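The pattern in the debug output above (proposals go out, the election timer expires, the epoch stays the same) can be counted mechanically. The following is a rough sketch rather than a proper log parser: the helper name election_summary is hypothetical, and the regular expression accepts both "@" and the " at " spelling that the list archive substitutes for it.

```python
import re

# Matches e.g. "mon.ceph-mon-03@2(electing).elector(947)" and the archived
# form "mon.ceph-mon-03 at 2(electing).elector(947)".
ELECTOR_RE = re.compile(
    r"mon\.[\w-]+(?:@| at )-?\d+\(electing\)\.elector\((\d+)\)"
)


def election_summary(log_text: str) -> dict:
    """Summarise elector activity in a chunk of (possibly quoted) mon log."""
    # Mail quoting wraps long entries, so drop the quote markers and join
    # everything into one string before searching.
    flat = " ".join(line.lstrip(">").strip() for line in log_text.splitlines())
    epochs = [int(epoch) for epoch in ELECTOR_RE.findall(flat)]
    return {
        "epochs_seen": sorted(set(epochs)),
        "proposals_sent": flat.count("propose"),
        "timer_expired": flat.count("election timer expired"),
    }


sample = """\
> 2014-09-18 10:22:12.788061 7f30f9818700  5 mon.ceph-mon-03 at 2(electing).elector(947)
> start -- can i be leader?
> 2014-09-18 10:22:12.788111 7f30f9818700  1 -- 10.1.1.66:6789/0 --> mon.0
> 10.1.1.64:6789/0 -- election(XXX propose 947) v5 -- ?+0 0x7f3104568dc0
> 2014-09-18 10:22:17.788184 7f30f9818700  5 mon.ceph-mon-03 at 2(electing).elector(947)
> election timer expired
"""

summary = election_summary(sample)
print(summary)
```

A single epoch combined with repeated timer expiries is consistent with the other monitors never answering the proposals, although on its own it does not prove that.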
>
> J
>
> On 17 September 2014 17:05, James Eckersall <james.eckersall at gmail.com>
> wrote:
>
>> Hi,
>>
>> Now I feel dumb for jumping to the conclusion that it was a simple
>> networking issue - it isn't.
>> I've just checked connectivity properly and I can ping and telnet 6789
>> from all mon servers to all other mon servers.
>>
>> I've just restarted the mon03 service and the log is showing the
>> following:
>>
>> 2014-09-17 16:49:02.355148 7f7ef9f8c800  0 starting mon.ceph-mon-03 rank
>> 2 at 10.1.1.66:6789/0 mon_data /var/lib/ceph/mon/ceph-ceph-mon-03 fsid
>> 74069c87-b361-4bb8-8ce8-6ae9deb8a9bd
>> 2014-09-17 16:49:02.355375 7f7ef9f8c800  1 mon.ceph-mon-03@-1(probing)
>> e2 preinit fsid 74069c87-b361-4bb8-8ce8-6ae9deb8a9bd
>> 2014-09-17 16:49:02.356347 7f7ef9f8c800  1 mon.ceph-mon-03@-1(probing).paxosservice(pgmap
>> 18241250..18241952) refresh upgraded, format 0 -> 1
>> 2014-09-17 16:49:02.356360 7f7ef9f8c800  1 mon.ceph-mon-03@-1(probing).pg
>> v0 on_upgrade discarding in-core PGMap
>> 2014-09-17 16:49:02.400316 7f7ef9f8c800  0 mon.ceph-mon-03@-1(probing).mds
>> e1 print_map
>> epoch 1
>> flags 0
>> created 2013-12-09 10:19:58.534310
>> modified 2013-12-09 10:19:58.534332
>> tableserver 0
>> root 0
>> session_timeout 60
>> session_autoclose 300
>> max_file_size 1099511627776
>> last_failure 0
>> last_failure_osd_epoch 0
>> compat compat={},rocompat={},incompat={1=base v0.20,2=client writeable
>> ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds
>> uses versioned encoding}
>> max_mds 1
>> in
>> up {}
>> failed
>> stopped
>> data_pools 0
>> metadata_pool 1
>> inline_data disabled
>>
>> 2014-09-17 16:49:02.402373 7f7ef9f8c800  0 mon.ceph-mon-03@-1(probing).osd
>> e49212 crush map has features 1107558400, adjusting msgr requires
>> 2014-09-17 16:49:02.402384 7f7ef9f8c800  0 mon.ceph-mon-03@-1(probing).osd
>> e49212 crush map has features 1107558400, adjusting msgr requires
>> 2014-09-17 16:49:02.402386 7f7ef9f8c800  0 mon.ceph-mon-03@-1(probing).osd
>> e49212 crush map has features 1107558400, adjusting msgr requires
>> 2014-09-17 16:49:02.402388 7f7ef9f8c800  0 mon.ceph-mon-03@-1(probing).osd
>> e49212 crush map has features 1107558400, adjusting msgr requires
>> 2014-09-17 16:49:02.403725 7f7ef9f8c800  1 mon.ceph-mon-03@-1(probing).paxosservice(auth
>> 26001..26154) refresh upgraded, format 0 -> 1
>> 2014-09-17 16:49:02.404834 7f7ef9f8c800  0 mon.ceph-mon-03@-1(probing)
>> e2  my rank is now 2 (was -1)
>> 2014-09-17 16:49:02.407439 7f7ef331b700  1 mon.ceph-mon-03@2(synchronizing)
>> e2 sync_obtain_latest_monmap
>> 2014-09-17 16:49:02.407588 7f7ef331b700  1 mon.ceph-mon-03@2(synchronizing)
>> e2 sync_obtain_latest_monmap obtained monmap e2
>> 2014-09-17 16:49:09.514365 7f7ef331b700  0 log [INF] : mon.ceph-mon-03
>> calling new monitor election
>> 2014-09-17 16:49:09.514523 7f7ef331b700  1 mon.ceph-mon-03@2(electing).elector(931)
>> init, last seen epoch 931
>> 2014-09-17 16:49:09.514658 7f7ef331b700  1 mon.ceph-mon-03@2(electing).paxos(paxos
>> recovering c 31223899..31224482) is_readable now=2014-09-17 16:49:09.514659
>> lease_expire=0.000000 has v0 lc 31224482
>> 2014-09-17 16:49:09.514665 7f7ef331b700  1 mon.ceph-mon-03@2(electing).paxos(paxos
>> recovering c 31223899..31224482) is_readable now=2014-09-17 16:49:09.514666
>> lease_expire=0.000000 has v0 lc 31224482
>> 2014-09-17 16:49:15.533876 7f7ef3b1c700  1 mon.ceph-mon-03@2(electing).elector(933)
>> init, last seen epoch 933
>> 2014-09-17 16:49:21.578269 7f7ef3b1c700  1 mon.ceph-mon-03@2(electing).elector(935)
>> init, last seen epoch 935
>> 2014-09-17 16:49:26.578526 7f7ef3b1c700  1 mon.ceph-mon-03@2(electing).elector(935)
>> init, last seen epoch 935
>> 2014-09-17 16:49:31.578790 7f7ef3b1c700  1 mon.ceph-mon-03@2(electing).elector(935)
>> init, last seen epoch 935
>> 2014-09-17 16:49:36.579044 7f7ef3b1c700  1 mon.ceph-mon-03@2(electing).elector(935)
>> init, last seen epoch 935
>>
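The startup log above walks through a sequence of monitor states before it gets stuck. A small sketch like this one lists the distinct (rank, state) steps in order. The helper name state_transitions is invented for illustration, and the pattern accepts either "@" or the archive's " at " between the monitor name and its rank.

```python
import re

# Captures the rank and state from prefixes such as
# "mon.ceph-mon-03@-1(probing)" or "mon.ceph-mon-03 at 2(electing)".
STATE_RE = re.compile(r"mon\.[\w-]+(?:@| at )(-?\d+)\((\w+)\)")


def state_transitions(log_text: str) -> list:
    """Return the ordered list of (rank, state) changes seen in the log."""
    transitions = []
    for rank, state in STATE_RE.findall(log_text):
        entry = (int(rank), state)
        # Only record an entry when it differs from the previous one.
        if not transitions or transitions[-1] != entry:
            transitions.append(entry)
    return transitions


sample = """\
>> 2014-09-17 16:49:02.355375 7f7ef9f8c800  1 mon.ceph-mon-03 at -1(probing)
>> e2 preinit fsid 74069c87-b361-4bb8-8ce8-6ae9deb8a9bd
>> 2014-09-17 16:49:02.407439 7f7ef331b700  1 mon.ceph-mon-03 at 2(synchronizing)
>> e2 sync_obtain_latest_monmap
>> 2014-09-17 16:49:09.514523 7f7ef331b700  1 mon.ceph-mon-03 at 2(electing).elector(931)
>> init, last seen epoch 931
>> 2014-09-17 16:49:15.533876 7f7ef3b1c700  1 mon.ceph-mon-03 at 2(electing).elector(933)
>> init, last seen epoch 933
"""

transitions = state_transitions(sample)
print(transitions)
```

For the excerpt used here the sequence ends in "electing" and never moves on to "peon" or "leader", which matches what the log shows.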
>>
>> The last lines about "electing" repeat forever.  The other mons are
>> logging far more entries than I have seen them log before.  They look like
>> the following (note the timestamps - all of these log lines are from just a
>> 2 second period):
>>
>> 2014-09-17 16:55:10.019407 7fd5a479a700  1 mon.ceph-mon-02 at 1(peon).paxos(paxos
>> active c 31224401..31225038) is_readable now=2014-09-17 16:55:10.019408
>> lease_expire=2014-09-17 16:55:14.518716 has v0 lc 31225038
>> 2014-09-17 16:55:10.019418 7fd5a479a700  1 mon.ceph-mon-02 at 1(peon).paxos(paxos
>> active c 31224401..31225038) is_readable now=2014-09-17 16:55:10.019418
>> lease_expire=2014-09-17 16:55:14.518716 has v0 lc 31225038
>> 2014-09-17 16:55:10.180220 7fd5a479a700  1 mon.ceph-mon-02 at 1(peon).paxos(paxos
>> active c 31224401..31225038) is_readable now=2014-09-17 16:55:10.180222
>> lease_expire=2014-09-17 16:55:14.518716 has v0 lc 31225038
>> 2014-09-17 16:55:10.180233 7fd5a479a700  1 mon.ceph-mon-02 at 1(peon).paxos(paxos
>> active c 31224401..31225038) is_readable now=2014-09-17 16:55:10.180234
>> lease_expire=2014-09-17 16:55:14.518716 has v0 lc 31225038
>> 2014-09-17 16:55:10.192668 7fd5a479a700  1 mon.ceph-mon-02 at 1(peon).paxos(paxos
>> active c 31224401..31225038) is_readable now=2014-09-17 16:55:10.192670
>> lease_expire=2014-09-17 16:55:14.518716 has v0 lc 31225038
>> 2014-09-17 16:55:10.192691 7fd5a479a700  1 mon.ceph-mon-02 at 1(peon).paxos(paxos
>> active c 31224401..31225038) is_readable now=2014-09-17 16:55:10.192692
>> lease_expire=2014-09-17 16:55:14.518716 has v0 lc 31225038
>> 2014-09-17 16:55:10.276726 7fd5a479a700  1 mon.ceph-mon-02 at 1(peon).paxos(paxos
>> active c 31224401..31225038) is_readable now=2014-09-17 16:55:10.276727
>> lease_expire=2014-09-17 16:55:14.518716 has v0 lc 31225038
>> 2014-09-17 16:55:10.276737 7fd5a479a700  1 mon.ceph-mon-02 at 1(peon).paxos(paxos
>> active c 31224401..31225038) is_readable now=2014-09-17 16:55:10.276737
>> lease_expire=2014-09-17 16:55:14.518716 has v0 lc 31225038
>> 2014-09-17 16:55:10.302638 7fd5a479a700  1 mon.ceph-mon-02 at 1(peon).paxos(paxos
>> active c 31224401..31225038) is_readable now=2014-09-17 16:55:10.302640
>> lease_expire=2014-09-17 16:55:14.518716 has v0 lc 31225038
>> 2014-09-17 16:55:10.302651 7fd5a479a700  1 mon.ceph-mon-02 at 1(peon).paxos(paxos
>> active c 31224401..31225038) is_readable now=2014-09-17 16:55:10.302652
>> lease_expire=2014-09-17 16:55:14.518716 has v0 lc 31225038
>> 2014-09-17 16:55:10.362642 7fd5a479a700  1 mon.ceph-mon-02 at 1(peon).paxos(paxos
>> active c 31224401..31225038) is_readable now=2014-09-17 16:55:10.362643
>> lease_expire=2014-09-17 16:55:14.518716 has v0 lc 31225038
>> 2014-09-17 16:55:10.362655 7fd5a479a700  1 mon.ceph-mon-02 at 1(peon).paxos(paxos
>> active c 31224401..31225038) is_readable now=2014-09-17 16:55:10.362656
>> lease_expire=2014-09-17 16:55:14.518716 has v0 lc 31225038
>> 2014-09-17 16:55:10.385686 7fd5a479a700  1 mon.ceph-mon-02 at 1(peon).paxos(paxos
>> active c 31224401..31225038) is_readable now=2014-09-17 16:55:10.385687
>> lease_expire=2014-09-17 16:55:14.518716 has v0 lc 31225038
>> 2014-09-17 16:55:10.385697 7fd5a479a700  1 mon.ceph-mon-02 at 1(peon).paxos(paxos
>> active c 31224401..31225038) is_readable now=2014-09-17 16:55:10.385697
>> lease_expire=2014-09-17 16:55:14.518716 has v0 lc 31225038
>> 2014-09-17 16:55:10.406712 7fd5a479a700  1 mon.ceph-mon-02 at 1(peon).paxos(paxos
>> active c 31224401..31225038) is_readable now=2014-09-17 16:55:10.406713
>> lease_expire=2014-09-17 16:55:14.518716 has v0 lc 31225038
>> 2014-09-17 16:55:10.406723 7fd5a479a700  1 mon.ceph-mon-02 at 1(peon).paxos(paxos
>> active c 31224401..31225038) is_readable now=2014-09-17 16:55:10.406724
>> lease_expire=2014-09-17 16:55:14.518716 has v0 lc 31225038
>> 2014-09-17 16:55:10.423277 7fd5a479a700  1 mon.ceph-mon-02 at 1(peon).paxos(paxos
>> active c 31224401..31225038) is_readable now=2014-09-17 16:55:10.423279
>> lease_expire=2014-09-17 16:55:14.518716 has v0 lc 31225038
>> 2014-09-17 16:55:10.423299 7fd5a479a700  1 mon.ceph-mon-02 at 1(peon).paxos(paxos
>> active c 31224401..31225038) is_readable now=2014-09-17 16:55:10.423300
>> lease_expire=2014-09-17 16:55:14.518716 has v0 lc 31225038
>> 2014-09-17 16:55:10.543138 7fd5a479a700  1 mon.ceph-mon-02 at 1(peon).paxos(paxos
>> updating c 31224401..31225038) is_readable now=2014-09-17 16:55:10.543139
>> lease_expire=0.000000 has v0 lc 31225038
>> 2014-09-17 16:55:10.543145 7fd5a479a700  1 mon.ceph-mon-02 at 1(peon).paxos(paxos
>> updating c 31224401..31225038) is_readable now=2014-09-17 16:55:10.543145
>> lease_expire=0.000000 has v0 lc 31225038
>> 2014-09-17 16:55:10.580911 7fd5a479a700  1 mon.ceph-mon-02 at 1(peon).paxos(paxos
>> active c 31224401..31225039) is_readable now=2014-09-17 16:55:10.580912
>> lease_expire=2014-09-17 16:55:15.549947 has v0 lc 31225039
>> 2014-09-17 16:55:10.580922 7fd5a479a700  1 mon.ceph-mon-02 at 1(peon).paxos(paxos
>> active c 31224401..31225039) is_readable now=2014-09-17 16:55:10.580923
>> lease_expire=2014-09-17 16:55:15.549947 has v0 lc 31225039
>> 2014-09-17 16:55:10.580930 7fd5a479a700  1 mon.ceph-mon-02 at 1(peon).paxos(paxos
>> active c 31224401..31225039) is_readable now=2014-09-17 16:55:10.580930
>> lease_expire=2014-09-17 16:55:15.549947 has v0 lc 31225039
>> 2014-09-17 16:55:10.606130 7fd5a479a700  1 mon.ceph-mon-02 at 1(peon).paxos(paxos
>> updating c 31224401..31225039) is_readable now=2014-09-17 16:55:10.606131
>> lease_expire=0.000000 has v0 lc 31225039
>> 2014-09-17 16:55:10.606136 7fd5a479a700  1 mon.ceph-mon-02 at 1(peon).paxos(paxos
>> updating c 31224401..31225039) is_readable now=2014-09-17 16:55:10.606137
>> lease_expire=0.000000 has v0 lc 31225039
>> 2014-09-17 16:55:10.633460 7fd5a479a700  1 mon.ceph-mon-02 at 1(peon).log
>> v12645471 check_sub sending message to client.2190462 10.1.1.10:0/1004032
>> with 1 entries (version 12645471)
>> 2014-09-17 16:55:10.633632 7fd5a479a700  1 mon.ceph-mon-02 at 1(peon).paxos(paxos
>> active c 31224401..31225040) is_readable now=2014-09-17 16:55:10.633633
>> lease_expire=2014-09-17 16:55:15.607320 has v0 lc 31225040
>> 2014-09-17 16:55:10.633646 7fd5a479a700  1 mon.ceph-mon-02 at 1(peon).paxos(paxos
>> active c 31224401..31225040) is_readable now=2014-09-17 16:55:10.633651
>> lease_expire=2014-09-17 16:55:15.607320 has v0 lc 31225040
>> 2014-09-17 16:55:10.633657 7fd5a479a700  1 mon.ceph-mon-02 at 1(peon).paxos(paxos
>> active c 31224401..31225040) is_readable now=2014-09-17 16:55:10.633658
>> lease_expire=2014-09-17 16:55:15.607320 has v0 lc 31225040
>> 2014-09-17 16:55:10.633699 7fd5a479a700  1 mon.ceph-mon-02 at 1(peon).paxos(paxos
>> active c 31224401..31225040) is_readable now=2014-09-17 16:55:10.633700
>> lease_expire=2014-09-17 16:55:15.607320 has v0 lc 31225040
>> 2014-09-17 16:55:10.633707 7fd5a479a700  1 mon.ceph-mon-02 at 1(peon).paxos(paxos
>> active c 31224401..31225040) is_readable now=2014-09-17 16:55:10.633707
>> lease_expire=2014-09-17 16:55:15.607320 has v0 lc 31225040
>> 2014-09-17 16:55:10.695127 7fd5a479a700  1 mon.ceph-mon-02 at 1(peon).paxos(paxos
>> active c 31224401..31225040) is_readable now=2014-09-17 16:55:10.695129
>> lease_expire=2014-09-17 16:55:15.607320 has v0 lc 31225040
>> 2014-09-17 16:55:10.695151 7fd5a479a700  1 mon.ceph-mon-02 at 1(peon).paxos(paxos
>> active c 31224401..31225040) is_readable now=2014-09-17 16:55:10.695152
>> lease_expire=2014-09-17 16:55:15.607320 has v0 lc 31225040
>> 2014-09-17 16:55:10.800013 7fd5a479a700  1 mon.ceph-mon-02 at 1(peon).paxos(paxos
>> active c 31224401..31225040) is_readable now=2014-09-17 16:55:10.800015
>> lease_expire=2014-09-17 16:55:15.607320 has v0 lc 31225040
>> 2014-09-17 16:55:10.800030 7fd5a479a700  1 mon.ceph-mon-02 at 1(peon).paxos(paxos
>> active c 31224401..31225040) is_readable now=2014-09-17 16:55:10.800031
>> lease_expire=2014-09-17 16:55:15.607320 has v0 lc 31225040
>> 2014-09-17 16:55:10.830432 7fd5a479a700  1 mon.ceph-mon-02 at 1(peon).paxos(paxos
>> active c 31224401..31225040) is_readable now=2014-09-17 16:55:10.830433
>> lease_expire=2014-09-17 16:55:15.607320 has v0 lc 31225040
>> 2014-09-17 16:55:10.830441 7fd5a479a700  1 mon.ceph-mon-02 at 1(peon).paxos(paxos
>> active c 31224401..31225040) is_readable now=2014-09-17 16:55:10.830442
>> lease_expire=2014-09-17 16:55:15.607320 has v0 lc 31225040
>> 2014-09-17 16:55:10.848954 7fd5a479a700  1 mon.ceph-mon-02 at 1(peon).paxos(paxos
>> active c 31224401..31225040) is_readable now=2014-09-17 16:55:10.848956
>> lease_expire=2014-09-17 16:55:15.607320 has v0 lc 31225040
>> 2014-09-17 16:55:10.848964 7fd5a479a700  1 mon.ceph-mon-02 at 1(peon).paxos(paxos
>> active c 31224401..31225040) is_readable now=2014-09-17 16:55:10.848965
>> lease_expire=2014-09-17 16:55:15.607320 has v0 lc 31225040
>> 2014-09-17 16:55:10.887139 7fd5a479a700  1 mon.ceph-mon-02 at 1(peon).paxos(paxos
>> active c 31224401..31225040) is_readable now=2014-09-17 16:55:10.887140
>> lease_expire=2014-09-17 16:55:15.607320 has v0 lc 31225040
>> 2014-09-17 16:55:10.887150 7fd5a479a700  1 mon.ceph-mon-02 at 1(peon).paxos(paxos
>> active c 31224401..31225040) is_readable now=2014-09-17 16:55:10.887151
>> lease_expire=2014-09-17 16:55:15.607320 has v0 lc 31225040
>> 2014-09-17 16:55:10.913825 7fd5a479a700  1 mon.ceph-mon-02 at 1(peon).paxos(paxos
>> active c 31224401..31225040) is_readable now=2014-09-17 16:55:10.913827
>> lease_expire=2014-09-17 16:55:15.607320 has v0 lc 31225040
>> 2014-09-17 16:55:10.913834 7fd5a479a700  1 mon.ceph-mon-02 at 1(peon).paxos(paxos
>> active c 31224401..31225040) is_readable now=2014-09-17 16:55:10.913835
>> lease_expire=2014-09-17 16:55:15.607320 has v0 lc 31225040
>> 2014-09-17 16:55:11.010277 7fd5a479a700  1 mon.ceph-mon-02 at 1(peon).paxos(paxos
>> active c 31224401..31225040) is_readable now=2014-09-17 16:55:11.010279
>> lease_expire=2014-09-17 16:55:15.607320 has v0 lc 31225040
>> 2014-09-17 16:55:11.010287 7fd5a479a700  1 mon.ceph-mon-02 at 1(peon).paxos(paxos
>> active c 31224401..31225040) is_readable now=2014-09-17 16:55:11.010288
>> lease_expire=2014-09-17 16:55:15.607320 has v0 lc 31225040
>> 2014-09-17 16:55:11.098312 7fd5a479a700  1 mon.ceph-mon-02 at 1(peon).paxos(paxos
>> active c 31224401..31225040) is_readable now=2014-09-17 16:55:11.098314
>> lease_expire=2014-09-17 16:55:15.607320 has v0 lc 31225040
>> 2014-09-17 16:55:11.098325 7fd5a479a700  1 mon.ceph-mon-02 at 1(peon).paxos(paxos
>> active c 31224401..31225040) is_readable now=2014-09-17 16:55:11.098326
>> lease_expire=2014-09-17 16:55:15.607320 has v0 lc 31225040
>> 2014-09-17 16:55:11.109040 7fd5a479a700  1 mon.ceph-mon-02 at 1(peon).paxos(paxos
>> active c 31224401..31225040) is_readable now=2014-09-17 16:55:11.109042
>> lease_expire=2014-09-17 16:55:15.607320 has v0 lc 31225040
>> 2014-09-17 16:55:11.109053 7fd5a479a700  1 mon.ceph-mon-02 at 1(peon).paxos(paxos
>> active c 31224401..31225040) is_readable now=2014-09-17 16:55:11.109054
>> lease_expire=2014-09-17 16:55:15.607320 has v0 lc 31225040
>> 2014-09-17 16:55:11.170705 7fd5a479a700  1 mon.ceph-mon-02 at 1(peon).paxos(paxos
>> active c 31224401..31225040) is_readable now=2014-09-17 16:55:11.170706
>> lease_expire=2014-09-17 16:55:15.607320 has v0 lc 31225040
>> 2014-09-17 16:55:11.170713 7fd5a479a700  1 mon.ceph-mon-02 at 1(peon).paxos(paxos
>> active c 31224401..31225040) is_readable now=2014-09-17 16:55:11.170714
>> lease_expire=2014-09-17 16:55:15.607320 has v0 lc 31225040
>> 2014-09-17 16:55:11.222537 7fd5a479a700  1 mon.ceph-mon-02 at 1(peon).paxos(paxos
>> active c 31224401..31225040) is_readable now=2014-09-17 16:55:11.222539
>> lease_expire=2014-09-17 16:55:15.607320 has v0 lc 31225040
>> 2014-09-17 16:55:11.222549 7fd5a479a700  1 mon.ceph-mon-02 at 1(peon).paxos(paxos
>> active c 31224401..31225040) is_readable now=2014-09-17 16:55:11.222550
>> lease_expire=2014-09-17 16:55:15.607320 has v0 lc 31225040
>> 2014-09-17 16:55:11.431510 7fd5a479a700  1 mon.ceph-mon-02 at 1(peon).paxos(paxos
>> active c 31224401..31225040) is_readable now=2014-09-17 16:55:11.431511
>> lease_expire=2014-09-17 16:55:15.607320 has v0 lc 31225040
>> 2014-09-17 16:55:11.431524 7fd5a479a700  1 mon.ceph-mon-02 at 1(peon).paxos(paxos
>> active c 31224401..31225040) is_readable now=2014-09-17 16:55:11.431525
>> lease_expire=2014-09-17 16:55:15.607320 has v0 lc 31225040
>> 2014-09-17 16:55:11.453664 7fd5a479a700  1 mon.ceph-mon-02 at 1(peon).paxos(paxos
>> active c 31224401..31225040) is_readable now=2014-09-17 16:55:11.453666
>> lease_expire=2014-09-17 16:55:15.607320 has v0 lc 31225040
>> 2014-09-17 16:55:11.453685 7fd5a479a700  1 mon.ceph-mon-02 at 1(peon).paxos(paxos
>> active c 31224401..31225040) is_readable now=2014-09-17 16:55:11.453687
>> lease_expire=2014-09-17 16:55:15.607320 has v0 lc 31225040
>> 2014-09-17 16:55:11.520250 7fd5a479a700  1 mon.ceph-mon-02 at 1(peon).paxos(paxos
>> active c 31224401..31225040) is_readable now=2014-09-17 16:55:11.520252
>> lease_expire=2014-09-17 16:55:15.607320 has v0 lc 31225040
>> 2014-09-17 16:55:11.520263 7fd5a479a700  1 mon.ceph-mon-02 at 1(peon).paxos(paxos
>> active c 31224401..31225040) is_readable now=2014-09-17 16:55:11.520264
>> lease_expire=2014-09-17 16:55:15.607320 has v0 lc 31225040
>> 2014-09-17 16:55:11.603991 7fd5a479a700  1 mon.ceph-mon-02 at 1(peon).paxos(paxos
>> active c 31224401..31225041) is_readable now=2014-09-17 16:55:11.603992
>> lease_expire=2014-09-17 16:55:16.575181 has v0 lc 31225041
>> 2014-09-17 16:55:11.610948 7fd5a479a700  1 mon.ceph-mon-02 at 1(peon).paxos(paxos
>> active c 31224401..31225041) is_readable now=2014-09-17 16:55:11.610949
>> lease_expire=2014-09-17 16:55:16.575181 has v0 lc 31225041
>> 2014-09-17 16:55:11.610965 7fd5a479a700  1 mon.ceph-mon-02 at 1(peon).paxos(paxos
>> active c 31224401..31225041) is_readable now=2014-09-17 16:55:11.610966
>> lease_expire=2014-09-17 16:55:16.575181 has v0 lc 31225041
>> 2014-09-17 16:55:11.622479 7fd5a479a700  1 mon.ceph-mon-02 at 1(peon).paxos(paxos
>> active c 31224401..31225041) is_readable now=2014-09-17 16:55:11.622480
>> lease_expire=2014-09-17 16:55:16.575181 has v0 lc 31225041
>> 2014-09-17 16:55:11.622495 7fd5a479a700  1 mon.ceph-mon-02 at 1(peon).paxos(paxos
>> active c 31224401..31225041) is_readable now=2014-09-17 16:55:11.622496
>> lease_expire=2014-09-17 16:55:16.575181 has v0 lc 31225041
>> 2014-09-17 16:55:11.787013 7fd5a479a700  1 mon.ceph-mon-02 at 1(peon).paxos(paxos
>> active c 31224401..31225041) is_readable now=2014-09-17 16:55:11.787014
>> lease_expire=2014-09-17 16:55:16.575181 has v0 lc 31225041
>> 2014-09-17 16:55:11.787024 7fd5a479a700  1 mon.ceph-mon-02 at 1(peon).paxos(paxos
>> active c 31224401..31225041) is_readable now=2014-09-17 16:55:11.787025
>> lease_expire=2014-09-17 16:55:16.575181 has v0 lc 31225041
>> 2014-09-17 16:55:11.873613 7fd5a479a700  1 mon.ceph-mon-02 at 1(peon).paxos(paxos
>> active c 31224401..31225041) is_readable now=2014-09-17 16:55:11.873614
>> lease_expire=2014-09-17 16:55:16.575181 has v0 lc 31225041
>> 2014-09-17 16:55:11.873627 7fd5a479a700  1 mon.ceph-mon-02 at 1(peon).paxos(paxos
>> active c 31224401..31225041) is_readable now=2014-09-17 16:55:11.873628
>> lease_expire=2014-09-17 16:55:16.575181 has v0 lc 31225041
>> 2014-09-17 16:55:11.988465 7fd5a479a700  1 mon.ceph-mon-02 at 1(peon).paxos(paxos
>> active c 31224401..31225041) is_readable now=2014-09-17 16:55:11.988467
>> lease_expire=2014-09-17 16:55:16.575181 has v0 lc 31225041
>> 2014-09-17 16:55:11.988487 7fd5a479a700  1 mon.ceph-mon-02 at 1(peon).paxos(paxos
>> active c 31224401..31225041) is_readable now=2014-09-17 16:55:11.988489
>> lease_expire=2014-09-17 16:55:16.575181 has v0 lc 31225041
>>
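To put a number on "far more entries than before", the log entries can be bucketed per second. This is a minimal sketch under the assumption that every entry begins with a timestamp followed by a hexadecimal thread id, as in the excerpt above; the function name entries_per_second is hypothetical.

```python
import re
from collections import Counter

# An entry starts (after any mail quote markers) with
# "YYYY-MM-DD HH:MM:SS.micros <hex thread id> ". Wrapped continuation lines
# such as "lease_expire=2014-..." do not match because of the line anchor.
ENTRY_RE = re.compile(
    r"^[> ]*(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\.\d+ [0-9a-f]+\s",
    re.MULTILINE,
)


def entries_per_second(log_text: str) -> Counter:
    """Count log entries per wall-clock second."""
    return Counter(ENTRY_RE.findall(log_text))


sample = """\
>> 2014-09-17 16:55:10.019407 7fd5a479a700  1 mon.ceph-mon-02 at 1(peon).paxos(paxos
>> active c 31224401..31225038) is_readable now=2014-09-17 16:55:10.019408
>> lease_expire=2014-09-17 16:55:14.518716 has v0 lc 31225038
>> 2014-09-17 16:55:10.180220 7fd5a479a700  1 mon.ceph-mon-02 at 1(peon).paxos(paxos
>> active c 31224401..31225038) is_readable now=2014-09-17 16:55:10.180222
>> lease_expire=2014-09-17 16:55:14.518716 has v0 lc 31225038
>> 2014-09-17 16:55:11.010277 7fd5a479a700  1 mon.ceph-mon-02 at 1(peon).paxos(paxos
>> active c 31224401..31225040) is_readable now=2014-09-17 16:55:11.010279
>> lease_expire=2014-09-17 16:55:15.607320 has v0 lc 31225040
"""

counts = entries_per_second(sample)
for second, count in sorted(counts.items()):
    print(second, count)
```

Running the same function over a log file from before the incident would give a baseline to compare against.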
>>
>> I'm wondering at this point whether I should just reinject the monmap
>> from mon01 or mon02 into mon03 or whether there is something else that can
>> be done to fix this.
>>
>> In hindsight, I should have stopped the mon service before relocating
>> the NIC cable, but I expected the mon to survive a short network outage,
>> which it doesn't seem to have done :(
>>
>>
>> On 17 September 2014 16:21, James Eckersall <james.eckersall at gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> Thanks for the advice.
>>>
>>> I feel pretty dumb as it does indeed look like a simple networking
>>> issue.  You know how you check things 5 times and miss the most obvious
>>> one...
>>>
>>> J
>>>
>>> On 17 September 2014 16:04, Florian Haas <florian at hastexo.com> wrote:
>>>
>>>> On Wed, Sep 17, 2014 at 1:58 PM, James Eckersall
>>>> <james.eckersall at gmail.com> wrote:
>>>> > Hi,
>>>> >
>>>> > I have a ceph cluster running 0.80.1 on Ubuntu 14.04.  I have 3
>>>> > monitors and 4 OSD nodes currently.
>>>> >
>>>> > Everything has been running great up until today where I've got an
>>>> > issue with the monitors.
>>>> > I moved mon03 to a different switchport so it would have temporarily
>>>> > lost connectivity.
>>>> > Since then, the cluster is reporting that that mon is down, although
>>>> > it's definitely up.
>>>> > I've tried restarting the mon services on all three mons, but that
>>>> > hasn't made a difference.
>>>> > I definitely, 100% do not have any clock skew on any of the mons.
>>>> > This has been triple-checked as the ceph docs seem to suggest that
>>>> > might be the cause of this issue.
>>>> >
>>>> > Here is what ceph -s and ceph health detail are reporting as well as
>>>> > the mon_status for each monitor:
>>>> >
>>>> >
>>>> > # ceph -s ; ceph health detail
>>>> >     cluster XXX
>>>> >      health HEALTH_WARN 1 mons down, quorum 0,1 ceph-mon-01,ceph-mon-02
>>>> >      monmap e2: 3 mons at {ceph-mon-01=10.1.1.64:6789/0,ceph-mon-02=10.1.1.65:6789/0,ceph-mon-03=10.1.1.66:6789/0}, election epoch 932, quorum 0,1 ceph-mon-01,ceph-mon-02
>>>> >      osdmap e49213: 80 osds: 80 up, 80 in
>>>> >       pgmap v18242952: 4864 pgs, 5 pools, 69910 GB data, 17638 kobjects
>>>> >             197 TB used, 95904 GB / 290 TB avail
>>>> >                    8 active+clean+scrubbing+deep
>>>> >                 4856 active+clean
>>>> >   client io 6893 kB/s rd, 5657 kB/s wr, 2090 op/s
>>>> > HEALTH_WARN 1 mons down, quorum 0,1 ceph-mon-01,ceph-mon-02
>>>> > mon.ceph-mon-03 (rank 2) addr 10.1.1.66:6789/0 is down (out of quorum)
>>>> >
>>>> >
>>>> > { "name": "ceph-mon-01",
>>>> >   "rank": 0,
>>>> >   "state": "leader",
>>>> >   "election_epoch": 932,
>>>> >   "quorum": [
>>>> >         0,
>>>> >         1],
>>>> >   "outside_quorum": [],
>>>> >   "extra_probe_peers": [],
>>>> >   "sync_provider": [],
>>>> >   "monmap": { "epoch": 2,
>>>> >       "fsid": "XXX",
>>>> >       "modified": "0.000000",
>>>> >       "created": "0.000000",
>>>> >       "mons": [
>>>> >             { "rank": 0,
>>>> >               "name": "ceph-mon-01",
>>>> >               "addr": "10.1.1.64:6789\/0"},
>>>> >             { "rank": 1,
>>>> >               "name": "ceph-mon-02",
>>>> >               "addr": "10.1.1.65:6789\/0"},
>>>> >             { "rank": 2,
>>>> >               "name": "ceph-mon-03",
>>>> >               "addr": "10.1.1.66:6789\/0"}]}}
>>>> >
>>>> >
>>>> > { "name": "ceph-mon-02",
>>>> >   "rank": 1,
>>>> >   "state": "peon",
>>>> >   "election_epoch": 932,
>>>> >   "quorum": [
>>>> >         0,
>>>> >         1],
>>>> >   "outside_quorum": [],
>>>> >   "extra_probe_peers": [],
>>>> >   "sync_provider": [],
>>>> >   "monmap": { "epoch": 2,
>>>> >       "fsid": "XXX",
>>>> >       "modified": "0.000000",
>>>> >       "created": "0.000000",
>>>> >       "mons": [
>>>> >             { "rank": 0,
>>>> >               "name": "ceph-mon-01",
>>>> >               "addr": "10.1.1.64:6789\/0"},
>>>> >             { "rank": 1,
>>>> >               "name": "ceph-mon-02",
>>>> >               "addr": "10.1.1.65:6789\/0"},
>>>> >             { "rank": 2,
>>>> >               "name": "ceph-mon-03",
>>>> >               "addr": "10.1.1.66:6789\/0"}]}}
>>>> >
>>>> >
>>>> > { "name": "ceph-mon-03",
>>>> >   "rank": 2,
>>>> >   "state": "electing",
>>>> >   "election_epoch": 931,
>>>> >   "quorum": [],
>>>> >   "outside_quorum": [],
>>>> >   "extra_probe_peers": [],
>>>> >   "sync_provider": [],
>>>> >   "monmap": { "epoch": 2,
>>>> >       "fsid": "XXX",
>>>> >       "modified": "0.000000",
>>>> >       "created": "0.000000",
>>>> >       "mons": [
>>>> >             { "rank": 0,
>>>> >               "name": "ceph-mon-01",
>>>> >               "addr": "10.1.1.64:6789\/0"},
>>>> >             { "rank": 1,
>>>> >               "name": "ceph-mon-02",
>>>> >               "addr": "10.1.1.65:6789\/0"},
>>>> >             { "rank": 2,
>>>> >               "name": "ceph-mon-03",
>>>> >               "addr": "10.1.1.66:6789\/0"}]}}
>>>> >
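The three mon_status dumps above differ mainly in state, quorum membership and election epoch. As a rough sketch, the comparison can be written down like this. The function name lagging_monitors is made up, and the input is assumed to be the already-parsed JSON reduced to the fields used here.

```python
def lagging_monitors(statuses: list) -> list:
    """Name the monitors that are out of quorum or behind on election epoch."""
    newest = max(status["election_epoch"] for status in statuses)
    return [
        status["name"]
        for status in statuses
        if status["election_epoch"] < newest
        or status["rank"] not in status["quorum"]
    ]


# Values copied from the three dumps above.
statuses = [
    {"name": "ceph-mon-01", "rank": 0, "state": "leader",
     "election_epoch": 932, "quorum": [0, 1]},
    {"name": "ceph-mon-02", "rank": 1, "state": "peon",
     "election_epoch": 932, "quorum": [0, 1]},
    {"name": "ceph-mon-03", "rank": 2, "state": "electing",
     "election_epoch": 931, "quorum": []},
]

lagging = lagging_monitors(statuses)
print(lagging)
```

Here only ceph-mon-03 is flagged: it reports an empty quorum and its election epoch (931) is one behind the other two (932).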
>>>> >
>>>> > Any help or advice is appreciated.
>>>>
>>>> It looks like your mon has been unable to communicate with the other
>>>> hosts, presumably since the time you un-/replugged it. Check your
>>>> switch port configuration. Also, make sure that from 10.1.1.66, you
>>>> can not only ping 10.1.1.64 and 10.1.1.65, but make a TCP connection
>>>> on port 6789. With that out of the way, check your mon log on
>>>> ceph-mon-03 (in /var/log/ceph/mon); it should provide some additional
>>>> insight into the problem.
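The TCP check suggested above can also be scripted instead of using telnet. This is a minimal sketch using only the Python standard library; the helper name can_connect and the 3 second default timeout are arbitrary choices for illustration, while port 6789 is the monitor port shown in the monmap.

```python
import socket


def can_connect(host: str, port: int = 6789, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        # create_connection performs the full TCP handshake, which is a
        # stronger test than ping (ICMP) alone.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

Calling can_connect("10.1.1.64") and can_connect("10.1.1.65") from 10.1.1.66, and the reverse from the other two hosts, covers every monitor-to-monitor path. A successful connection only shows that the port is reachable; it says nothing about whether the monitors then agree on an election.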
>>>>
>>>> Cheers,
>>>> Florian
>>>>
>>>
>>>
>>
>