monitor quorum

Hi,

Now I feel dumb for jumping to the conclusion that it was a simple
networking issue - it isn't.
I've just checked connectivity properly: I can ping, and telnet to port 6789,
from every mon server to every other mon server.
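
For anyone wanting to repeat the port check without telnet, here is a minimal sketch in Python. It assumes the three monitor addresses and port 6789 from the monmap quoted further down; the names can_connect and check_peers are made up for this example and are not part of any Ceph tooling. It only shows that a TCP handshake succeeds, nothing about whether the monitors are actually talking to each other.

```python
import socket

# Monitor addresses as listed in the monmap quoted later in this thread.
MONS = {
    "ceph-mon-01": "10.1.1.64",
    "ceph-mon-02": "10.1.1.65",
    "ceph-mon-03": "10.1.1.66",
}
MON_PORT = 6789


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts and timeouts.
        return False


def check_peers(mons=MONS, port=MON_PORT):
    """Map each monitor name to whether its port accepted a connection."""
    return {name: can_connect(addr, port) for name, addr in mons.items()}
```

Calling check_peers() on each mon host in turn and printing the result gives the same full-mesh check as running telnet by hand from every server.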

I've just restarted the mon03 service and the log is showing the following:

2014-09-17 16:49:02.355148 7f7ef9f8c800  0 starting mon.ceph-mon-03 rank 2
at 10.1.1.66:6789/0 mon_data /var/lib/ceph/mon/ceph-ceph-mon-03 fsid
74069c87-b361-4bb8-8ce8-6ae9deb8a9bd
2014-09-17 16:49:02.355375 7f7ef9f8c800  1 mon.ceph-mon-03 at -1(probing) e2
preinit fsid 74069c87-b361-4bb8-8ce8-6ae9deb8a9bd
2014-09-17 16:49:02.356347 7f7ef9f8c800  1
mon.ceph-mon-03 at -1(probing).paxosservice(pgmap
18241250..18241952) refresh upgraded, format 0 -> 1
2014-09-17 16:49:02.356360 7f7ef9f8c800  1 mon.ceph-mon-03 at -1(probing).pg
v0 on_upgrade discarding in-core PGMap
2014-09-17 16:49:02.400316 7f7ef9f8c800  0 mon.ceph-mon-03 at -1(probing).mds
e1 print_map
epoch 1
flags 0
created 2013-12-09 10:19:58.534310
modified 2013-12-09 10:19:58.534332
tableserver 0
root 0
session_timeout 60
session_autoclose 300
max_file_size 1099511627776
last_failure 0
last_failure_osd_epoch 0
compat compat={},rocompat={},incompat={1=base v0.20,2=client writeable
ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds
uses versioned encoding}
max_mds 1
in
up {}
failed
stopped
data_pools 0
metadata_pool 1
inline_data disabled

2014-09-17 16:49:02.402373 7f7ef9f8c800  0 mon.ceph-mon-03 at -1(probing).osd
e49212 crush map has features 1107558400, adjusting msgr requires
2014-09-17 16:49:02.402384 7f7ef9f8c800  0 mon.ceph-mon-03 at -1(probing).osd
e49212 crush map has features 1107558400, adjusting msgr requires
2014-09-17 16:49:02.402386 7f7ef9f8c800  0 mon.ceph-mon-03 at -1(probing).osd
e49212 crush map has features 1107558400, adjusting msgr requires
2014-09-17 16:49:02.402388 7f7ef9f8c800  0 mon.ceph-mon-03 at -1(probing).osd
e49212 crush map has features 1107558400, adjusting msgr requires
2014-09-17 16:49:02.403725 7f7ef9f8c800  1
mon.ceph-mon-03 at -1(probing).paxosservice(auth
26001..26154) refresh upgraded, format 0 -> 1
2014-09-17 16:49:02.404834 7f7ef9f8c800  0 mon.ceph-mon-03 at -1(probing) e2
 my rank is now 2 (was -1)
2014-09-17 16:49:02.407439 7f7ef331b700  1 mon.ceph-mon-03 at 2(synchronizing)
e2 sync_obtain_latest_monmap
2014-09-17 16:49:02.407588 7f7ef331b700  1 mon.ceph-mon-03 at 2(synchronizing)
e2 sync_obtain_latest_monmap obtained monmap e2
2014-09-17 16:49:09.514365 7f7ef331b700  0 log [INF] : mon.ceph-mon-03
calling new monitor election
2014-09-17 16:49:09.514523 7f7ef331b700  1
mon.ceph-mon-03 at 2(electing).elector(931)
init, last seen epoch 931
2014-09-17 16:49:09.514658 7f7ef331b700  1
mon.ceph-mon-03 at 2(electing).paxos(paxos
recovering c 31223899..31224482) is_readable now=2014-09-17 16:49:09.514659
lease_expire=0.000000 has v0 lc 31224482
2014-09-17 16:49:09.514665 7f7ef331b700  1
mon.ceph-mon-03 at 2(electing).paxos(paxos
recovering c 31223899..31224482) is_readable now=2014-09-17 16:49:09.514666
lease_expire=0.000000 has v0 lc 31224482
2014-09-17 16:49:15.533876 7f7ef3b1c700  1
mon.ceph-mon-03 at 2(electing).elector(933)
init, last seen epoch 933
2014-09-17 16:49:21.578269 7f7ef3b1c700  1
mon.ceph-mon-03 at 2(electing).elector(935)
init, last seen epoch 935
2014-09-17 16:49:26.578526 7f7ef3b1c700  1
mon.ceph-mon-03 at 2(electing).elector(935)
init, last seen epoch 935
2014-09-17 16:49:31.578790 7f7ef3b1c700  1
mon.ceph-mon-03 at 2(electing).elector(935)
init, last seen epoch 935
2014-09-17 16:49:36.579044 7f7ef3b1c700  1
mon.ceph-mon-03 at 2(electing).elector(935)
init, last seen epoch 935


The last lines about "electing" repeat forever.  The other mons are logging
far more entries than I have seen them log before.  They look like the
following (note the timestamps - all of these log lines fall within a single
2-second period):

2014-09-17 16:55:10.019407 7fd5a479a700  1 mon.ceph-mon-02 at 1(peon).paxos(paxos
active c 31224401..31225038) is_readable now=2014-09-17 16:55:10.019408
lease_expire=2014-09-17 16:55:14.518716 has v0 lc 31225038
2014-09-17 16:55:10.019418 7fd5a479a700  1 mon.ceph-mon-02 at 1(peon).paxos(paxos
active c 31224401..31225038) is_readable now=2014-09-17 16:55:10.019418
lease_expire=2014-09-17 16:55:14.518716 has v0 lc 31225038
2014-09-17 16:55:10.180220 7fd5a479a700  1 mon.ceph-mon-02 at 1(peon).paxos(paxos
active c 31224401..31225038) is_readable now=2014-09-17 16:55:10.180222
lease_expire=2014-09-17 16:55:14.518716 has v0 lc 31225038
2014-09-17 16:55:10.180233 7fd5a479a700  1 mon.ceph-mon-02 at 1(peon).paxos(paxos
active c 31224401..31225038) is_readable now=2014-09-17 16:55:10.180234
lease_expire=2014-09-17 16:55:14.518716 has v0 lc 31225038
2014-09-17 16:55:10.192668 7fd5a479a700  1 mon.ceph-mon-02 at 1(peon).paxos(paxos
active c 31224401..31225038) is_readable now=2014-09-17 16:55:10.192670
lease_expire=2014-09-17 16:55:14.518716 has v0 lc 31225038
2014-09-17 16:55:10.192691 7fd5a479a700  1 mon.ceph-mon-02 at 1(peon).paxos(paxos
active c 31224401..31225038) is_readable now=2014-09-17 16:55:10.192692
lease_expire=2014-09-17 16:55:14.518716 has v0 lc 31225038
2014-09-17 16:55:10.276726 7fd5a479a700  1 mon.ceph-mon-02 at 1(peon).paxos(paxos
active c 31224401..31225038) is_readable now=2014-09-17 16:55:10.276727
lease_expire=2014-09-17 16:55:14.518716 has v0 lc 31225038
2014-09-17 16:55:10.276737 7fd5a479a700  1 mon.ceph-mon-02 at 1(peon).paxos(paxos
active c 31224401..31225038) is_readable now=2014-09-17 16:55:10.276737
lease_expire=2014-09-17 16:55:14.518716 has v0 lc 31225038
2014-09-17 16:55:10.302638 7fd5a479a700  1 mon.ceph-mon-02 at 1(peon).paxos(paxos
active c 31224401..31225038) is_readable now=2014-09-17 16:55:10.302640
lease_expire=2014-09-17 16:55:14.518716 has v0 lc 31225038
2014-09-17 16:55:10.302651 7fd5a479a700  1 mon.ceph-mon-02 at 1(peon).paxos(paxos
active c 31224401..31225038) is_readable now=2014-09-17 16:55:10.302652
lease_expire=2014-09-17 16:55:14.518716 has v0 lc 31225038
2014-09-17 16:55:10.362642 7fd5a479a700  1 mon.ceph-mon-02 at 1(peon).paxos(paxos
active c 31224401..31225038) is_readable now=2014-09-17 16:55:10.362643
lease_expire=2014-09-17 16:55:14.518716 has v0 lc 31225038
2014-09-17 16:55:10.362655 7fd5a479a700  1 mon.ceph-mon-02 at 1(peon).paxos(paxos
active c 31224401..31225038) is_readable now=2014-09-17 16:55:10.362656
lease_expire=2014-09-17 16:55:14.518716 has v0 lc 31225038
2014-09-17 16:55:10.385686 7fd5a479a700  1 mon.ceph-mon-02 at 1(peon).paxos(paxos
active c 31224401..31225038) is_readable now=2014-09-17 16:55:10.385687
lease_expire=2014-09-17 16:55:14.518716 has v0 lc 31225038
2014-09-17 16:55:10.385697 7fd5a479a700  1 mon.ceph-mon-02 at 1(peon).paxos(paxos
active c 31224401..31225038) is_readable now=2014-09-17 16:55:10.385697
lease_expire=2014-09-17 16:55:14.518716 has v0 lc 31225038
2014-09-17 16:55:10.406712 7fd5a479a700  1 mon.ceph-mon-02 at 1(peon).paxos(paxos
active c 31224401..31225038) is_readable now=2014-09-17 16:55:10.406713
lease_expire=2014-09-17 16:55:14.518716 has v0 lc 31225038
2014-09-17 16:55:10.406723 7fd5a479a700  1 mon.ceph-mon-02 at 1(peon).paxos(paxos
active c 31224401..31225038) is_readable now=2014-09-17 16:55:10.406724
lease_expire=2014-09-17 16:55:14.518716 has v0 lc 31225038
2014-09-17 16:55:10.423277 7fd5a479a700  1 mon.ceph-mon-02 at 1(peon).paxos(paxos
active c 31224401..31225038) is_readable now=2014-09-17 16:55:10.423279
lease_expire=2014-09-17 16:55:14.518716 has v0 lc 31225038
2014-09-17 16:55:10.423299 7fd5a479a700  1 mon.ceph-mon-02 at 1(peon).paxos(paxos
active c 31224401..31225038) is_readable now=2014-09-17 16:55:10.423300
lease_expire=2014-09-17 16:55:14.518716 has v0 lc 31225038
2014-09-17 16:55:10.543138 7fd5a479a700  1 mon.ceph-mon-02 at 1(peon).paxos(paxos
updating c 31224401..31225038) is_readable now=2014-09-17 16:55:10.543139
lease_expire=0.000000 has v0 lc 31225038
2014-09-17 16:55:10.543145 7fd5a479a700  1 mon.ceph-mon-02 at 1(peon).paxos(paxos
updating c 31224401..31225038) is_readable now=2014-09-17 16:55:10.543145
lease_expire=0.000000 has v0 lc 31225038
2014-09-17 16:55:10.580911 7fd5a479a700  1 mon.ceph-mon-02 at 1(peon).paxos(paxos
active c 31224401..31225039) is_readable now=2014-09-17 16:55:10.580912
lease_expire=2014-09-17 16:55:15.549947 has v0 lc 31225039
2014-09-17 16:55:10.580922 7fd5a479a700  1 mon.ceph-mon-02 at 1(peon).paxos(paxos
active c 31224401..31225039) is_readable now=2014-09-17 16:55:10.580923
lease_expire=2014-09-17 16:55:15.549947 has v0 lc 31225039
2014-09-17 16:55:10.580930 7fd5a479a700  1 mon.ceph-mon-02 at 1(peon).paxos(paxos
active c 31224401..31225039) is_readable now=2014-09-17 16:55:10.580930
lease_expire=2014-09-17 16:55:15.549947 has v0 lc 31225039
2014-09-17 16:55:10.606130 7fd5a479a700  1 mon.ceph-mon-02 at 1(peon).paxos(paxos
updating c 31224401..31225039) is_readable now=2014-09-17 16:55:10.606131
lease_expire=0.000000 has v0 lc 31225039
2014-09-17 16:55:10.606136 7fd5a479a700  1 mon.ceph-mon-02 at 1(peon).paxos(paxos
updating c 31224401..31225039) is_readable now=2014-09-17 16:55:10.606137
lease_expire=0.000000 has v0 lc 31225039
2014-09-17 16:55:10.633460 7fd5a479a700  1 mon.ceph-mon-02 at 1(peon).log
v12645471 check_sub sending message to client.2190462 10.1.1.10:0/1004032
with 1 entries (version 12645471)
2014-09-17 16:55:10.633632 7fd5a479a700  1 mon.ceph-mon-02 at 1(peon).paxos(paxos
active c 31224401..31225040) is_readable now=2014-09-17 16:55:10.633633
lease_expire=2014-09-17 16:55:15.607320 has v0 lc 31225040
2014-09-17 16:55:10.633646 7fd5a479a700  1 mon.ceph-mon-02 at 1(peon).paxos(paxos
active c 31224401..31225040) is_readable now=2014-09-17 16:55:10.633651
lease_expire=2014-09-17 16:55:15.607320 has v0 lc 31225040
2014-09-17 16:55:10.633657 7fd5a479a700  1 mon.ceph-mon-02 at 1(peon).paxos(paxos
active c 31224401..31225040) is_readable now=2014-09-17 16:55:10.633658
lease_expire=2014-09-17 16:55:15.607320 has v0 lc 31225040
2014-09-17 16:55:10.633699 7fd5a479a700  1 mon.ceph-mon-02 at 1(peon).paxos(paxos
active c 31224401..31225040) is_readable now=2014-09-17 16:55:10.633700
lease_expire=2014-09-17 16:55:15.607320 has v0 lc 31225040
2014-09-17 16:55:10.633707 7fd5a479a700  1 mon.ceph-mon-02 at 1(peon).paxos(paxos
active c 31224401..31225040) is_readable now=2014-09-17 16:55:10.633707
lease_expire=2014-09-17 16:55:15.607320 has v0 lc 31225040
2014-09-17 16:55:10.695127 7fd5a479a700  1 mon.ceph-mon-02 at 1(peon).paxos(paxos
active c 31224401..31225040) is_readable now=2014-09-17 16:55:10.695129
lease_expire=2014-09-17 16:55:15.607320 has v0 lc 31225040
2014-09-17 16:55:10.695151 7fd5a479a700  1 mon.ceph-mon-02 at 1(peon).paxos(paxos
active c 31224401..31225040) is_readable now=2014-09-17 16:55:10.695152
lease_expire=2014-09-17 16:55:15.607320 has v0 lc 31225040
2014-09-17 16:55:10.800013 7fd5a479a700  1 mon.ceph-mon-02 at 1(peon).paxos(paxos
active c 31224401..31225040) is_readable now=2014-09-17 16:55:10.800015
lease_expire=2014-09-17 16:55:15.607320 has v0 lc 31225040
2014-09-17 16:55:10.800030 7fd5a479a700  1 mon.ceph-mon-02 at 1(peon).paxos(paxos
active c 31224401..31225040) is_readable now=2014-09-17 16:55:10.800031
lease_expire=2014-09-17 16:55:15.607320 has v0 lc 31225040
2014-09-17 16:55:10.830432 7fd5a479a700  1 mon.ceph-mon-02 at 1(peon).paxos(paxos
active c 31224401..31225040) is_readable now=2014-09-17 16:55:10.830433
lease_expire=2014-09-17 16:55:15.607320 has v0 lc 31225040
2014-09-17 16:55:10.830441 7fd5a479a700  1 mon.ceph-mon-02 at 1(peon).paxos(paxos
active c 31224401..31225040) is_readable now=2014-09-17 16:55:10.830442
lease_expire=2014-09-17 16:55:15.607320 has v0 lc 31225040
2014-09-17 16:55:10.848954 7fd5a479a700  1 mon.ceph-mon-02 at 1(peon).paxos(paxos
active c 31224401..31225040) is_readable now=2014-09-17 16:55:10.848956
lease_expire=2014-09-17 16:55:15.607320 has v0 lc 31225040
2014-09-17 16:55:10.848964 7fd5a479a700  1 mon.ceph-mon-02 at 1(peon).paxos(paxos
active c 31224401..31225040) is_readable now=2014-09-17 16:55:10.848965
lease_expire=2014-09-17 16:55:15.607320 has v0 lc 31225040
2014-09-17 16:55:10.887139 7fd5a479a700  1 mon.ceph-mon-02 at 1(peon).paxos(paxos
active c 31224401..31225040) is_readable now=2014-09-17 16:55:10.887140
lease_expire=2014-09-17 16:55:15.607320 has v0 lc 31225040
2014-09-17 16:55:10.887150 7fd5a479a700  1 mon.ceph-mon-02 at 1(peon).paxos(paxos
active c 31224401..31225040) is_readable now=2014-09-17 16:55:10.887151
lease_expire=2014-09-17 16:55:15.607320 has v0 lc 31225040
2014-09-17 16:55:10.913825 7fd5a479a700  1 mon.ceph-mon-02 at 1(peon).paxos(paxos
active c 31224401..31225040) is_readable now=2014-09-17 16:55:10.913827
lease_expire=2014-09-17 16:55:15.607320 has v0 lc 31225040
2014-09-17 16:55:10.913834 7fd5a479a700  1 mon.ceph-mon-02 at 1(peon).paxos(paxos
active c 31224401..31225040) is_readable now=2014-09-17 16:55:10.913835
lease_expire=2014-09-17 16:55:15.607320 has v0 lc 31225040
2014-09-17 16:55:11.010277 7fd5a479a700  1 mon.ceph-mon-02 at 1(peon).paxos(paxos
active c 31224401..31225040) is_readable now=2014-09-17 16:55:11.010279
lease_expire=2014-09-17 16:55:15.607320 has v0 lc 31225040
2014-09-17 16:55:11.010287 7fd5a479a700  1 mon.ceph-mon-02 at 1(peon).paxos(paxos
active c 31224401..31225040) is_readable now=2014-09-17 16:55:11.010288
lease_expire=2014-09-17 16:55:15.607320 has v0 lc 31225040
2014-09-17 16:55:11.098312 7fd5a479a700  1 mon.ceph-mon-02 at 1(peon).paxos(paxos
active c 31224401..31225040) is_readable now=2014-09-17 16:55:11.098314
lease_expire=2014-09-17 16:55:15.607320 has v0 lc 31225040
2014-09-17 16:55:11.098325 7fd5a479a700  1 mon.ceph-mon-02 at 1(peon).paxos(paxos
active c 31224401..31225040) is_readable now=2014-09-17 16:55:11.098326
lease_expire=2014-09-17 16:55:15.607320 has v0 lc 31225040
2014-09-17 16:55:11.109040 7fd5a479a700  1 mon.ceph-mon-02 at 1(peon).paxos(paxos
active c 31224401..31225040) is_readable now=2014-09-17 16:55:11.109042
lease_expire=2014-09-17 16:55:15.607320 has v0 lc 31225040
2014-09-17 16:55:11.109053 7fd5a479a700  1 mon.ceph-mon-02 at 1(peon).paxos(paxos
active c 31224401..31225040) is_readable now=2014-09-17 16:55:11.109054
lease_expire=2014-09-17 16:55:15.607320 has v0 lc 31225040
2014-09-17 16:55:11.170705 7fd5a479a700  1 mon.ceph-mon-02 at 1(peon).paxos(paxos
active c 31224401..31225040) is_readable now=2014-09-17 16:55:11.170706
lease_expire=2014-09-17 16:55:15.607320 has v0 lc 31225040
2014-09-17 16:55:11.170713 7fd5a479a700  1 mon.ceph-mon-02 at 1(peon).paxos(paxos
active c 31224401..31225040) is_readable now=2014-09-17 16:55:11.170714
lease_expire=2014-09-17 16:55:15.607320 has v0 lc 31225040
2014-09-17 16:55:11.222537 7fd5a479a700  1 mon.ceph-mon-02 at 1(peon).paxos(paxos
active c 31224401..31225040) is_readable now=2014-09-17 16:55:11.222539
lease_expire=2014-09-17 16:55:15.607320 has v0 lc 31225040
2014-09-17 16:55:11.222549 7fd5a479a700  1 mon.ceph-mon-02 at 1(peon).paxos(paxos
active c 31224401..31225040) is_readable now=2014-09-17 16:55:11.222550
lease_expire=2014-09-17 16:55:15.607320 has v0 lc 31225040
2014-09-17 16:55:11.431510 7fd5a479a700  1 mon.ceph-mon-02 at 1(peon).paxos(paxos
active c 31224401..31225040) is_readable now=2014-09-17 16:55:11.431511
lease_expire=2014-09-17 16:55:15.607320 has v0 lc 31225040
2014-09-17 16:55:11.431524 7fd5a479a700  1 mon.ceph-mon-02 at 1(peon).paxos(paxos
active c 31224401..31225040) is_readable now=2014-09-17 16:55:11.431525
lease_expire=2014-09-17 16:55:15.607320 has v0 lc 31225040
2014-09-17 16:55:11.453664 7fd5a479a700  1 mon.ceph-mon-02 at 1(peon).paxos(paxos
active c 31224401..31225040) is_readable now=2014-09-17 16:55:11.453666
lease_expire=2014-09-17 16:55:15.607320 has v0 lc 31225040
2014-09-17 16:55:11.453685 7fd5a479a700  1 mon.ceph-mon-02 at 1(peon).paxos(paxos
active c 31224401..31225040) is_readable now=2014-09-17 16:55:11.453687
lease_expire=2014-09-17 16:55:15.607320 has v0 lc 31225040
2014-09-17 16:55:11.520250 7fd5a479a700  1 mon.ceph-mon-02 at 1(peon).paxos(paxos
active c 31224401..31225040) is_readable now=2014-09-17 16:55:11.520252
lease_expire=2014-09-17 16:55:15.607320 has v0 lc 31225040
2014-09-17 16:55:11.520263 7fd5a479a700  1 mon.ceph-mon-02 at 1(peon).paxos(paxos
active c 31224401..31225040) is_readable now=2014-09-17 16:55:11.520264
lease_expire=2014-09-17 16:55:15.607320 has v0 lc 31225040
2014-09-17 16:55:11.603991 7fd5a479a700  1 mon.ceph-mon-02 at 1(peon).paxos(paxos
active c 31224401..31225041) is_readable now=2014-09-17 16:55:11.603992
lease_expire=2014-09-17 16:55:16.575181 has v0 lc 31225041
2014-09-17 16:55:11.610948 7fd5a479a700  1 mon.ceph-mon-02 at 1(peon).paxos(paxos
active c 31224401..31225041) is_readable now=2014-09-17 16:55:11.610949
lease_expire=2014-09-17 16:55:16.575181 has v0 lc 31225041
2014-09-17 16:55:11.610965 7fd5a479a700  1 mon.ceph-mon-02 at 1(peon).paxos(paxos
active c 31224401..31225041) is_readable now=2014-09-17 16:55:11.610966
lease_expire=2014-09-17 16:55:16.575181 has v0 lc 31225041
2014-09-17 16:55:11.622479 7fd5a479a700  1 mon.ceph-mon-02 at 1(peon).paxos(paxos
active c 31224401..31225041) is_readable now=2014-09-17 16:55:11.622480
lease_expire=2014-09-17 16:55:16.575181 has v0 lc 31225041
2014-09-17 16:55:11.622495 7fd5a479a700  1 mon.ceph-mon-02 at 1(peon).paxos(paxos
active c 31224401..31225041) is_readable now=2014-09-17 16:55:11.622496
lease_expire=2014-09-17 16:55:16.575181 has v0 lc 31225041
2014-09-17 16:55:11.787013 7fd5a479a700  1 mon.ceph-mon-02 at 1(peon).paxos(paxos
active c 31224401..31225041) is_readable now=2014-09-17 16:55:11.787014
lease_expire=2014-09-17 16:55:16.575181 has v0 lc 31225041
2014-09-17 16:55:11.787024 7fd5a479a700  1 mon.ceph-mon-02 at 1(peon).paxos(paxos
active c 31224401..31225041) is_readable now=2014-09-17 16:55:11.787025
lease_expire=2014-09-17 16:55:16.575181 has v0 lc 31225041
2014-09-17 16:55:11.873613 7fd5a479a700  1 mon.ceph-mon-02 at 1(peon).paxos(paxos
active c 31224401..31225041) is_readable now=2014-09-17 16:55:11.873614
lease_expire=2014-09-17 16:55:16.575181 has v0 lc 31225041
2014-09-17 16:55:11.873627 7fd5a479a700  1 mon.ceph-mon-02 at 1(peon).paxos(paxos
active c 31224401..31225041) is_readable now=2014-09-17 16:55:11.873628
lease_expire=2014-09-17 16:55:16.575181 has v0 lc 31225041
2014-09-17 16:55:11.988465 7fd5a479a700  1 mon.ceph-mon-02 at 1(peon).paxos(paxos
active c 31224401..31225041) is_readable now=2014-09-17 16:55:11.988467
lease_expire=2014-09-17 16:55:16.575181 has v0 lc 31225041
2014-09-17 16:55:11.988487 7fd5a479a700  1 mon.ceph-mon-02 at 1(peon).paxos(paxos
active c 31224401..31225041) is_readable now=2014-09-17 16:55:11.988489
lease_expire=2014-09-17 16:55:16.575181 has v0 lc 31225041
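
To put a number on how noisy the log excerpt above is, a rough Python sketch that counts entries per second might help. It assumes it is fed the raw log file, where each entry sits on one line (the excerpt above is wrapped by the mail client), and the function name entries_per_second is invented for this example.

```python
import re
from collections import Counter

# Matches the leading "YYYY-MM-DD HH:MM:SS" of a log entry and ignores
# the fractional seconds that follow it.
TS_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\.\d+ ")


def entries_per_second(lines, needle="is_readable"):
    """Count log entries containing `needle`, grouped by whole second."""
    counts = Counter()
    for line in lines:
        match = TS_RE.match(line)
        if match and needle in line:
            counts[match.group(1)] += 1
    return counts
```

Running it over an open log file object, for example with `with open(path) as f: entries_per_second(f)`, gives one count per second, which makes it easy to compare a healthy period against the current one.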


I'm wondering at this point whether I should just reinject the monmap from
mon01 or mon02 into mon03 or whether there is something else that can be
done to fix this.
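
For reference, a sketch of what reinjecting the monmap would involve, written as a small Python helper that only builds the two command lines and does not run them. It assumes the documented `ceph mon getmap` and `ceph-mon --inject-monmap` commands; the path /tmp/monmap and the helper name are placeholders chosen for this example. The target monitor has to be stopped before the inject step, and since all three mons already report monmap e2 in the mon_status output quoted below, this may well not be the fix.

```python
def monmap_reinject_commands(mon_id, monmap_path="/tmp/monmap"):
    """Build (fetch, inject) command lines for reinjecting a monmap.

    Nothing is executed here; the caller decides whether and where to
    run each command.
    """
    # Run on a host that can reach the quorum: saves the current monmap.
    fetch = ["ceph", "mon", "getmap", "-o", monmap_path]
    # Run on the affected monitor host while its ceph-mon daemon is stopped.
    inject = ["ceph-mon", "-i", mon_id, "--inject-monmap", monmap_path]
    return fetch, inject
```

With mon_id set to "ceph-mon-03", the two lists can be reviewed by hand or passed to subprocess.run once the monitor service has been stopped.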

With hindsight, I would have stopped the mon service before relocating the
NIC cable, but I expected the mon to survive a short network outage, which
it doesn't seem to have done :(


On 17 September 2014 16:21, James Eckersall <james.eckersall at gmail.com>
wrote:

> Hi,
>
> Thanks for the advice.
>
> I feel pretty dumb as it does indeed look like a simple networking issue.
>  You know how you check things 5 times and miss the most obvious one...
>
> J
>
> On 17 September 2014 16:04, Florian Haas <florian at hastexo.com> wrote:
>
>> On Wed, Sep 17, 2014 at 1:58 PM, James Eckersall
>> <james.eckersall at gmail.com> wrote:
>> > Hi,
>> >
>> > I have a ceph cluster running 0.80.1 on Ubuntu 14.04.  I have 3 monitors
>> > and 4 OSD nodes currently.
>> >
>> > Everything has been running great up until today where I've got an issue
>> > with the monitors.
>> > I moved mon03 to a different switchport so it would have temporarily lost
>> > connectivity.
>> > Since then, the cluster is reporting that that mon is down, although it's
>> > definitely up.
>> > I've tried restarting the mon services on all three mons, but that hasn't
>> > made a difference.
>> > I definitely, 100% do not have any clock skew on any of the mons.  This
>> > has been triple-checked as the ceph docs seem to suggest that might be
>> > the cause of this issue.
>> >
>> > Here is what ceph -s and ceph health detail are reporting as well as the
>> > mon_status for each monitor:
>> >
>> >
>> > # ceph -s ; ceph health detail
>> >     cluster XXX
>> >      health HEALTH_WARN 1 mons down, quorum 0,1 ceph-mon-01,ceph-mon-02
>> >      monmap e2: 3 mons at {ceph-mon-01=10.1.1.64:6789/0,ceph-mon-02=10.1.1.65:6789/0,ceph-mon-03=10.1.1.66:6789/0}, election epoch 932, quorum 0,1 ceph-mon-01,ceph-mon-02
>> >      osdmap e49213: 80 osds: 80 up, 80 in
>> >       pgmap v18242952: 4864 pgs, 5 pools, 69910 GB data, 17638 kobjects
>> >             197 TB used, 95904 GB / 290 TB avail
>> >                    8 active+clean+scrubbing+deep
>> >                 4856 active+clean
>> >   client io 6893 kB/s rd, 5657 kB/s wr, 2090 op/s
>> > HEALTH_WARN 1 mons down, quorum 0,1 ceph-mon-01,ceph-mon-02
>> > mon.ceph-mon-03 (rank 2) addr 10.1.1.66:6789/0 is down (out of quorum)
>> >
>> >
>> > { "name": "ceph-mon-01",
>> >   "rank": 0,
>> >   "state": "leader",
>> >   "election_epoch": 932,
>> >   "quorum": [
>> >         0,
>> >         1],
>> >   "outside_quorum": [],
>> >   "extra_probe_peers": [],
>> >   "sync_provider": [],
>> >   "monmap": { "epoch": 2,
>> >       "fsid": "XXX",
>> >       "modified": "0.000000",
>> >       "created": "0.000000",
>> >       "mons": [
>> >             { "rank": 0,
>> >               "name": "ceph-mon-01",
>> >               "addr": "10.1.1.64:6789\/0"},
>> >             { "rank": 1,
>> >               "name": "ceph-mon-02",
>> >               "addr": "10.1.1.65:6789\/0"},
>> >             { "rank": 2,
>> >               "name": "ceph-mon-03",
>> >               "addr": "10.1.1.66:6789\/0"}]}}
>> >
>> >
>> > { "name": "ceph-mon-02",
>> >   "rank": 1,
>> >   "state": "peon",
>> >   "election_epoch": 932,
>> >   "quorum": [
>> >         0,
>> >         1],
>> >   "outside_quorum": [],
>> >   "extra_probe_peers": [],
>> >   "sync_provider": [],
>> >   "monmap": { "epoch": 2,
>> >       "fsid": "XXX",
>> >       "modified": "0.000000",
>> >       "created": "0.000000",
>> >       "mons": [
>> >             { "rank": 0,
>> >               "name": "ceph-mon-01",
>> >               "addr": "10.1.1.64:6789\/0"},
>> >             { "rank": 1,
>> >               "name": "ceph-mon-02",
>> >               "addr": "10.1.1.65:6789\/0"},
>> >             { "rank": 2,
>> >               "name": "ceph-mon-03",
>> >               "addr": "10.1.1.66:6789\/0"}]}}
>> >
>> >
>> > { "name": "ceph-mon-03",
>> >   "rank": 2,
>> >   "state": "electing",
>> >   "election_epoch": 931,
>> >   "quorum": [],
>> >   "outside_quorum": [],
>> >   "extra_probe_peers": [],
>> >   "sync_provider": [],
>> >   "monmap": { "epoch": 2,
>> >       "fsid": "XXX",
>> >       "modified": "0.000000",
>> >       "created": "0.000000",
>> >       "mons": [
>> >             { "rank": 0,
>> >               "name": "ceph-mon-01",
>> >               "addr": "10.1.1.64:6789\/0"},
>> >             { "rank": 1,
>> >               "name": "ceph-mon-02",
>> >               "addr": "10.1.1.65:6789\/0"},
>> >             { "rank": 2,
>> >               "name": "ceph-mon-03",
>> >               "addr": "10.1.1.66:6789\/0"}]}}
>> >
>> >
>> > Any help or advice is appreciated.
>>
>> It looks like your mon has been unable to communicate with the other
>> hosts, presumably since the time you un-/replugged it. Check your
>> switch port configuration. Also, make sure that from 10.1.1.66, you
>> can not only ping 10.1.1.64 and 10.1.1.65, but make a TCP connection
>> on port 6789. With that out of the way, check your mon log on
>> ceph-mon-03 (in /var/log/ceph/mon); it should provide some additional
>> insight into the problem.
>>
>> Cheers,
>> Florian
>>
>
>