Help! 61.1 killed my monitors in prod


 



After upgrading my cluster everything looked good, then I rebooted the farm and all hell broke loose.

 

I have 3 monitors, but none of them are able to start. On all of them the '/usr/bin/python /usr/sbin/ceph-create-keys' command hangs because the monitors cannot form a quorum.
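
For context, ceph-create-keys waits for the local monitor to join a quorum before it creates any keys, so it blocks for as long as the monitor stays in the probing state. A minimal sketch for checking the state a monitor reports through its admin socket, which answers even without quorum. The socket path assumes the default naming for mon.a, and `mon_state` is a hypothetical helper, not a Ceph tool:

```shell
# Hypothetical helper: pull the "state" field out of mon_status JSON on stdin.
# A monitor that is part of a quorum reports "leader" or "peon".
mon_state() {
    sed -n 's/.*"state": *"\([^"]*\)".*/\1/p' | head -n 1
}

# Usage on a monitor host (socket path assumes the default naming for mon.a):
#   ceph --admin-daemon /var/run/ceph/ceph-mon.a.asok mon_status | mon_state
```

If this keeps printing "probing", the monitor is running but has not found enough peers, which would match the hanging ceph-create-keys.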

 

 

All ceph tools are producing the following fault:

# ceph -w

2013-05-10 15:00:55.259382 7f6b68e0e700  0 -- :/20337 >> 10.1.1.21:6789/0 pipe(0x2fdc520 sd=4 :0 s=1 pgs=0 cs=0 l=1).fault

….
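
The `.fault` lines indicate that the client could not establish a session with 10.1.1.21:6789. A plain TCP probe of each monitor address can help separate "nothing is listening" from "listening but no quorum". A sketch, assuming bash and coreutils `timeout` are available; `mon_reachable` is a hypothetical helper, and the addresses in the usage comment are the ones that appear in the logs in this post:

```shell
# Hypothetical helper: succeed if a TCP connection to host:port opens within 3 seconds.
# Relies on bash's /dev/tcp redirection, so it invokes bash explicitly.
mon_reachable() {
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# Usage:
#   for ip in 10.1.1.21 10.1.1.22 10.1.1.33; do
#       if mon_reachable "$ip" 6789; then
#           echo "$ip: port open"
#       else
#           echo "$ip: no listener or unreachable"
#       fi
#   done
```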

 

 

Using monmaptool I removed all but one monitor from the monmap, did the same in ceph.conf, and tried running the remaining monitor interactively.
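
For reference, a monmap edit of this kind usually follows an extract, remove, inject sequence. The sketch below is an assumption about the general procedure, not a record of the exact commands run here. It prints the commands by default rather than executing them; `run` and `shrink_monmap` are hypothetical helpers, the scratch path is arbitrary, and the monitor must be stopped before the real commands are run:

```shell
# Dry-run by default: print each command instead of running it.
# Set DRY_RUN=0 only after reviewing the output and stopping the monitor.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# shrink_monmap KEEP_ID REMOVE_ID...
# Extract the monmap from monitor KEEP_ID, drop the listed monitors, inject it back.
shrink_monmap() {
    keep="$1"; shift
    map="/tmp/monmap.$keep"          # scratch path, an arbitrary choice
    run ceph-mon -i "$keep" --extract-monmap "$map"
    for id in "$@"; do
        run monmaptool "$map" --rm "$id"
    done
    run monmaptool --print "$map"    # show what is left before injecting
    run ceph-mon -i "$keep" --inject-monmap "$map"
}

# Usage (monitor names b and c are placeholders; the post only names mon.a):
#   shrink_monmap a b c
```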

 

Here's the mon output:

# /usr/bin/ceph-mon -i a --pid-file /var/run/ceph/mon.a.pid -c /etc/ceph/ceph.conf  -d

2013-05-10 14:54:23.405324 7f0750a61780  0 ceph version 0.61 (237f3f1e8d8c3b85666529860285dcdffdeda4c5), process ceph-mon, pid 29289

starting mon.a rank 0 at 10.1.1.21:6789/0 mon_data /var/lib/ceph/mon/ceph-a fsid 969f28c3-5ee1-4451-9b5b-97c52b724a06

2013-05-10 14:54:23.455975 7f0750a61780  1 mon.a@-1(probing) e1 preinit fsid 969f28c3-5ee1-4451-9b5b-97c52b724a06

2013-05-10 14:54:23.820160 7f0750a61780  1 mon.a@-1(probing).osd e6666 e6666: 96 osds: 96 up, 96 in

2013-05-10 14:54:23.820372 7f0750a61780  1 mon.a@-1(probing).osd e6667 e6667: 96 osds: 96 up, 96 in

2013-05-10 14:54:23.820618 7f0750a61780  1 mon.a@-1(probing).osd e6668 e6668: 96 osds: 96 up, 96 in

2013-05-10 14:54:23.820802 7f0750a61780  1 mon.a@-1(probing).osd e6669 e6669: 96 osds: 96 up, 96 in

2013-05-10 14:54:23.820995 7f0750a61780  1 mon.a@-1(probing).osd e6670 e6670: 96 osds: 96 up, 96 in

2013-05-10 14:54:23.821180 7f0750a61780  1 mon.a@-1(probing).osd e6671 e6671: 96 osds: 96 up, 96 in

2013-05-10 14:54:23.821368 7f0750a61780  1 mon.a@-1(probing).osd e6672 e6672: 96 osds: 96 up, 96 in

2013-05-10 14:54:23.821549 7f0750a61780  1 mon.a@-1(probing).osd e6673 e6673: 96 osds: 96 up, 96 in

2013-05-10 14:54:23.821735 7f0750a61780  1 mon.a@-1(probing).osd e6674 e6674: 96 osds: 96 up, 96 in

2013-05-10 14:54:23.821981 7f0750a61780  1 mon.a@-1(probing).osd e6675 e6675: 96 osds: 96 up, 96 in

2013-05-10 14:54:23.822173 7f0750a61780  1 mon.a@-1(probing).osd e6676 e6676: 96 osds: 96 up, 96 in

2013-05-10 14:54:23.822353 7f0750a61780  1 mon.a@-1(probing).osd e6677 e6677: 96 osds: 96 up, 96 in

2013-05-10 14:54:23.822529 7f0750a61780  1 mon.a@-1(probing).osd e6678 e6678: 96 osds: 96 up, 96 in

2013-05-10 14:54:23.822698 7f0750a61780  1 mon.a@-1(probing).osd e6679 e6679: 96 osds: 96 up, 96 in

2013-05-10 14:54:23.822879 7f0750a61780  1 mon.a@-1(probing).osd e6680 e6680: 96 osds: 96 up, 96 in

2013-05-10 14:54:23.823056 7f0750a61780  1 mon.a@-1(probing).osd e6681 e6681: 96 osds: 96 up, 96 in

2013-05-10 14:54:23.823229 7f0750a61780  1 mon.a@-1(probing).osd e6682 e6682: 96 osds: 96 up, 96 in

2013-05-10 14:54:23.823403 7f0750a61780  1 mon.a@-1(probing).osd e6683 e6683: 96 osds: 96 up, 96 in

2013-05-10 14:54:23.823580 7f0750a61780  1 mon.a@-1(probing).osd e6684 e6684: 96 osds: 96 up, 96 in

2013-05-10 14:54:23.823749 7f0750a61780  1 mon.a@-1(probing).osd e6685 e6685: 96 osds: 96 up, 96 in

2013-05-10 14:54:23.823915 7f0750a61780  1 mon.a@-1(probing).osd e6686 e6686: 96 osds: 92 up, 96 in

2013-05-10 14:54:23.824088 7f0750a61780  1 mon.a@-1(probing).osd e6687 e6687: 96 osds: 88 up, 96 in

2013-05-10 14:54:23.824261 7f0750a61780  1 mon.a@-1(probing).osd e6688 e6688: 96 osds: 83 up, 96 in

2013-05-10 14:54:23.824434 7f0750a61780  1 mon.a@-1(probing).osd e6689 e6689: 96 osds: 71 up, 96 in

2013-05-10 14:54:23.824610 7f0750a61780  1 mon.a@-1(probing).osd e6690 e6690: 96 osds: 69 up, 96 in

2013-05-10 14:54:23.824793 7f0750a61780  1 mon.a@-1(probing).osd e6691 e6691: 96 osds: 56 up, 96 in

2013-05-10 14:54:23.838611 7f0750a61780  0 mon.a@-1(probing).osd e6691 crush map has features 33816576, adjusting msgr requires

2013-05-10 14:54:23.838630 7f0750a61780  0 mon.a@-1(probing).osd e6691 crush map has features 33816576, adjusting msgr requires

2013-05-10 14:54:23.838634 7f0750a61780  0 mon.a@-1(probing).osd e6691 crush map has features 33816576, adjusting msgr requires

2013-05-10 14:54:23.838636 7f0750a61780  0 mon.a@-1(probing).osd e6691 crush map has features 33816576, adjusting msgr requires

2013-05-10 14:54:23.841335 7f0750a61780  0 mon.a@-1(probing) e1  my rank is now 0 (was -1)

2013-05-10 14:54:23.842481 7f0748ff9700  0 -- 10.1.1.21:6789/0 >> 10.1.1.33:6789/0 pipe(0x204ba00 sd=41 :0 s=1 pgs=0 cs=0 l=0).fault

2013-05-10 14:54:23.842493 7f07490fa700  0 -- 10.1.1.21:6789/0 >> 10.1.1.22:6789/0 pipe(0x204bc80 sd=40 :0 s=1 pgs=0 cs=0 l=0).fault

2013-05-10 14:54:28.841438 7f074aaff700  1 mon.a@0(probing) e1 discarding message auth(proto 0 27 bytes epoch 1) v1 and sending client elsewhere

2013-05-10 14:54:28.841472 7f074aaff700  1 mon.a@0(probing) e1 discarding message auth(proto 0 27 bytes epoch 1) v1 and sending client elsewhere

2013-05-10 14:54:28.841483 7f074aaff700  1 mon.a@0(probing) e1 discarding message auth(proto 0 30 bytes epoch 1) v1 and sending client elsewhere

2013-05-10 14:54:28.841491 7f074aaff700  1 mon.a@0(probing) e1 discarding message auth(proto 0 27 bytes epoch 1) v1 and sending client elsewhere

2013-05-10 14:54:28.841499 7f074aaff700  1 mon.a@0(probing) e1 discarding message auth(proto 0 27 bytes epoch 1) v1 and sending client elsewhere

2013-05-10 14:54:28.841507 7f074aaff700  1 mon.a@0(probing) e1 discarding message auth(proto 0 27 bytes epoch 1) v1 and sending client elsewhere

2013-05-10 14:54:28.841515 7f074aaff700  1 mon.a@0(probing) e1 discarding message auth(proto 0 27 bytes epoch 1) v1 and sending client elsewhere

2013-05-10 14:54:28.841526 7f074aaff700  1 mon.a@0(probing) e1 discarding message auth(proto 0 27 bytes epoch 1) v1 and sending client elsewhere

2013-05-10 14:54:28.841540 7f074aaff700  1 mon.a@0(probing) e1 discarding message auth(proto 0 27 bytes epoch 1) v1 and sending client elsewhere

2013-05-10 14:54:28.841549 7f074aaff700  1 mon.a@0(probing) e1 discarding message auth(proto 0 27 bytes epoch 1) v1 and sending client elsewhere

2013-05-10 14:54:28.841556 7f074aaff700  1 mon.a@0(probing) e1 discarding message auth(proto 0 48 bytes epoch 1) v1 and sending client elsewhere

2013-05-10 14:54:28.841567 7f074aaff700  1 mon.a@0(probing) e1 discarding message auth(proto 0 27 bytes epoch 1) v1 and sending client elsewhere

2013-05-10 14:54:28.841578 7f074aaff700  1 mon.a@0(probing) e1 discarding message auth(proto 0 27 bytes epoch 1) v1 and sending client elsewhere

2013-05-10 14:54:28.841585 7f074aaff700  1 mon.a@0(probing) e1 discarding message auth(proto 0 27 bytes epoch 1) v1 and sending client elsewhere

….
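
The osdmap replay in the output above is easier to read when reduced to the epochs where the number of up OSDs changes. A small sketch, assuming the log keeps the field layout shown in this post; `summarize_osd_up` is a hypothetical helper:

```shell
# Hypothetical helper: print one line per change in the number of "up" OSDs.
# Reads ceph-mon log text from the named files, or from stdin if none are given.
summarize_osd_up() {
    awk '$9 == "osds:" && $11 == "up," {
        if ($10 != prev) printf "%s %s/%s up\n", $6, $10, $8
        prev = $10
    }' "$@"
}

# Usage (-d logs to stderr, so merge it into stdout first):
#   /usr/bin/ceph-mon -i a -c /etc/ceph/ceph.conf -d 2>&1 | summarize_osd_up
```

Lines that do not match the osdmap summary layout, such as the crush map and auth messages, are skipped.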

 

 

Nelson Jeppesen

 

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
