Re: Help! 61.1 killed my monitors in prod

On May 10, 2013, at 3:39 PM, Joao Eduardo Luis <joao.luis@xxxxxxxxxxx> wrote:

> We would certainly be interested in taking a look at logs from those monitors, and would appreciate if you could set 'debug mon = 20', 'debug auth = 10' and 'debug ms = 1', and give them a spin until you hit your issue.
> 
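
For reference, a sketch of how the three settings requested above might look in ceph.conf. Placing them under a [mon] section is an assumption here; the request itself only names the options and their values.

```ini
# Hypothetical placement: the [mon] section of /etc/ceph/ceph.conf
[mon]
    debug mon = 20
    debug auth = 10
    debug ms = 1
```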

I am seeing the same problem at Jeppesen.  I am running 0.61.1 with 3 MONs, 4 OSDs and 1 MDS, and a reboot of the cluster leaves it in the same state, with ceph-create-keys hung and the monitors not running.  I added the debug settings as indicated.  This is an excerpt from the output of "ceph status":

2013-05-13 12:37:21.249265 7f8b428a6780  1 -- :/0 messenger.start
2013-05-13 12:37:21.249500 7f8b428a6780  5 adding auth protocol: cephx
2013-05-13 12:37:21.249807 7f8b428a6780  2 auth: KeyRing::load: loaded key file /etc/ceph/ceph.client.admin.keyring
2013-05-13 12:37:21.250031 7f8b428a6780  1 -- :/12649 --> 192.168.139.4:6789/0 -- auth(proto 0 30 bytes epoch 0) v1 -- ?+0 0x2ae5b60 con 0x2ae57c0
2013-05-13 12:37:21.250219 7f8b428a4700  0 -- :/12649 >> 192.168.139.4:6789/0 pipe(0x2ae5560 sd=7 :0 s=1 pgs=0 cs=0 l=1).fault
2013-05-13 12:37:24.249964 7f8b3d918700  1 -- :/12649 mark_down 0x2ae57c0 -- 0x2ae5560
2013-05-13 12:37:24.250150 7f8b3d918700  1 -- :/12649 --> 192.168.139.3:6789/0 -- auth(proto 0 30 bytes epoch 0) v1 -- ?+0 0x7f8b34001350 con 0x7f8b34000e60
2013-05-13 12:37:24.250409 7f8b3c115700  0 -- :/12649 >> 192.168.139.3:6789/0 pipe(0x7f8b34000c00 sd=8 :0 s=1 pgs=0 cs=0 l=1).fault
2013-05-13 12:37:27.250277 7f8b3d918700  1 -- :/12649 mark_down 0x7f8b34000e60 -- 0x7f8b34000c00
2013-05-13 12:37:27.250374 7f8b3d918700  1 -- :/12649 --> 192.168.139.4:6789/0 -- auth(proto 0 30 bytes epoch 0) v1 -- ?+0 0x7f8b34003440 con 0x7f8b34003270
2013-05-13 12:37:27.250607 7f8b428a4700  0 -- :/12649 >> 192.168.139.4:6789/0 pipe(0x7f8b34003010 sd=8 :0 s=1 pgs=0 cs=0 l=1).fault
2013-05-13 12:37:30.250523 7f8b3d918700  1 -- :/12649 mark_down 0x7f8b34003270 -- 0x7f8b34003010
2013-05-13 12:37:30.250619 7f8b3d918700  1 -- :/12649 --> 192.168.139.2:6789/0 -- auth(proto 0 30 bytes epoch 0) v1 -- ?+0 0x7f8b34003dc0 con 0x7f8b34003b20
2013-05-13 12:37:30.251151 7f8b3c115700  1 -- 192.168.139.254:0/12649 learned my addr 192.168.139.254:0/12649
2013-05-13 12:37:33.250733 7f8b3d918700  1 -- 192.168.139.254:0/12649 mark_down 0x7f8b34003b20 -- 0x7f8b340038c0
2013-05-13 12:37:33.250885 7f8b3d918700  1 -- 192.168.139.254:0/12649 --> 192.168.139.3:6789/0 -- auth(proto 0 30 bytes epoch 0) v1 -- ?+0 0x7f8b34002920 con 0x7f8b340025c0
2013-05-13 12:37:33.251081 7f8b2ffff700  0 -- 192.168.139.254:0/12649 >> 192.168.139.3:6789/0 pipe(0x7f8b34002360 sd=7 :0 s=1 pgs=0 cs=0 l=1).fault
2013-05-13 12:37:36.251046 7f8b3d918700  1 -- 192.168.139.254:0/12649 mark_down 0x7f8b340025c0 -- 0x7f8b34002360
2013-05-13 12:37:36.251133 7f8b3d918700  1 -- 192.168.139.254:0/12649 --> 192.168.139.4:6789/0 -- auth(proto 0 30 bytes epoch 0) v1 -- ?+0 0x7f8b34005010 con 0x7f8b340030d0
2013-05-13 12:37:36.251376 7f8b428a4700  0 -- 192.168.139.254:0/12649 >> 192.168.139.4:6789/0 pipe(0x7f8b34002e70 sd=8 :0 s=1 pgs=0 cs=0 l=1).fault
2013-05-13 12:37:39.251250 7f8b3d918700  1 -- 192.168.139.254:0/12649 mark_down 0x7f8b340030d0 -- 0x7f8b34002e70
2013-05-13 12:37:39.251347 7f8b3d918700  1 -- 192.168.139.254:0/12649 --> 192.168.139.2:6789/0 -- auth(proto 0 30 bytes epoch 0) v1 -- ?+0 0x7f8b34005720 con 0x7f8b34005480
2013-05-13 12:37:42.251493 7f8b3d918700  1 -- 192.168.139.254:0/12649 mark_down 0x7f8b34005480 -- 0x7f8b34005220
2013-05-13 12:37:42.251614 7f8b3d918700  1 -- 192.168.139.254:0/12649 --> 192.168.139.4:6789/0 -- auth(proto 0 30 bytes epoch 0) v1 -- ?+0 0x7f8b340047c0 con 0x7f8b34004520
2013-05-13 12:37:42.251800 7f8b3c115700  0 -- 192.168.139.254:0/12649 >> 192.168.139.4:6789/0 pipe(0x7f8b340042c0 sd=7 :0 s=1 pgs=0 cs=0 l=1).fault
2013-05-13 12:37:45.251683 7f8b3d918700  1 -- 192.168.139.254:0/12649 mark_down 0x7f8b34004520 -- 0x7f8b340042c0
2013-05-13 12:37:45.251777 7f8b3d918700  1 -- 192.168.139.254:0/12649 --> 192.168.139.2:6789/0 -- auth(proto 0 30 bytes epoch 0) v1 -- ?+0 0x7f8b34004c40 con 0x7f8b340049d0
2013-05-13 12:37:48.251928 7f8b3d918700  1 -- 192.168.139.254:0/12649 mark_down 0x7f8b340049d0 -- 0x7f8b34005d30
2013-05-13 12:37:48.252058 7f8b3d918700  1 -- 192.168.139.254:0/12649 --> 192.168.139.3:6789/0 -- auth(proto 0 30 bytes epoch 0) v1 -- ?+0 0x7f8b340052e0 con 0x7f8b34005040
2013-05-13 12:37:48.252252 7f8b2ffff700  0 -- 192.168.139.254:0/12649 >> 192.168.139.3:6789/0 pipe(0x7f8b34004de0 sd=7 :0 s=1 pgs=0 cs=0 l=1).fault
2013-05-13 12:37:51.252149 7f8b3d918700  1 -- 192.168.139.254:0/12649 mark_down 0x7f8b34005040 -- 0x7f8b34004de0
2013-05-13 12:37:51.252236 7f8b3d918700  1 -- 192.168.139.254:0/12649 --> 192.168.139.4:6789/0 -- auth(proto 0 30 bytes epoch 0) v1 -- ?+0 0x7f8b340075b0 con 0x7f8b34004a70
2013-05-13 12:37:51.252466 7f8b3c115700  0 -- 192.168.139.254:0/12649 >> 192.168.139.4:6789/0 pipe(0x7f8b34007280 sd=8 :0 s=1 pgs=0 cs=0 l=1).fault
2013-05-13 12:37:54.252385 7f8b3d918700  1 -- 192.168.139.254:0/12649 mark_down 0x7f8b34004a70 -- 0x7f8b34007280
2013-05-13 12:37:54.252479 7f8b3d918700  1 -- 192.168.139.254:0/12649 --> 192.168.139.3:6789/0 -- auth(proto 0 30 bytes epoch 0) v1 -- ?+0 0x7f8b34006000 con 0x7f8b34007bc0
2013-05-13 12:37:54.252713 7f8b2ffff700  0 -- 192.168.139.254:0/12649 >> 192.168.139.3:6789/0 pipe(0x7f8b34007960 sd=8 :0 s=1 pgs=0 cs=0 l=1).fault
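
In case it helps with triage, here is a rough Python sketch that tallies, per monitor address, how many auth attempts the client sent and how many were followed by a pipe ".fault" line. The regular expressions are fitted to the excerpt above and are an assumption, not an official description of the log format; the function name `summarize` and the `SAMPLE` string (five lines copied from the excerpt) are only illustrative.

```python
import re
from collections import Counter

# Patterns guessed from the excerpt above, not from any official log grammar.
SEND_RE = re.compile(r"--> (\d+(?:\.\d+){3}):\d+/\d+ -- auth\(")
FAULT_RE = re.compile(r">> (\d+(?:\.\d+){3}):\d+/\d+ pipe\(.*\)\.fault\s*$")


def summarize(lines):
    """Count auth attempts and pipe faults per monitor IP address."""
    attempts, faults = Counter(), Counter()
    for line in lines:
        sent = SEND_RE.search(line)
        if sent:
            attempts[sent.group(1)] += 1
            continue
        fault = FAULT_RE.search(line)
        if fault:
            faults[fault.group(1)] += 1
    return attempts, faults


# Five lines copied from the excerpt above.
SAMPLE = """\
2013-05-13 12:37:21.250031 7f8b428a6780  1 -- :/12649 --> 192.168.139.4:6789/0 -- auth(proto 0 30 bytes epoch 0) v1 -- ?+0 0x2ae5b60 con 0x2ae57c0
2013-05-13 12:37:21.250219 7f8b428a4700  0 -- :/12649 >> 192.168.139.4:6789/0 pipe(0x2ae5560 sd=7 :0 s=1 pgs=0 cs=0 l=1).fault
2013-05-13 12:37:30.250619 7f8b3d918700  1 -- :/12649 --> 192.168.139.2:6789/0 -- auth(proto 0 30 bytes epoch 0) v1 -- ?+0 0x7f8b34003dc0 con 0x7f8b34003b20
2013-05-13 12:37:30.251151 7f8b3c115700  1 -- 192.168.139.254:0/12649 learned my addr 192.168.139.254:0/12649
2013-05-13 12:37:33.250733 7f8b3d918700  1 -- 192.168.139.254:0/12649 mark_down 0x7f8b34003b20 -- 0x7f8b340038c0
"""

attempts, faults = summarize(SAMPLE.splitlines())
for mon in sorted(attempts):
    print(f"{mon}: {attempts[mon]} auth attempts, {faults[mon]} faults")
```

Run over the full excerpt, a tally like this makes one detail easier to see: the attempts to 192.168.139.3 and 192.168.139.4 are each followed by a ".fault" line, while the attempts to 192.168.139.2 are only followed by a mark_down about three seconds later.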

What additional info can I provide?

Stephen
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



