On May 10, 2013, at 3:39 PM, Joao Eduardo Luis <joao.luis@xxxxxxxxxxx> wrote:

> We would certainly be interested in taking a look at logs from those monitors, and would appreciate if you could set 'debug mon = 20', 'debug auth = 10' and 'debug ms = 1', and give them a spin until you hit your issue.
>

I'm seeing the same problem at Jeppesen. I'm running 0.61.1 with 3 MONs, 4 OSDs and 1 MDS, and rebooting the cluster leaves it in the same state: ceph-create-keys hangs and the monitors are not running.

I added the debug settings as indicated. This is an excerpt of the output of "ceph status":

2013-05-13 12:37:21.249265 7f8b428a6780 1 -- :/0 messenger.start
2013-05-13 12:37:21.249500 7f8b428a6780 5 adding auth protocol: cephx
2013-05-13 12:37:21.249807 7f8b428a6780 2 auth: KeyRing::load: loaded key file /etc/ceph/ceph.client.admin.keyring
2013-05-13 12:37:21.250031 7f8b428a6780 1 -- :/12649 --> 192.168.139.4:6789/0 -- auth(proto 0 30 bytes epoch 0) v1 -- ?+0 0x2ae5b60 con 0x2ae57c0
2013-05-13 12:37:21.250219 7f8b428a4700 0 -- :/12649 >> 192.168.139.4:6789/0 pipe(0x2ae5560 sd=7 :0 s=1 pgs=0 cs=0 l=1).fault
2013-05-13 12:37:24.249964 7f8b3d918700 1 -- :/12649 mark_down 0x2ae57c0 -- 0x2ae5560
2013-05-13 12:37:24.250150 7f8b3d918700 1 -- :/12649 --> 192.168.139.3:6789/0 -- auth(proto 0 30 bytes epoch 0) v1 -- ?+0 0x7f8b34001350 con 0x7f8b34000e60
2013-05-13 12:37:24.250409 7f8b3c115700 0 -- :/12649 >> 192.168.139.3:6789/0 pipe(0x7f8b34000c00 sd=8 :0 s=1 pgs=0 cs=0 l=1).fault
2013-05-13 12:37:27.250277 7f8b3d918700 1 -- :/12649 mark_down 0x7f8b34000e60 -- 0x7f8b34000c00
2013-05-13 12:37:27.250374 7f8b3d918700 1 -- :/12649 --> 192.168.139.4:6789/0 -- auth(proto 0 30 bytes epoch 0) v1 -- ?+0 0x7f8b34003440 con 0x7f8b34003270
2013-05-13 12:37:27.250607 7f8b428a4700 0 -- :/12649 >> 192.168.139.4:6789/0 pipe(0x7f8b34003010 sd=8 :0 s=1 pgs=0 cs=0 l=1).fault
2013-05-13 12:37:30.250523 7f8b3d918700 1 -- :/12649 mark_down 0x7f8b34003270 -- 0x7f8b34003010
2013-05-13 12:37:30.250619 7f8b3d918700 1 -- :/12649 --> 192.168.139.2:6789/0 -- auth(proto 0 30 bytes epoch 0) v1 -- ?+0 0x7f8b34003dc0 con 0x7f8b34003b20
2013-05-13 12:37:30.251151 7f8b3c115700 1 -- 192.168.139.254:0/12649 learned my addr 192.168.139.254:0/12649
2013-05-13 12:37:33.250733 7f8b3d918700 1 -- 192.168.139.254:0/12649 mark_down 0x7f8b34003b20 -- 0x7f8b340038c0
2013-05-13 12:37:33.250885 7f8b3d918700 1 -- 192.168.139.254:0/12649 --> 192.168.139.3:6789/0 -- auth(proto 0 30 bytes epoch 0) v1 -- ?+0 0x7f8b34002920 con 0x7f8b340025c0
2013-05-13 12:37:33.251081 7f8b2ffff700 0 -- 192.168.139.254:0/12649 >> 192.168.139.3:6789/0 pipe(0x7f8b34002360 sd=7 :0 s=1 pgs=0 cs=0 l=1).fault
2013-05-13 12:37:36.251046 7f8b3d918700 1 -- 192.168.139.254:0/12649 mark_down 0x7f8b340025c0 -- 0x7f8b34002360
2013-05-13 12:37:36.251133 7f8b3d918700 1 -- 192.168.139.254:0/12649 --> 192.168.139.4:6789/0 -- auth(proto 0 30 bytes epoch 0) v1 -- ?+0 0x7f8b34005010 con 0x7f8b340030d0
2013-05-13 12:37:36.251376 7f8b428a4700 0 -- 192.168.139.254:0/12649 >> 192.168.139.4:6789/0 pipe(0x7f8b34002e70 sd=8 :0 s=1 pgs=0 cs=0 l=1).fault
2013-05-13 12:37:39.251250 7f8b3d918700 1 -- 192.168.139.254:0/12649 mark_down 0x7f8b340030d0 -- 0x7f8b34002e70
2013-05-13 12:37:39.251347 7f8b3d918700 1 -- 192.168.139.254:0/12649 --> 192.168.139.2:6789/0 -- auth(proto 0 30 bytes epoch 0) v1 -- ?+0 0x7f8b34005720 con 0x7f8b34005480
2013-05-13 12:37:42.251493 7f8b3d918700 1 -- 192.168.139.254:0/12649 mark_down 0x7f8b34005480 -- 0x7f8b34005220
2013-05-13 12:37:42.251614 7f8b3d918700 1 -- 192.168.139.254:0/12649 --> 192.168.139.4:6789/0 -- auth(proto 0 30 bytes epoch 0) v1 -- ?+0 0x7f8b340047c0 con 0x7f8b34004520
2013-05-13 12:37:42.251800 7f8b3c115700 0 -- 192.168.139.254:0/12649 >> 192.168.139.4:6789/0 pipe(0x7f8b340042c0 sd=7 :0 s=1 pgs=0 cs=0 l=1).fault
2013-05-13 12:37:45.251683 7f8b3d918700 1 -- 192.168.139.254:0/12649 mark_down 0x7f8b34004520 -- 0x7f8b340042c0
2013-05-13 12:37:45.251777 7f8b3d918700 1 -- 192.168.139.254:0/12649 --> 192.168.139.2:6789/0 -- auth(proto 0 30 bytes epoch 0) v1 -- ?+0 0x7f8b34004c40 con 0x7f8b340049d0
2013-05-13 12:37:48.251928 7f8b3d918700 1 -- 192.168.139.254:0/12649 mark_down 0x7f8b340049d0 -- 0x7f8b34005d30
2013-05-13 12:37:48.252058 7f8b3d918700 1 -- 192.168.139.254:0/12649 --> 192.168.139.3:6789/0 -- auth(proto 0 30 bytes epoch 0) v1 -- ?+0 0x7f8b340052e0 con 0x7f8b34005040
2013-05-13 12:37:48.252252 7f8b2ffff700 0 -- 192.168.139.254:0/12649 >> 192.168.139.3:6789/0 pipe(0x7f8b34004de0 sd=7 :0 s=1 pgs=0 cs=0 l=1).fault
2013-05-13 12:37:51.252149 7f8b3d918700 1 -- 192.168.139.254:0/12649 mark_down 0x7f8b34005040 -- 0x7f8b34004de0
2013-05-13 12:37:51.252236 7f8b3d918700 1 -- 192.168.139.254:0/12649 --> 192.168.139.4:6789/0 -- auth(proto 0 30 bytes epoch 0) v1 -- ?+0 0x7f8b340075b0 con 0x7f8b34004a70
2013-05-13 12:37:51.252466 7f8b3c115700 0 -- 192.168.139.254:0/12649 >> 192.168.139.4:6789/0 pipe(0x7f8b34007280 sd=8 :0 s=1 pgs=0 cs=0 l=1).fault
2013-05-13 12:37:54.252385 7f8b3d918700 1 -- 192.168.139.254:0/12649 mark_down 0x7f8b34004a70 -- 0x7f8b34007280
2013-05-13 12:37:54.252479 7f8b3d918700 1 -- 192.168.139.254:0/12649 --> 192.168.139.3:6789/0 -- auth(proto 0 30 bytes epoch 0) v1 -- ?+0 0x7f8b34006000 con 0x7f8b34007bc0
2013-05-13 12:37:54.252713 7f8b2ffff700 0 -- 192.168.139.254:0/12649 >> 192.168.139.3:6789/0 pipe(0x7f8b34007960 sd=8 :0 s=1 pgs=0 cs=0 l=1).fault

What additional info can I provide?

Stephen

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
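For anyone triaging a client log like the "ceph status" excerpt in this thread, a small helper can tally which monitors the client sent auth requests to and how many of those connections logged a pipe fault. This is a hypothetical sketch, not a Ceph tool: the function name and regular expressions are assumptions that only match the line shapes visible in this excerpt (messenger output at 'debug ms = 1'), and they may not match other Ceph versions.

```python
import re
from collections import Counter

# Hypothetical patterns (not part of Ceph), based only on the excerpt's lines.
# Outgoing auth request, e.g. "--> 192.168.139.4:6789/0 -- auth(proto 0 ..."
SEND_RE = re.compile(r"--> (\S+) -- auth\(")
# Failed connection, e.g. ">> 192.168.139.4:6789/0 pipe(0x...).fault"
FAULT_RE = re.compile(r">> (\S+) pipe\(.*\)\.fault")


def summarize_mon_attempts(log_text):
    """Return {monitor_addr: (auth_requests_sent, pipe_faults)}, sorted by address."""
    sent = Counter()
    faults = Counter()
    for line in log_text.splitlines():
        match = SEND_RE.search(line)
        if match:
            # Count each auth request the client sent to this monitor.
            sent[match.group(1)] += 1
            continue
        match = FAULT_RE.search(line)
        if match:
            # Count each pipe fault logged for a connection to this monitor.
            faults[match.group(1)] += 1
    addrs = sorted(set(sent) | set(faults))
    return {addr: (sent[addr], faults[addr]) for addr in addrs}
```

Fed the excerpt above (for example with `summarize_mon_attempts(open("ceph-status.log").read())`, where the file name is hypothetical), it should report a fault for every request sent to 192.168.139.3 and 192.168.139.4, and no faults for the three requests sent to 192.168.139.2.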