This got taken care of after I deleted the pools for metadata and data and started it again. I did:

1. sudo service ceph stop mds
2. ceph mds newfs 1 0 --yes-i-really-mean-it (where 1 and 0 are the pool IDs for metadata and data)
3. ceph health (It was healthy now!!!)
4. sudo service ceph start mds.$(hostname -s)

And I am back in business.
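In case it helps anyone else, here is roughly the end-to-end sequence I ran on the MDS node. The pool IDs 1 and 0 are specific to my cluster ("ceph osd lspools" will list yours), and keep in mind that "ceph mds newfs" discards whatever CephFS metadata was in those pools, hence the flag:

    # stop the MDS before re-pointing the filesystem at the recreated pools
    sudo service ceph stop mds

    # create a fresh filesystem on the metadata (1) and data (0) pools;
    # this throws away any existing CephFS metadata, hence --yes-i-really-mean-it
    ceph mds newfs 1 0 --yes-i-really-mean-it

    # the cluster should report healthy again
    ceph health

    # bring the MDS on this host back up
    sudo service ceph start mds.$(hostname -s)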
On Nov 18, 2014, at 3:27 PM, Gregory Farnum <greg@xxxxxxxxxxx> wrote:

> Hmm, last time we saw this it meant that the MDS log had gotten
> corrupted somehow and was a little short (in that case due to the OSDs
> filling up). What do you mean by "rebuilt the OSDs"?
> -Greg
>
> On Mon, Nov 17, 2014 at 12:52 PM, JIten Shah <jshah2005@xxxxxx> wrote:
>> After I rebuilt the OSDs, the MDS went into degraded mode and will not
>> recover.
>>
>> [jshah@Lab-cephmon001 ~]$ sudo tail -100f /var/log/ceph/ceph-mds.Lab-cephmon001.log
>> 2014-11-17 17:55:27.855861 7fffef5d3700 0 -- X.X.16.111:6800/3046050 >> X.X.16.114:0/838757053 pipe(0x1e18000 sd=22 :6800 s=0 pgs=0 cs=0 l=0 c=0x1e02c00).accept peer addr is really X.X.16.114:0/838757053 (socket is X.X.16.114:34672/0)
>> 2014-11-17 17:57:27.855519 7fffef5d3700 0 -- X.X.16.111:6800/3046050 >> X.X.16.114:0/838757053 pipe(0x1e18000 sd=22 :6800 s=2 pgs=2 cs=1 l=0 c=0x1e02c00).fault with nothing to send, going to standby
>> 2014-11-17 17:58:47.883799 7fffef3d1700 0 -- X.X.16.111:6800/3046050 >> X.X.16.114:0/26738200 pipe(0x1e1be80 sd=23 :6800 s=0 pgs=0 cs=0 l=0 c=0x1e04ba0).accept peer addr is really X.X.16.114:0/26738200 (socket is X.X.16.114:34699/0)
>> 2014-11-17 18:00:47.882484 7fffef3d1700 0 -- X.X.16.111:6800/3046050 >> X.X.16.114:0/26738200 pipe(0x1e1be80 sd=23 :6800 s=2 pgs=2 cs=1 l=0 c=0x1e04ba0).fault with nothing to send, going to standby
>> 2014-11-17 18:01:47.886662 7fffef1cf700 0 -- X.X.16.111:6800/3046050 >> X.X.16.114:0/3673954317 pipe(0x1e1c380 sd=24 :6800 s=0 pgs=0 cs=0 l=0 c=0x1e05540).accept peer addr is really X.X.16.114:0/3673954317 (socket is X.X.16.114:34718/0)
>> 2014-11-17 18:03:47.885488 7fffef1cf700 0 -- X.X.16.111:6800/3046050 >> X.X.16.114:0/3673954317 pipe(0x1e1c380 sd=24 :6800 s=2 pgs=2 cs=1 l=0 c=0x1e05540).fault with nothing to send, going to standby
>> 2014-11-17 18:04:47.888983 7fffeefcd700 0 -- X.X.16.111:6800/3046050 >> X.X.16.114:0/3403131574 pipe(0x1e18a00 sd=25 :6800 s=0 pgs=0 cs=0 l=0 c=0x1e05280).accept peer addr is really X.X.16.114:0/3403131574 (socket is X.X.16.114:34744/0)
>> 2014-11-17 18:06:47.888427 7fffeefcd700 0 -- X.X.16.111:6800/3046050 >> X.X.16.114:0/3403131574 pipe(0x1e18a00 sd=25 :6800 s=2 pgs=2 cs=1 l=0 c=0x1e05280).fault with nothing to send, going to standby
>> 2014-11-17 20:02:03.558250 7ffff07de700 -1 mds.0.1 *** got signal Terminated ***
>> 2014-11-17 20:02:03.558297 7ffff07de700 1 mds.0.1 suicide. wanted down:dne, now up:active
>> 2014-11-17 20:02:56.053339 7ffff7fe77a0 0 ceph version 0.80.5 (38b73c67d375a2552d8ed67843c8a65c2c0feba6), process ceph-mds, pid 3424727
>> 2014-11-17 20:02:56.121367 7ffff30e4700 1 mds.-1.0 handle_mds_map standby
>> 2014-11-17 20:02:56.124343 7ffff30e4700 1 mds.0.2 handle_mds_map i am now mds.0.2
>> 2014-11-17 20:02:56.124345 7ffff30e4700 1 mds.0.2 handle_mds_map state change up:standby --> up:replay
>> 2014-11-17 20:02:56.124348 7ffff30e4700 1 mds.0.2 replay_start
>> 2014-11-17 20:02:56.124359 7ffff30e4700 1 mds.0.2 recovery set is
>> 2014-11-17 20:02:56.124362 7ffff30e4700 1 mds.0.2 need osdmap epoch 93, have 92
>> 2014-11-17 20:02:56.124363 7ffff30e4700 1 mds.0.2 waiting for osdmap 93 (which blacklists prior instance)

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com