Re: mds laggy or crashed

Looks like your journal has some bad events in it, probably due to
bugs in the multi-MDS system. Did you start out this cluster on 0.67.4,
or has it been upgraded at some point?
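If you can reproduce the crash with MDS and journaler debugging turned
up, the events replayed just before the assert should show which journal
entry is bad. Something along these lines (the log levels here are just
a suggestion):

  ceph-mds -i a -d --debug-mds 20 --debug-journaler 20 2>&1 | tee mds.a.replay.log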
Why did you use two active MDS daemons?
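Multi-MDS is still considered experimental; a single active MDS with one
or more standbys is the recommended configuration. Once you have the
cluster healthy again, dropping back to one active rank would look
roughly like this (a sketch, not tested against your cluster):

  ceph mds set_max_mds 1   # cap the number of active ranks at 1
  ceph mds stop 1          # ask rank 1 to export its state and shut down
  ceph mds dump            # confirm only rank 0 remains in the up set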
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com


On Mon, Oct 21, 2013 at 7:05 PM, Gagandeep Arora <aroragagan24@xxxxxxxxx> wrote:
> Hello,
>
> We are running ceph-0.67.4 with two MDS daemons, and both of them are
> crashing; see the logs below:
>
>
> [root@ceph1 ~]# ceph health detail
> HEALTH_ERR mds rank 1 has failed; mds cluster is degraded; mds a is laggy
> mds.1 has failed
> mds cluster is degraded
> mds.a at 192.168.6.101:6808/14609 rank 0 is replaying journal
> mds.a at 192.168.6.101:6808/14609 is laggy/unresponsive
>
>
> [root@ceph1 ~]# ceph mds dump
> dumped mdsmap epoch 19386
> epoch 19386
> flags 0
> created 2013-03-20 08:56:13.873024
> modified 2013-10-22 11:58:31.374700
> tableserver 0
> root 0
> session_timeout 60
> session_autoclose 300
> last_failure 19253
> last_failure_osd_epoch 6648
> compat compat={},rocompat={},incompat={1=base v0.20,2=client writeable
> ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds
> uses versioned encoding}
> max_mds 2
> in 0,1
> up {0=222230}
> failed 1
> stopped
> data_pools 0,13,14
> metadata_pool 1
> 222230: 192.168.6.101:6808/14609 'a' mds.0.19 up:replay seq 1 laggy since
> 2013-10-22 11:55:50.972032
>
>
> [root@ceph1 ~]# ceph-mds -i a -d
> 2013-10-22 11:55:28.093342 7f343195f7c0  0 ceph version 0.67.4
> (ad85b8bfafea6232d64cb7ba76a8b6e8252fa0c7), process ceph-mds, pid 14609
> starting mds.a at :/0
> 2013-10-22 11:55:31.550871 7f342c593700  1 mds.-1.0 handle_mds_map standby
> 2013-10-22 11:55:32.151652 7f342c593700  1 mds.0.19 handle_mds_map i am now
> mds.0.19
> 2013-10-22 11:55:32.151658 7f342c593700  1 mds.0.19 handle_mds_map state
> change up:standby --> up:replay
> 2013-10-22 11:55:32.151661 7f342c593700  1 mds.0.19 replay_start
> 2013-10-22 11:55:32.151673 7f342c593700  1 mds.0.19  recovery set is 1
> 2013-10-22 11:55:32.151675 7f342c593700  1 mds.0.19  need osdmap epoch 6648,
> have 6647
> 2013-10-22 11:55:32.151677 7f342c593700  1 mds.0.19  waiting for osdmap 6648
> (which blacklists prior instance)
> 2013-10-22 11:55:32.275413 7f342c593700  0 mds.0.cache creating system inode
> with ino:100
> 2013-10-22 11:55:32.275720 7f342c593700  0 mds.0.cache creating system inode
> with ino:1
> mds/journal.cc: In function 'void EMetaBlob::replay(MDS*, LogSegment*,
> MDSlaveUpdate*)' thread 7f3428078700 time 2013-10-22 11:55:37.562600
> mds/journal.cc: 1096: FAILED assert(in->first == p->dnfirst ||
> (in->is_multiversion() && in->first > p->dnfirst))
>  ceph version 0.67.4 (ad85b8bfafea6232d64cb7ba76a8b6e8252fa0c7)
>  1: (EMetaBlob::replay(MDS*, LogSegment*, MDSlaveUpdate*)+0x399d) [0x65b0ad]
>  2: (EUpdate::replay(MDS*)+0x3a) [0x663c0a]
>  3: (MDLog::_replay_thread()+0x5cf) [0x82e17f]
>  4: (MDLog::ReplayThread::entry()+0xd) [0x6393ad]
>  5: (()+0x7d15) [0x7f3430fc2d15]
>  6: (clone()+0x6d) [0x7f342fa3948d]
>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to
> interpret this.
> 2013-10-22 11:55:37.563382 7f3428078700 -1 mds/journal.cc: In function 'void
> EMetaBlob::replay(MDS*, LogSegment*, MDSlaveUpdate*)' thread 7f3428078700
> time 2013-10-22 11:55:37.562600
> mds/journal.cc: 1096: FAILED assert(in->first == p->dnfirst ||
> (in->is_multiversion() && in->first > p->dnfirst))
>
>  ceph version 0.67.4 (ad85b8bfafea6232d64cb7ba76a8b6e8252fa0c7)
>  1: (EMetaBlob::replay(MDS*, LogSegment*, MDSlaveUpdate*)+0x399d) [0x65b0ad]
>  2: (EUpdate::replay(MDS*)+0x3a) [0x663c0a]
>  3: (MDLog::_replay_thread()+0x5cf) [0x82e17f]
>  4: (MDLog::ReplayThread::entry()+0xd) [0x6393ad]
>  5: (()+0x7d15) [0x7f3430fc2d15]
>  6: (clone()+0x6d) [0x7f342fa3948d]
>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to
> interpret this.