I've been testing Ceph for a while with a 4-node cluster (1 mon, 1 mds, and 2 osds), each node running ceph 0.56.2.
Today I ran into an MDS crash: on the mds host, the ceph-mds process was terminated by an assert().
My questions are:
1. What is the reason for the MDS crash?
2. How can I recover from it without rerunning mkcephfs?
The crash is reproducible in my environment.
The following information may be relevant:
1. "ceph -s" output
2. ceph.conf
3. part of ceph-mds.a.log (the whole log file is at http://pastebin.com/NJd0UCfF)
1. "ceph -s" output
==============
health HEALTH_WARN mds a is laggy
monmap e1: 1 mons at {a=mon.mon.mon.mon:6789/0}, election epoch 1, quorum 0 a
osdmap e220: 2 osds: 2 up, 2 in
pgmap v3614: 576 pgs: 576 active+clean; 6618 KB data, 162 MB used, 4209 MB / 4606 MB avail
mdsmap e860: 1/1/1 up {0=a=up:active(laggy or crashed)}
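For more detail on the laggy/crashed state than "ceph -s" shows, the mdsmap and the expanded health reasons can be dumped with (commands as available in the 0.56.x CLI):

```shell
# Dump the full mdsmap, including which rank is laggy or failed
ceph mds dump

# Show the expanded reasons behind the HEALTH_WARN status
ceph health detail
```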
2. ceph.conf
=========
[global]
auth supported = none
auth cluster required = none
auth service required = none
auth client required = none
debug mds = 20
[mon]
mon data = ""
[mon.a]
host = mon
mon addr = xx.xx.xx.xx:6789
[mds]
[mds.a]
host = mds
[osd]
osd data = ""
osd journal size = 128
filestore xattr use omap = true
[osd.0]
host = osd0
[osd.1]
host = osd1
3. part of ceph-mds.a.log
==================
2013-04-09 02:22:58.577485 7f587b640700 1 mds.0.35 handle_mds_map i am now mds.0.35
2013-04-09 02:22:58.577489 7f587b640700 1 mds.0.35 handle_mds_map state change up:rejoin --> up:active
2013-04-09 02:22:58.577494 7f587b640700 1 mds.0.35 recovery_done -- successful recovery!
2013-04-09 02:22:58.577507 7f587b640700 7 mds.0.tableserver(anchortable) finish_recovery
2013-04-09 02:22:58.577515 7f587b640700 7 mds.0.tableserver(snaptable) finish_recovery
2013-04-09 02:22:58.577521 7f587b640700 7 mds.0.tableclient(anchortable) finish_recovery
2013-04-09 02:22:58.577525 7f587b640700 7 mds.0.tableclient(snaptable) finish_recovery
2013-04-09 02:22:58.577529 7f587b640700 10 mds.0.cache start_recovered_truncates
2013-04-09 02:22:58.577533 7f587b640700 10 mds.0.cache do_file_recover 0 queued, 0 recovering
2013-04-09 02:22:58.577541 7f587b640700 10 mds.0.cache reissue_all_caps
2013-04-09 02:22:58.581855 7f587b640700 -1 mds/MDCache.cc: In function 'void MDCache::populate_mydir()' thread 7f587b640700 time 2013-04-09 02:22:58.577558
mds/MDCache.cc: 579: FAILED assert(mydir)
ceph version 0.56.2 (586538e22afba85c59beda49789ec42024e7a061)
1: (MDCache::populate_mydir()+0xbc5) [0x5f0125]
2: (MDS::recovery_done()+0xde) [0x4ed12e]
3: (MDS::handle_mds_map(MMDSMap*)+0x39c8) [0x4fff28]
4: (MDS::handle_core_message(Message*)+0xb4b) [0x50596b]
5: (MDS::_dispatch(Message*)+0x2f) [0x505a9f]
6: (MDS::ms_dispatch(Message*)+0x23b) [0x50759b]
7: (Messenger::ms_deliver_dispatch(Message*)+0x66) [0x872a26]
8: (DispatchQueue::entry()+0x32a) [0x87093a]
9: (DispatchQueue::DispatchThread::entry()+0xd) [0x7ee7cd]
10: (()+0x6a3f) [0x7f587f465a3f]
11: (clone()+0x6d) [0x7f587df1967d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
--- begin dump of recent events ---
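The assert fires in MDCache::populate_mydir() when the MDS's private directory inode ("mydir", the per-rank ~mdsdir) cannot be opened after recovery. One way to check whether its backing objects still exist is to inspect the metadata pool directly. This is a sketch, assuming the default pool name "metadata" and mds rank 0; the object names follow the CephFS convention that the mdsdir inode for rank N is 0x100 + N and the journal inode is 0x200 + N:

```shell
# List metadata-pool objects backing the rank-0 mdsdir (inode 0x100 -> 100.*)
rados -p metadata ls | grep '^100\.'

# Check the rank-0 journal header object (inode 0x200 -> 200.00000000)
rados -p metadata stat 200.00000000
```

If the 100.* objects are missing or unreadable, that would be consistent with mydir failing to load and populate_mydir() hitting the assert, though the log would be needed to confirm the actual cause.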
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com