I've been testing Ceph for a while with a 4-node cluster (1 mon, 1 mds, and 2 osds), each node running ceph 0.56.2.
Today I ran into an MDS crash: on the mds host, the ceph-mds process was terminated by an assert().
My questions are:
1. What is the reason for the MDS crash?
2. How can I recover from it without rerunning mkcephfs?
The crash is reproducible in my environment.
The following information may be relevant:
1. "ceph -s" output
2. ceph.conf
3. part of ceph-mds.a.log (the whole log file is at http://pastebin.com/NJd0UCfF)
1. "ceph -s" output
==============
health HEALTH_WARN mds a is laggy
monmap e1: 1 mons at {a=mon.mon.mon.mon:6789/0}, election epoch 1, quorum 0 a
osdmap e220: 2 osds: 2 up, 2 in
pgmap v3614: 576 pgs: 576 active+clean; 6618 KB data, 162 MB used, 4209 MB / 4606 MB avail
mdsmap e860: 1/1/1 up {0=a=up:active(laggy or crashed)}
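For more detail on the laggy/crashed state than "ceph -s" shows, the mdsmap and the expanded health reasons can be dumped with (commands as available in the 0.56.x CLI):

```shell
# Dump the full mdsmap, including which rank is laggy or failed
ceph mds dump

# Show the expanded reasons behind the HEALTH_WARN status
ceph health detail
```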
2. ceph.conf
=========
[global]
auth supported = none
auth cluster required = none
auth service required = none
auth client required = none
debug mds = 20
[mon]
mon data = ""
[mon.a]
host = mon
mon addr = xx.xx.xx.xx:6789
[mds]
[mds.a]
host = mds
[osd]
osd data = ""
osd journal size = 128
filestore xattr use omap = true
[osd.0]
host = osd0
[osd.1]
host = osd1
3. part of ceph-mds.a.log
==================
2013-04-09 02:22:58.577485 7f587b640700 1 mds.0.35 handle_mds_map i am now mds.0.35
2013-04-09 02:22:58.577489 7f587b640700 1 mds.0.35 handle_mds_map state change up:rejoin --> up:active
2013-04-09 02:22:58.577494 7f587b640700 1 mds.0.35 recovery_done -- successful recovery!
2013-04-09 02:22:58.577507 7f587b640700 7 mds.0.tableserver(anchortable) finish_recovery
2013-04-09 02:22:58.577515 7f587b640700 7 mds.0.tableserver(snaptable) finish_recovery
2013-04-09 02:22:58.577521 7f587b640700 7 mds.0.tableclient(anchortable) finish_recovery
2013-04-09 02:22:58.577525 7f587b640700 7 mds.0.tableclient(snaptable) finish_recovery
2013-04-09 02:22:58.577529 7f587b640700 10 mds.0.cache start_recovered_truncates
2013-04-09 02:22:58.577533 7f587b640700 10 mds.0.cache do_file_recover 0 queued, 0 recovering
2013-04-09 02:22:58.577541 7f587b640700 10 mds.0.cache reissue_all_caps
2013-04-09 02:22:58.581855 7f587b640700 -1 mds/MDCache.cc: In function 'void MDCache::populate_mydir()' thread 7f587b640700 time 2013-04-09 02:22:58.577558
mds/MDCache.cc: 579: FAILED assert(mydir)
ceph version 0.56.2 (586538e22afba85c59beda49789ec42024e7a061)
1: (MDCache::populate_mydir()+0xbc5) [0x5f0125]
2: (MDS::recovery_done()+0xde) [0x4ed12e]
3: (MDS::handle_mds_map(MMDSMap*)+0x39c8) [0x4fff28]
4: (MDS::handle_core_message(Message*)+0xb4b) [0x50596b]
5: (MDS::_dispatch(Message*)+0x2f) [0x505a9f]
6: (MDS::ms_dispatch(Message*)+0x23b) [0x50759b]
7: (Messenger::ms_deliver_dispatch(Message*)+0x66) [0x872a26]
8: (DispatchQueue::entry()+0x32a) [0x87093a]
9: (DispatchQueue::DispatchThread::entry()+0xd) [0x7ee7cd]
10: (()+0x6a3f) [0x7f587f465a3f]
11: (clone()+0x6d) [0x7f587df1967d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
--- begin dump of recent events ---
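The assert fires in MDCache::populate_mydir() when the MDS's private directory inode ("mydir", the per-rank ~mdsdir) cannot be opened after recovery. One way to check whether its backing objects still exist is to inspect the metadata pool directly. This is a sketch, assuming the default pool name "metadata" and mds rank 0; the object names follow the CephFS convention that the mdsdir inode for rank N is 0x100 + N and the journal inode is 0x200 + N:

```shell
# List metadata-pool objects backing the rank-0 mdsdir (inode 0x100 -> 100.*)
rados -p metadata ls | grep '^100\.'

# Check the rank-0 journal header object (inode 0x200 -> 200.00000000)
rados -p metadata stat 200.00000000
```

If the 100.* objects are missing or unreadable, that would be consistent with mydir failing to load and populate_mydir() hitting the assert, though the log would be needed to confirm the actual cause.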
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com