Re: MDS crashed (ver 0.56.2)


 



Actually I didn't put any data into my Ceph cluster. 

I was just trying to understand Ceph's principles by reading the code and running a test cluster. A lot of operations were done and I can't remember them all, so I just ignored this error message and ran mkcephfs.
But I still remember that I was focused on the interaction between Ceph's roles, so starts and stops of specific daemons (ceph-mon, ceph-mds, ceph-osd) were executed.

Sorry that I cannot provide more information :)


On Thu, Apr 11, 2013 at 11:17 PM, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
That's certainly not great. Have you lost any data or removed anything
from the cluster? It looks like perhaps your MDS log lost an object,
and maybe got one shortened as well.
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
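
For context on the "MDS log lost an object" diagnosis above: in Ceph of this era, the rank-0 MDS journal is inode 0x200, striped across RADOS objects in the metadata pool named `<inode-hex>.<8-hex-digit-offset>`. A sketch of that naming (the `rados` commands in the comments assume a live cluster and the default `metadata` pool, so they are illustrative only):

```shell
# The rank-0 MDS journal inode is 0x200; its data is striped across
# RADOS objects named 200.<offset>, with the offset zero-padded to
# eight hex digits. Generate the first few expected object names:
for i in 0 1 2; do
    printf '200.%08x\n' "$i"
done

# On a live cluster one could then compare against what actually exists
# (assumes the default "metadata" pool; a gap in the sequence would
# match the "lost an object" diagnosis):
#   rados -p metadata ls | grep '^200\.' | sort
#   rados -p metadata stat 200.00000000
```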


On Mon, Apr 8, 2013 at 11:55 PM, x yasha <xyasha86@xxxxxxxxx> wrote:
> I've been testing Ceph for a while with a 4-node cluster (1 mon, 1 mds, and 2
> osds), each with ceph 0.56.2 installed.
>
> Today I ran into an MDS crash: on the mds host, the ceph-mds process is
> terminated by an assert().
> My questions are:
> 1. What is the reason for the MDS crash?
> 2. How can I solve it without mkcephfs?
>
> It's reproducible in my environment.
> The following information may be related:
> 1. "ceph -s" output
> 2. ceph.conf
> 3. part of ceph-mds.a.log (the whole log file is at
> http://pastebin.com/NJd0UCfF)
>
> 1. "ceph -s" output
> ==============
>    health HEALTH_WARN mds a is laggy
>    monmap e1: 1 mons at {a=mon.mon.mon.mon:6789/0}, election epoch 1, quorum
> 0 a
>    osdmap e220: 2 osds: 2 up, 2 in
>     pgmap v3614: 576 pgs: 576 active+clean; 6618 KB data, 162 MB used, 4209
> MB / 4606 MB avail
>    mdsmap e860: 1/1/1 up {0=a=up:active(laggy or crashed)}
>
> 2. ceph.conf
> =========
> [global]
>     auth supported = none
>     auth cluster required = none
>     auth service required = none
>     auth client required = none
>     debug mds = 20
>
> [mon]
>     mon data = ""
>
> [mon.a]
>     host = mon
>     mon addr = xx.xx.xx.xx:6789
>
> [mds]
> [mds.a]
>     host = mds
>
> [osd]
>     osd data = ""
>     osd journal size = 128
>     filestore xattr use omap = true
> [osd.0]
>     host = osd0
> [osd.1]
>     host = osd1
>
> 3. part of ceph-mds.a.log
> ==================
> 2013-04-09 02:22:58.577485 7f587b640700  1 mds.0.35 handle_mds_map i am now
> mds.0.35
> 2013-04-09 02:22:58.577489 7f587b640700  1 mds.0.35 handle_mds_map state
> change up:rejoin --> up:active
> 2013-04-09 02:22:58.577494 7f587b640700  1 mds.0.35 recovery_done --
> successful recovery!
> 2013-04-09 02:22:58.577507 7f587b640700  7 mds.0.tableserver(anchortable)
> finish_recovery
> 2013-04-09 02:22:58.577515 7f587b640700  7 mds.0.tableserver(snaptable)
> finish_recovery
> 2013-04-09 02:22:58.577521 7f587b640700  7 mds.0.tableclient(anchortable)
> finish_recovery
> 2013-04-09 02:22:58.577525 7f587b640700  7 mds.0.tableclient(snaptable)
> finish_recovery
> 2013-04-09 02:22:58.577529 7f587b640700 10 mds.0.cache
> start_recovered_truncates
> 2013-04-09 02:22:58.577533 7f587b640700 10 mds.0.cache do_file_recover 0
> queued, 0 recovering
> 2013-04-09 02:22:58.577541 7f587b640700 10 mds.0.cache reissue_all_caps
> 2013-04-09 02:22:58.581855 7f587b640700 -1 mds/MDCache.cc: In function 'void
> MDCache::populate_mydir()' thread 7f587b640700 time 2013-04-09
> 02:22:58.577558
> mds/MDCache.cc: 579: FAILED assert(mydir)
>
>  ceph version 0.56.2 (586538e22afba85c59beda49789ec42024e7a061)
>  1: (MDCache::populate_mydir()+0xbc5) [0x5f0125]
>  2: (MDS::recovery_done()+0xde) [0x4ed12e]
>  3: (MDS::handle_mds_map(MMDSMap*)+0x39c8) [0x4fff28]
>  4: (MDS::handle_core_message(Message*)+0xb4b) [0x50596b]
>  5: (MDS::_dispatch(Message*)+0x2f) [0x505a9f]
>  6: (MDS::ms_dispatch(Message*)+0x23b) [0x50759b]
>  7: (Messenger::ms_deliver_dispatch(Message*)+0x66) [0x872a26]
>  8: (DispatchQueue::entry()+0x32a) [0x87093a]
>  9: (DispatchQueue::DispatchThread::entry()+0xd) [0x7ee7cd]
>  10: (()+0x6a3f) [0x7f587f465a3f]
>  11: (clone()+0x6d) [0x7f587df1967d]
>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to
> interpret this.
>
> --- begin dump of recent events ---
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>

