Hi Henry,

On Fri, 1 Oct 2010, Henry C Chang wrote:
> Hi,
>
> Today I found that none of my ceph clients are able to 'ls' the ceph
> mount point.
> Since all ceph daemons are alive, I decided to restart the MDS to see
> if it would help.
> Unfortunately, the MDS never comes back. It fails with an error during
> replay every time.
>
> The last few lines of the MDS log show:

Can you post the full MDS log somewhere and let me know in the bug where I
can find it? This is either journal replay not properly handling partial
writes at the end of the journal, or journal corruption.

	http://tracker.newdream.net/issues/451

Thanks!
sage

>
> 2010-10-01 03:30:59.816017 7f991dcf7710 mds0.journaler try_read_entry at 822192580 reading 822192580~4 (have 160437)
> 2010-10-01 03:30:59.816022 7f991dcf7710 mds0.journaler try_read_entry got 0 len entry at offset 822192580
> 2010-10-01 03:30:59.816026 7f991dcf7710 mds0.log _replay journaler got error -22, aborting
> 2010-10-01 03:30:59.816030 7f991dcf7710 mds0.log _replay_thread kicking waiters
> 2010-10-01 03:30:59.816034 7f991dcf7710 mds0.18 boot_start encountered an error, failing
> 2010-10-01 03:30:59.816038 7f991dcf7710 mds0.18 suicide. wanted up:replay, now down:dne
> 2010-10-01 03:30:59.816047 7f991dcf7710 mds0.cache WARNING: mdcache shutdown with non-empty cache
> 2010-10-01 03:30:59.816052 7f991dcf7710 mds0.cache show_subtrees
> 2010-10-01 03:30:59.816059 7f991dcf7710 mds0.cache |__ 0 auth [dir 1 / [2,head] auth v=327474 cv=0/0 dir_auth=0 state=1610612736 f(v5 m2010-09-30 10:42:48.462366 5=0+5) n(v5922 rc2010-10-01 02:32:18.156019 b129532745664 784=23+761)/n(v5922 rc2010-10-01 02:32:18.156019 b129521095616 784=23+761) hs=1+0,ss=0+0 dirty=1 | child subtree dirty 0x7f98e8006ae0]
> 2010-10-01 03:30:59.816078 7f991dcf7710 mds0.cache |__ 0 auth [dir 100 ~mds0/ [2,head] auth v=795 cv=0/0 dir_auth=0 state=1610612736 f(v0 2=1+1) n(v54 rc2010-10-01 02:07:56.696257 b1883380490265 372=365+7) hs=1+0,ss=0+0 dirty=1 | child subtree dirty 0x7f98e80070f0]
> 2010-10-01 03:30:59.816094 7f991dcf7710 mds0.log _replay_thread finish
> 2010-10-01 03:30:59.816129 7f992d903710 mds0.18 handle_mds_beacon up:replay seq 3 rtt 7.483578
> 2010-10-01 03:30:59.816571 7f9930405720 7f9930405720 stopped.
>
> I am wondering if there is a way to recover from this situation.
> Thanks,
>
> Henry
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at http://vger.kernel.org/majordomo-info.html
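For readers following the thread, here is a minimal sketch of the two cases Sage distinguishes (an unfinished write at the journal tail versus corruption), under loudly hypothetical assumptions. The journal is modelled as a plain byte buffer of entries, each a 4-byte little-endian length header followed by its payload, loosely suggested by the `reading 822192580~4` log line, which appears to read a 4-byte length before each entry. This is NOT Ceph's actual Journaler encoding, and `replay`, `ReplayError` and `HEADER` are hypothetical names. A header or payload that runs off the end of the buffer, or a run of zeros to the end, is treated as a partial write and replay stops cleanly; a zero-length header with more data after it is treated as corruption and fails with EINVAL (22 on Linux), the errno behind the "got error -22" line above.

```python
import errno
import struct

# Hypothetical layout (NOT Ceph's on-disk Journaler format): each entry is a
# 4-byte little-endian length header followed by that many payload bytes.
HEADER = struct.Struct("<I")


class ReplayError(OSError):
    """Hypothetical error raised when the journal cannot be replayed safely."""


def replay(journal: bytes, start: int = 0) -> tuple[list[bytes], int]:
    """Replay entries from `start`; return (payloads, offset where replay stopped)."""
    entries: list[bytes] = []
    pos = start
    while pos < len(journal):
        if pos + HEADER.size > len(journal):
            break  # header cut short: treat as a partial write at the tail
        (length,) = HEADER.unpack_from(journal, pos)
        if length == 0:
            if not any(journal[pos:]):
                break  # zeros to the end: space that was never written
            # A 0 length with real data after it cannot be a clean tail.
            raise ReplayError(errno.EINVAL, f"0 len entry at offset {pos}")
        end = pos + HEADER.size + length
        if end > len(journal):
            break  # payload cut short: treat as a partial write at the tail
        entries.append(journal[pos + HEADER.size:end])
        pos = end
    return entries, pos


if __name__ == "__main__":
    good = HEADER.pack(1) + b"a" + HEADER.pack(2) + b"bc"
    print(replay(good + b"\x00" * 8))  # -> ([b'a', b'bc'], 11)
    try:
        replay(good + HEADER.pack(0) + HEADER.pack(1) + b"z")
    except ReplayError as e:
        print("replay failed:", e)
```

The zero-run check is an illustrative design choice: it lets space that was allocated but never written replay cleanly, while still refusing to skip a zero-length header that has real entries after it.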