Re: mds: fix Resetter locking

"Yan, Zheng" <ukernel@xxxxxxxxx> · Thu, 19 Dec 2013 21:31:15 +0800

On Thu, Dec 19, 2013 at 3:21 PM, Alexandre Oliva <oliva@xxxxxxx> wrote:
> For some weird reason I couldn't figure out, after I simultaneously
> brought down all components of my ceph cluster and then brought them
> back up, the mds wouldn't come back, complaining about a zero-sized
> entry in its journal some 8+MB behind the end of the journal.  I had
> never hit this problem before, and it's not entirely unusual for me to restart
> all cluster components at once after some configuration change.
>

I suspect it was caused by bug http://tracker.ceph.com/issues/6458.
Did you use an active + standby mds setup? How quickly did you bring down
the osd and mds?

Regards
Yan, Zheng

> Anyway...  Long story short, after some poking at the mds journal to see
> if I could figure out how to get it back up, I gave up and decided to
> use the --reset-journal hammer.  Except that it just sat there, never
> completing or even getting noticed by the cluster.  After a bit of
> additional investigation, the following patch was born, and now my
> Emperor cluster is back up.  Phew! :-)
>
>
>
> --
> Alexandre Oliva, freedom fighter    http://FSFLA.org/~lxoliva/
> You must be the change you wish to see in the world. -- Gandhi
> Be Free! -- http://FSFLA.org/   FSF Latin America board member
> Free Software Evangelist      Red Hat Brazil Compiler Engineer
>