Re: Infernalis 9.2.1 MDS crash

John Spray <jspray@xxxxxxxxxx> · Wed, 9 Mar 2016 11:26:24 +0000

The MDS restarted because it received an MDSMap from the monitors in
which its own entry had been removed.

This is usually a sign that the MDS was failing to communicate with
the mons for some period of time, and as a result the mons gave up
and had another MDS take over.  However, in this instance we can see
the MDS and mon exchanging beacons regularly.

The last acknowledged beacon was at 2016-03-09 04:53:38.824983.

The updated MDSMap came at 04:53:56.  That gap of roughly 17 seconds
shouldn't have been long enough for anything to time out, unless
you've changed the defaults.
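
If you want to redo that arithmetic against your own logs, here is a
minimal Python sketch.  It only subtracts the two timestamps quoted
above and compares the result with a grace period that you supply.
The function names and the grace value are hypothetical placeholders,
not Ceph defaults; as far as I know the relevant option is
mds_beacon_grace, but read the real value from your own configuration.

```python
from datetime import datetime


def beacon_gap_seconds(last_ack: str, map_received: str) -> float:
    """Seconds between the last acknowledged beacon and the new MDSMap.

    last_ack is a full log timestamp; map_received is a bare HH:MM:SS
    and is assumed to fall on the same calendar day as last_ack.
    """
    ack = datetime.strptime(last_ack, "%Y-%m-%d %H:%M:%S.%f")
    map_time = datetime.strptime(map_received, "%H:%M:%S").time()
    received = datetime.combine(ack.date(), map_time)
    return (received - ack).total_seconds()


def exceeds_grace(gap_seconds: float, grace_seconds: float) -> bool:
    """True if the gap is longer than the configured grace period."""
    return gap_seconds > grace_seconds


# Timestamps taken from this thread.
gap = beacon_gap_seconds("2016-03-09 04:53:38.824983", "04:53:56")

# Placeholder only: substitute the grace period from your own cluster.
my_grace_seconds = 30.0

print(f"gap between last acked beacon and new map: {gap:.3f}s")
print("longer than grace period:", exceeds_grace(gap, my_grace_seconds))
```

Note that this only measures the time since the last beacon the mon
acknowledged; beacons sent after that but never acknowledged would
not show up in this calculation.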

I also notice that the new MDSMap (epoch 573) shows peer MDS daemons
marked as failed, and that shortly before the new map arrived there
are a bunch of log messages indicating various client connections
resetting.

So from this log I would guess some kind of network issue?

You say that the MDS crashed; what makes you say that?  From the log
it looks like it's respawning itself, which shouldn't be immediately
noticeable: you should just see another MDS daemon take over, and a
few seconds later this one would come back as a standby.

John

On Wed, Mar 9, 2016 at 9:55 AM, Florent B <florent@xxxxxxxxxxx> wrote:
> Hi everyone,
>
> Last night one of my MDS crashed.
>
> It was running last Infernalis packaged version for Jessie.
>
> Here is last minutes log : http://paste.ubuntu.com/15333772/
>
> Does anyone have an idea of what caused the crash ?
>
> Thank you.
>
> Florent
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com