On Wed, Mar 9, 2016 at 11:37 AM, Florent B <florent@xxxxxxxxxxx> wrote:
> Hi John and thank you for your explanations :)
>
> It could be a network issue.
>
> MDS should respawn, but the "ceph-mds" process was no longer running
> after the last log message, so I deduced it crashed...

Hmm, that's worth investigating. You can induce the MDS to respawn
itself by simply doing "ceph mds fail <id>", or "ceph tell mds.<id>
respawn".

Can you play around and see if it's consistently failing to respawn,
and whether you can see any extra evidence? Maybe try running the MDS
in the foreground to make it easier to see any output
("ceph-mds -i <id> -f -d").

John

>
> On 03/09/2016 12:26 PM, John Spray wrote:
>> The MDS restarted because it received an MDSMap from the monitors in
>> which its own entry had been removed.
>>
>> This is usually a sign that the MDS was failing to communicate with
>> the mons for some period of time, and as a result the mons have given
>> up and caused another MDS to take over. However, in this instance we
>> can see the mds and mon exchanging beacons regularly.
>>
>> The last acknowledged beacon was at 2016-03-09 04:53:38.824983
>>
>> The updated mdsmap came at 04:53:56. 18 seconds shouldn't have been
>> long enough for anything to time out, unless you've changed the
>> defaults.
>>
>> I notice that the new MDSMap (epoch 573) also indicates that peer MDS
>> daemons have been failed, and that shortly before receiving the new
>> map, there are a bunch of log messages indicating various client
>> connections resetting.
>>
>> So from this log I would guess some kind of network issue?
>>
>> You say that the MDS crashed, why? From the log it looks like it's
>> respawning itself, which shouldn't immediately be noticeable: you
>> should just see another MDS daemon take over, and a few seconds later
>> this guy would come back as a standby.
>>
>> John
>>
>> On Wed, Mar 9, 2016 at 9:55 AM, Florent B <florent@xxxxxxxxxxx> wrote:
>>> Hi everyone,
>>>
>>> Last night one of my MDS crashed.
>>>
>>> It was running last Infernalis packaged version for Jessie.
>>>
>>> Here is last minutes log: http://paste.ubuntu.com/15333772/
>>>
>>> Does anyone have an idea of what caused the crash?
>>>
>>> Thank you.
>>>
>>> Florent
>>> _______________________________________________
>>> ceph-users mailing list
>>> ceph-users@xxxxxxxxxxxxxx
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
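
For anyone repeating the beacon arithmetic from the thread, here is a
minimal sketch in Python. It takes the two timestamps quoted above,
computes the gap between the last acknowledged beacon and the new
mdsmap, and compares it against a grace period. The function name
beacon_timed_out and the grace values passed to it are hypothetical
placeholders, not Ceph defaults; substitute whatever beacon grace your
cluster is actually configured with. It also assumes both timestamps
come from the same day and the same clock.

```python
from datetime import datetime, timedelta

# Timestamps quoted in the thread (assumed: same day, same clock).
last_beacon_ack = datetime.strptime(
    "2016-03-09 04:53:38.824983", "%Y-%m-%d %H:%M:%S.%f"
)
new_mdsmap = datetime.strptime("2016-03-09 04:53:56", "%Y-%m-%d %H:%M:%S")

# Time between the last acknowledged beacon and the updated mdsmap.
gap = new_mdsmap - last_beacon_ack


def beacon_timed_out(gap: timedelta, grace_seconds: float) -> bool:
    """Return True if the gap exceeds the given grace period.

    grace_seconds is a hypothetical parameter for illustration; use the
    value configured on your own cluster.
    """
    return gap > timedelta(seconds=grace_seconds)


if __name__ == "__main__":
    print(f"gap between last beacon ack and new mdsmap: {gap.total_seconds():.3f}s")
    # Illustrative grace values only, not Ceph defaults.
    for grace in (10, 60):
        print(f"grace={grace}s -> timed out: {beacon_timed_out(gap, grace)}")
```

The gap works out to a little over 17 seconds, which matches the
"18 seconds" above once the fractional part of the beacon timestamp is
dropped.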