Re: Infernalis 9.2.1 MDS crash

Florent B <florent@xxxxxxxxxxx> · Thu, 10 Mar 2016 11:09:42 +0100

Hi,

I can reproduce it. Running the MDS in the foreground with "ceph-mds -i <id>
-f -d --setuser ceph --setgroup ceph", it failed to respawn with this error:

global_init: error reading config file.

I found the problem: the ceph.conf file was not readable by the ceph user.
This is related to the way Proxmox handles config files. I will follow up
with them.
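
For anyone hitting the same "error reading config file" message, here is a
rough sketch of how one could check whether a given user can read the config
file. It only looks at the classic Unix owner/group/other mode bits and
ignores ACLs and parent-directory permissions, so treat it as a first check,
not a definitive answer. The function names are made up for illustration, and
the path /etc/ceph/ceph.conf and the user name "ceph" in the usage part are
assumptions about a typical setup.

```python
import os
import stat


def mode_allows_read(mode, file_uid, file_gid, uid, gids):
    """Return True if the plain Unix mode bits let (uid, gids) read the file.

    Hypothetical helper: ignores ACLs, capabilities and whether the
    parent directories can be traversed.
    """
    if uid == 0:
        return True  # root is not restricted by the read bits
    if uid == file_uid:
        return bool(mode & stat.S_IRUSR)  # owner class applies
    if file_gid in gids:
        return bool(mode & stat.S_IRGRP)  # group class applies
    return bool(mode & stat.S_IROTH)      # everyone else


def user_can_read(path, uid, gids):
    """Stat the file (following symlinks) and apply mode_allows_read."""
    st = os.stat(path)
    return mode_allows_read(st.st_mode, st.st_uid, st.st_gid, uid, gids)


if __name__ == "__main__":
    import pwd

    # Assumed user name and config path; adjust for your installation.
    entry = pwd.getpwnam("ceph")
    groups = os.getgrouplist(entry.pw_name, entry.pw_gid)
    print(user_can_read("/etc/ceph/ceph.conf", entry.pw_uid, groups))
```

If this prints False, the MDS started with --setuser ceph would presumably
fail to read the file in the same way as above.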

Thank you

On 03/09/2016 02:11 PM, John Spray wrote:
> On Wed, Mar 9, 2016 at 11:37 AM, Florent B <florent@xxxxxxxxxxx> wrote:
>> Hi John and thank you for your explanations :)
>>
>> It could be a network issue.
>>
>> The MDS should respawn, but the "ceph-mds" process was no longer running
>> after the last log message, so I deduced it had crashed...
> Hmm, that's worth investigating.  You can induce the MDS to respawn
> itself by simply doing "ceph mds fail <id>", or "ceph tell mds.<id>
> respawn"
>
> Can you play around and see if it's consistently failing to respawn,
> and if you can see any extra evidence, maybe try running the MDS in
> the foreground to make it easier to see any output ("ceph-mds -i <id>
> -f -d")
>
> John
>
>> On 03/09/2016 12:26 PM, John Spray wrote:
>>> The MDS restarted because it received an MDSMap from the monitors in
>>> which its own entry had been removed.
>>>
>>> This is usually a sign that the MDS was failing to communicate with
>>> the mons for some period of time, and as a result the mons have given
>>> up and caused another MDS to take over.  However, in this instance we
>>> can see the mds and mon exchanging beacons regularly.
>>>
>>> The last acknowledged beacon was at 2016-03-09 04:53:38.824983
>>>
>>> The updated MDSMap came at 04:53:56.  18 seconds shouldn't have been
>>> long enough for anything to time out, unless you've changed the
>>> defaults.
>>>
>>> I notice that the new MDSMap (epoch 573) also indicates that peer MDS
>>> daemons have been failed, and that shortly before receiving the new
>>> map, there are a bunch of log messages indicating various client
>>> connections resetting.
>>>
>>> So from this log I would guess some kind of network issue?
>>>
>>> Why do you say that the MDS crashed?  From the log it looks like it's
>>> respawning itself, which shouldn't immediately be noticeable, you
>>> should just see another MDS daemon take over, and a few seconds later
>>> this guy would come back as a standby.
>>>
>>> John
>>>
>>> On Wed, Mar 9, 2016 at 9:55 AM, Florent B <florent@xxxxxxxxxxx> wrote:
>>>> Hi everyone,
>>>>
>>>> Last night one of my MDS daemons crashed.
>>>>
>>>> It was running the latest packaged Infernalis version for Jessie.
>>>>
>>>> Here is the log from the last few minutes: http://paste.ubuntu.com/15333772/
>>>>
>>>> Does anyone have an idea of what caused the crash?
>>>>
>>>> Thank you.
>>>>
>>>> Florent
>>>> _______________________________________________
>>>> ceph-users mailing list
>>>> ceph-users@xxxxxxxxxxxxxx
>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
