I always have the same error, even after giving the ceph user rights
to read the ceph.conf file:

# groups ceph
ceph : ceph www-data
# ls -alh /etc/ceph/ceph.conf
lrwxrwxrwx 1 root root 18 May 27  2015 /etc/ceph/ceph.conf -> /etc/pve/ceph.conf
# ls -alh /etc/pve/ceph.conf
-rw-r----- 1 root www-data 3.6K Mar  8 12:35 /etc/pve/ceph.conf

The www-data group has the right to read the file, and the ceph user
is in the www-data group. Does it need write permission? :o

The problem is that I can't change permissions on /etc/pve; it's a
special file system...

On 03/10/2016 11:09 AM, Florent B wrote:
> Hi,
>
> I can reproduce it: running the MDS in the foreground with "ceph-mds
> -i <id> -f -d --setuser ceph --setgroup ceph" fails to respawn with
> this error:
>
> global_init: error reading config file.
>
> I found the problem: the ceph.conf file was not readable by the ceph
> user. This is related to the Proxmox way of handling config files. I
> will take it up with them.
>
> Thank you
>
> On 03/09/2016 02:11 PM, John Spray wrote:
>> On Wed, Mar 9, 2016 at 11:37 AM, Florent B <florent@xxxxxxxxxxx> wrote:
>>> Hi John, and thank you for your explanations :)
>>>
>>> It could be a network issue.
>>>
>>> The MDS should respawn, but the "ceph-mds" process was no longer
>>> running after the last log message, so I deduced it had crashed...
>> Hmm, that's worth investigating. You can induce the MDS to respawn
>> itself by simply doing "ceph mds fail <id>" or "ceph tell mds.<id>
>> respawn".
>>
>> Can you play around and see if it's consistently failing to respawn,
>> and if you can see any extra evidence? Maybe try running the MDS in
>> the foreground to make it easier to see any output ("ceph-mds -i <id>
>> -f -d").
>>
>> John
>>
>>> On 03/09/2016 12:26 PM, John Spray wrote:
>>>> The MDS restarted because it received an MDSMap from the monitors
>>>> in which its own entry had been removed.
>>>>
>>>> This is usually a sign that the MDS was failing to communicate with
>>>> the mons for some period of time, and as a result the mons gave up
>>>> and caused another MDS to take over. However, in this instance we
>>>> can see the mds and mon exchanging beacons regularly.
>>>>
>>>> The last acknowledged beacon was at 2016-03-09 04:53:38.824983.
>>>>
>>>> The updated mdsmap came at 04:53:56. 18 seconds shouldn't have been
>>>> long enough for anything to time out, unless you've changed the
>>>> defaults.
>>>>
>>>> I notice that the new MDSMap (epoch 573) also indicates that peer
>>>> MDS daemons have been failed, and that shortly before receiving the
>>>> new map, there are a bunch of log messages indicating various
>>>> client connections resetting.
>>>>
>>>> So from this log I would guess some kind of network issue?
>>>>
>>>> You say that the MDS crashed; why? From the log it looks like it's
>>>> respawning itself, which shouldn't be immediately noticeable: you
>>>> should just see another MDS daemon take over, and a few seconds
>>>> later this guy would come back as a standby.
>>>>
>>>> John
>>>>
>>>> On Wed, Mar 9, 2016 at 9:55 AM, Florent B <florent@xxxxxxxxxxx> wrote:
>>>>> Hi everyone,
>>>>>
>>>>> Last night one of my MDSes crashed.
>>>>>
>>>>> It was running the last Infernalis version packaged for Jessie.
>>>>>
>>>>> Here are the last minutes of the log: http://paste.ubuntu.com/15333772/
>>>>>
>>>>> Does anyone have an idea of what caused the crash?
>>>>>
>>>>> Thank you.
>>>>>
>>>>> Florent
>>>>> _______________________________________________
>>>>> ceph-users mailing list
>>>>> ceph-users@xxxxxxxxxxxxxx
>>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
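
[For reference, the permission check discussed at the top of the
thread can be sketched as a small shell snippet. This is a minimal
sketch, not Proxmox- or Ceph-specific: it checks the group-read bit
the way the "-rw-r-----" listing above implies, demonstrated on a
scratch file standing in for /etc/pve/ceph.conf so it is safe to run
anywhere (GNU stat assumed).]

```shell
# Minimal sketch: check whether a file is group-readable, as the ceph
# user (a member of www-data) would need for /etc/pve/ceph.conf.
# A scratch file stands in for the real config; on a live system you
# would first resolve the symlink, e.g. readlink -f /etc/ceph/ceph.conf
conf=$(mktemp)
chmod 640 "$conf"               # mimics -rw-r----- root:www-data

perms=$(stat -c '%A' "$conf")   # e.g. "-rw-r-----" (GNU stat)
case "$perms" in
    ????r*) echo "group-readable: $perms" ;;   # 5th char is group 'r'
    *)      echo "NOT group-readable: $perms" ;;
esac

rm -f "$conf"
```

[Note that root bypasses permission bits, so reading the file with
"cat" as root proves nothing; on a live system the more direct check
is something like "sudo -u ceph test -r /etc/pve/ceph.conf".]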