Re: Ceph MDS continually respawning (hammer)

I've experienced MDS issues in the past, but nothing sticks out to me in your logs.

Are you using a single active MDS with failover, or multiple active MDS? 
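
If you're not sure, "ceph mds dump" will show it:

       # ceph mds dump | grep max_mds

max_mds greater than 1 would mean multiple active ranks.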

--Lincoln

On May 22, 2015, at 10:10 AM, Adam Tygart wrote:

> Thanks for the quick response.
> 
> I had 'debug mds = 20' in the first log; I added 'debug ms = 1' for this one:
> https://drive.google.com/file/d/0B4XF1RWjuGh5bXFnRzE1SHF6blE/view?usp=sharing
> 
> Based on these logs, it looks like the heartbeat_map is_healthy 'MDS'
> check just times out, and then the MDS gets respawned.
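> 
> If it's just replay taking longer than the heartbeat allows, I wonder
> whether bumping the beacon grace would buy it time -- an untested guess
> on my part:
> 
>        # ceph mds tell 0 injectargs '--mds-beacon-grace 60'
> 
> or 'mds beacon grace = 60' in ceph.conf under [mds] (and on the mons,
> which enforce it), so it survives the respawns.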
> 
> --
> Adam
> 
> On Fri, May 22, 2015 at 9:42 AM, Lincoln Bryant <lincolnb@xxxxxxxxxxxx> wrote:
>> Hi Adam,
>> 
>> You can get the MDS to spit out more debug information like so:
>> 
>>        # ceph mds tell 0 injectargs '--debug-mds 20 --debug-ms 1'
>> 
>> At least then you can see where it's at when it crashes.
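>> 
>> If the daemon dies before you can inject anything, the same settings can
>> go in ceph.conf on the MDS host so they stick across respawns:
>> 
>>        [mds]
>>            debug mds = 20
>>            debug ms = 1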
>> 
>> --Lincoln
>> 
>> On May 22, 2015, at 9:33 AM, Adam Tygart wrote:
>> 
>>> Hello all,
>>> 
>>> The ceph-mds servers in our cluster are stuck in a constant
>>> boot->replay->crash cycle.
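>>> 
>>> The cycle is visible from the monitors, e.g.:
>>> 
>>>        # ceph mds stat
>>> 
>>> shows rank 0 bouncing back into up:replay after every respawn.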
>>> 
>>> I have enabled debug logging for the mds for a restart cycle on one of
>>> the nodes[1].
>>> 
>>> Kernel debug from cephfs client during reconnection attempts:
>>> [732586.352173] ceph:  mdsc delayed_work
>>> [732586.352178] ceph:  check_delayed_caps
>>> [732586.352182] ceph:  lookup_mds_session ffff88202f01c000 210
>>> [732586.352185] ceph:  mdsc get_session ffff88202f01c000 210 -> 211
>>> [732586.352189] ceph:  send_renew_caps ignoring mds0 (up:replay)
>>> [732586.352192] ceph:  add_cap_releases ffff88202f01c000 mds0 extra 680
>>> [732586.352195] ceph:  mdsc put_session ffff88202f01c000 211 -> 210
>>> [732586.352198] ceph:  mdsc delayed_work
>>> [732586.352200] ceph:  check_delayed_caps
>>> [732586.352202] ceph:  lookup_mds_session ffff881036cbf800 1
>>> [732586.352205] ceph:  mdsc get_session ffff881036cbf800 1 -> 2
>>> [732586.352207] ceph:  send_renew_caps ignoring mds0 (up:replay)
>>> [732586.352210] ceph:  add_cap_releases ffff881036cbf800 mds0 extra 680
>>> [732586.352212] ceph:  mdsc put_session ffff881036cbf800 2 -> 1
>>> [732591.357123] ceph:  mdsc delayed_work
>>> [732591.357128] ceph:  check_delayed_caps
>>> [732591.357132] ceph:  lookup_mds_session ffff88202f01c000 210
>>> [732591.357135] ceph:  mdsc get_session ffff88202f01c000 210 -> 211
>>> [732591.357139] ceph:  add_cap_releases ffff88202f01c000 mds0 extra 680
>>> [732591.357142] ceph:  mdsc put_session ffff88202f01c000 211 -> 210
>>> [732591.357145] ceph:  mdsc delayed_work
>>> [732591.357147] ceph:  check_delayed_caps
>>> [732591.357149] ceph:  lookup_mds_session ffff881036cbf800 1
>>> [732591.357152] ceph:  mdsc get_session ffff881036cbf800 1 -> 2
>>> [732591.357154] ceph:  add_cap_releases ffff881036cbf800 mds0 extra 680
>>> [732591.357157] ceph:  mdsc put_session ffff881036cbf800 2 -> 1
>>> [732596.362076] ceph:  mdsc delayed_work
>>> [732596.362081] ceph:  check_delayed_caps
>>> [732596.362084] ceph:  lookup_mds_session ffff88202f01c000 210
>>> [732596.362087] ceph:  mdsc get_session ffff88202f01c000 210 -> 211
>>> [732596.362091] ceph:  add_cap_releases ffff88202f01c000 mds0 extra 680
>>> [732596.362094] ceph:  mdsc put_session ffff88202f01c000 211 -> 210
>>> [732596.362097] ceph:  mdsc delayed_work
>>> [732596.362099] ceph:  check_delayed_caps
>>> [732596.362101] ceph:  lookup_mds_session ffff881036cbf800 1
>>> [732596.362104] ceph:  mdsc get_session ffff881036cbf800 1 -> 2
>>> [732596.362106] ceph:  add_cap_releases ffff881036cbf800 mds0 extra 680
>>> [732596.362109] ceph:  mdsc put_session ffff881036cbf800 2 -> 1
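>>> 
>>> (That client-side output is from the kernel's dynamic debug facility;
>>> assuming CONFIG_DYNAMIC_DEBUG and a mounted debugfs, it's enabled with:
>>> 
>>>        # echo 'module ceph +p' > /sys/kernel/debug/dynamic_debug/control
>>> 
>>> and switched off again with '-p'.)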
>>> 
>>> Does anybody have debugging tips, or ideas on how to get an MDS stable?
>>> 
>>> Server info: CentOS 7.1 with Ceph 0.94.1
>>> Client info: Gentoo, kernel CephFS client (3.19.5-gentoo)
>>> 
>>> I'd reboot the client, but at this point, I don't believe this is a
>>> client issue.
>>> 
>>> [1] https://drive.google.com/file/d/0B4XF1RWjuGh5WU1OZXpNb0Z1ck0/view?usp=sharing
>>> 
>>> --
>>> Adam
>> 

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



