Hi Adam,

You can get the MDS to spit out more debug information like so:

# ceph mds tell 0 injectargs '--debug-mds 20 --debug-ms 1'

At least then you can see where it's at when it crashes.

--Lincoln

On May 22, 2015, at 9:33 AM, Adam Tygart wrote:

> Hello all,
>
> The ceph-mds servers in our cluster are stuck in a constant
> boot->replay->crash cycle.
>
> I have enabled debug logging for the mds for a restart cycle on one of
> the nodes [1].
>
> Kernel debug output from the cephfs client during reconnection attempts:
>
> [732586.352173] ceph: mdsc delayed_work
> [732586.352178] ceph: check_delayed_caps
> [732586.352182] ceph: lookup_mds_session ffff88202f01c000 210
> [732586.352185] ceph: mdsc get_session ffff88202f01c000 210 -> 211
> [732586.352189] ceph: send_renew_caps ignoring mds0 (up:replay)
> [732586.352192] ceph: add_cap_releases ffff88202f01c000 mds0 extra 680
> [732586.352195] ceph: mdsc put_session ffff88202f01c000 211 -> 210
> [732586.352198] ceph: mdsc delayed_work
> [732586.352200] ceph: check_delayed_caps
> [732586.352202] ceph: lookup_mds_session ffff881036cbf800 1
> [732586.352205] ceph: mdsc get_session ffff881036cbf800 1 -> 2
> [732586.352207] ceph: send_renew_caps ignoring mds0 (up:replay)
> [732586.352210] ceph: add_cap_releases ffff881036cbf800 mds0 extra 680
> [732586.352212] ceph: mdsc put_session ffff881036cbf800 2 -> 1
> [732591.357123] ceph: mdsc delayed_work
> [732591.357128] ceph: check_delayed_caps
> [732591.357132] ceph: lookup_mds_session ffff88202f01c000 210
> [732591.357135] ceph: mdsc get_session ffff88202f01c000 210 -> 211
> [732591.357139] ceph: add_cap_releases ffff88202f01c000 mds0 extra 680
> [732591.357142] ceph: mdsc put_session ffff88202f01c000 211 -> 210
> [732591.357145] ceph: mdsc delayed_work
> [732591.357147] ceph: check_delayed_caps
> [732591.357149] ceph: lookup_mds_session ffff881036cbf800 1
> [732591.357152] ceph: mdsc get_session ffff881036cbf800 1 -> 2
> [732591.357154] ceph: add_cap_releases ffff881036cbf800 mds0 extra 680
> [732591.357157] ceph: mdsc put_session ffff881036cbf800 2 -> 1
> [732596.362076] ceph: mdsc delayed_work
> [732596.362081] ceph: check_delayed_caps
> [732596.362084] ceph: lookup_mds_session ffff88202f01c000 210
> [732596.362087] ceph: mdsc get_session ffff88202f01c000 210 -> 211
> [732596.362091] ceph: add_cap_releases ffff88202f01c000 mds0 extra 680
> [732596.362094] ceph: mdsc put_session ffff88202f01c000 211 -> 210
> [732596.362097] ceph: mdsc delayed_work
> [732596.362099] ceph: check_delayed_caps
> [732596.362101] ceph: lookup_mds_session ffff881036cbf800 1
> [732596.362104] ceph: mdsc get_session ffff881036cbf800 1 -> 2
> [732596.362106] ceph: add_cap_releases ffff881036cbf800 mds0 extra 680
> [732596.362109] ceph: mdsc put_session ffff881036cbf800 2 -> 1
>
> Does anybody have any debugging tips, or any ideas on how to get an mds stable?
>
> Server info: CentOS 7.1 with Ceph 0.94.1
> Client info: Gentoo, kernel cephfs client, kernel 3.19.5-gentoo
>
> I'd reboot the client, but at this point I don't believe this is a
> client issue.
>
> [1] https://drive.google.com/file/d/0B4XF1RWjuGh5WU1OZXpNb0Z1ck0/view?usp=sharing
>
> --
> Adam
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
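
P.S. One caveat with injectargs: it only changes the running daemon, so the debug levels are lost each time the MDS restarts. Since this MDS is crash-looping, it may help to set the same levels persistently in ceph.conf on the MDS host (a sketch of the relevant section; adjust to your own config layout):

    [mds]
        debug mds = 20
        debug ms = 1

Then restart the MDS; the log under /var/log/ceph/ should capture the replay from startup all the way to the crash.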