On Sat, Feb 9, 2019 at 12:36 AM Jake Grimmett <jog@xxxxxxxxxxxxxxxxx> wrote:
>
> Dear All,
>
> Unfortunately the MDS has crashed on our Mimic cluster...
>
> First symptoms were rsync giving:
> "No space left on device (28)"
> when trying to rename or delete.
>
> This prompted me to try restarting the MDS, as it reported laggy.
>
> Restarting the MDS shows this error in the log before the crash:
>
> elist.h: 39: FAILED assert(!is_on_list())
>
> A full MDS log showing the crash is here:
>
> http://p.ip.fi/iWlz
>
> I've tried upgrading the cluster to 13.2.4, but the MDS still crashes...
>
> The cluster has 10 nodes and 254 OSDs, uses EC for the data pool and 3x
> replication for metadata. We have a single active MDS, with two failover
> MDS daemons.
>
> We have ~2PB of cephfs data here, all of which is currently
> inaccessible; any and all advice gratefully received :)

Add mds_cache_size and mds_cache_memory_limit to ceph.conf and set them to
very large values before starting the MDS. If the MDS does not crash,
restore mds_cache_size and mds_cache_memory_limit to their original values
(via the admin socket) after the MDS has been active for about 10 seconds.
An example ceph.conf snippet and the matching admin-socket commands are
sketched at the end of this mail.

If the MDS still crashes, try compiling ceph-mds with the following patch
(a rough build recipe is also sketched at the end):

diff --git a/src/mds/CDir.cc b/src/mds/CDir.cc
index d3461fba2e..c2731e824c 100644
--- a/src/mds/CDir.cc
+++ b/src/mds/CDir.cc
@@ -508,6 +508,8 @@ void CDir::remove_dentry(CDentry *dn)
   // clean?
   if (dn->is_dirty())
     dn->mark_clean();
+  if (inode->is_stray())
+    dn->item_stray.remove_myself();

   if (dn->state_test(CDentry::STATE_BOTTOMLRU))
     cache->bottom_lru.lru_remove(dn);

> best regards,
>
> Jake
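
For concreteness, here is a minimal sketch of the cache override and the
later restore via the admin socket. The numbers and the daemon name "mds.a"
are illustrative assumptions only, not values from this thread; "original
values" means whatever your cluster ran with before (the Mimic default for
mds_cache_memory_limit is 1 GiB):

# /etc/ceph/ceph.conf on the MDS host, before starting the MDS
[mds]
    mds_cache_memory_limit = 137438953472   # e.g. 128 GiB, far above normal
    mds_cache_size = 20000000               # e.g. 20M inodes (0 = no count limit)

# ~10 seconds after the MDS becomes active, restore the previous values
# through the admin socket ("mds.a" is a placeholder for your daemon name)
ceph daemon mds.a config set mds_cache_memory_limit 1073741824
ceph daemon mds.a config set mds_cache_size 0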
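
And if it does come to rebuilding ceph-mds, something along these lines
should work against the v13.2.4 source tree (the patch file name and paths
are made up for the example; adjust to your build environment):

git clone --branch v13.2.4 https://github.com/ceph/ceph.git
cd ceph
git submodule update --init --recursive
git apply /path/to/cdir-remove-dentry.patch   # the diff above, saved to a file
./install-deps.sh
./do_cmake.sh
cd build
make -j$(nproc) ceph-mds                      # patched binary lands in build/bin/ceph-mds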