Re: MDS crash (Mimic 13.2.2 / 13.2.4 ) elist.h: 39: FAILED assert(!is_on_list())

On Mon, Feb 11, 2019 at 8:01 PM Jake Grimmett <jog@xxxxxxxxxxxxxxxxx> wrote:
>
> Hi Zheng,
>
> Many, many thanks for your help...
>
> Your suggestion of setting large values for mds_cache_size and
> mds_cache_memory_limit stopped our MDS crashing :)
>
> The values in ceph.conf are now:
>
> mds_cache_size = 8589934592
> mds_cache_memory_limit = 17179869184
>
> Should these values be left in our configuration?

No. You had better change them back to their original values.
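
A minimal sketch of restoring the values through the admin socket, under some assumptions: "mds.a" is a placeholder daemon name, and 0 and 1073741824 are what I believe the Mimic defaults to be for mds_cache_size (an inode count) and mds_cache_memory_limit (bytes, so 1 GiB). Use whatever values you had before. The function only prints the commands, so you can check them before running anything.

```shell
#!/bin/sh
# Print (do not run) the admin-socket commands that put the two cache
# settings back. Run the printed lines on the host of the active MDS.
# Arguments: MDS daemon name, mds_cache_size, mds_cache_memory_limit.
restore_cache_cmds() {
    mds_name=$1
    cache_size=$2
    memory_limit=$3
    printf 'ceph daemon %s config set mds_cache_size %s\n' \
        "$mds_name" "$cache_size"
    printf 'ceph daemon %s config set mds_cache_memory_limit %s\n' \
        "$mds_name" "$memory_limit"
}

# Placeholder daemon name and assumed default values; substitute your own.
restore_cache_cmds mds.a 0 1073741824
```

Also revert or remove the two lines in ceph.conf, otherwise the large values will be applied again the next time the MDS restarts.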

>
> again thanks for the assistance,
>
> Jake
>
> On 2/11/19 8:17 AM, Yan, Zheng wrote:
> > On Sat, Feb 9, 2019 at 12:36 AM Jake Grimmett <jog@xxxxxxxxxxxxxxxxx> wrote:
> >>
> >> Dear All,
> >>
> >> Unfortunately the MDS has crashed on our Mimic cluster...
> >>
> >> First symptoms were rsync giving:
> >> "No space left on device (28)"
> >> when trying to rename or delete
> >>
> >> This prompted me to try restarting the MDS, as it reported laggy.
> >>
> >> Restarting the MDS shows this error in the log before the crash:
> >>
> >> elist.h: 39: FAILED assert(!is_on_list())
> >>
> >> A full MDS log showing the crash is here:
> >>
> >> http://p.ip.fi/iWlz
> >>
> >> I've tried upgrading the cluster to 13.2.4, but the MDS still crashes...
> >>
> >> The cluster has 10 nodes and 254 OSDs, and uses EC for the data and 3x
> >> replication for MDS. We have a single active MDS, with two failover MDS.
> >>
> >> We have ~2PB of cephfs data here, all of which is currently
> >> inaccessible. Any and all advice gratefully received :)
> >>
> >
> > Add mds_cache_size and mds_cache_memory_limit to ceph.conf and set
> > them to very large values before starting the MDS. If the MDS does not
> > crash, restore mds_cache_size and mds_cache_memory_limit to their
> > original values (via the admin socket) once the MDS has been active
> > for 10 seconds.
> >
> > If the MDS still crashes, try compiling ceph-mds with the following patch:
> >
> > diff --git a/src/mds/CDir.cc b/src/mds/CDir.cc
> > index d3461fba2e..c2731e824c 100644
> > --- a/src/mds/CDir.cc
> > +++ b/src/mds/CDir.cc
> > @@ -508,6 +508,8 @@ void CDir::remove_dentry(CDentry *dn)
> >    // clean?
> >    if (dn->is_dirty())
> >      dn->mark_clean();
> > +  if (inode->is_stray())
> > +    dn->item_stray.remove_myself();
> >
> >    if (dn->state_test(CDentry::STATE_BOTTOMLRU))
> >      cache->bottom_lru.lru_remove(dn);
> >
> >
> >> best regards,
> >>
> >> Jake
> >> _______________________________________________
> >> ceph-users mailing list
> >> ceph-users@xxxxxxxxxxxxxx
> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
>


