Re: MDS crash (Mimic 13.2.2 / 13.2.4) elist.h: 39: FAILED assert(!is_on_list())

Hi Zheng,

Sorry - I've just re-read your email and saw your instruction to restore
the mds_cache_size and mds_cache_memory_limit to original values if the
MDS does not crash - I have now done this...

thanks again for your help,

best regards,

Jake

On 2/11/19 12:01 PM, Jake Grimmett wrote:
> Hi Zheng,
> 
> Many, many thanks for your help...
> 
> Your suggestion of setting large values for mds_cache_size and
> mds_cache_memory_limit stopped our MDS crashing :)
> 
> The values in ceph.conf are now:
> 
> mds_cache_size = 8589934592
> mds_cache_memory_limit = 17179869184
> 
> Should these values be left in our configuration?
> 
> again thanks for the assistance,
> 
> Jake
> 
> On 2/11/19 8:17 AM, Yan, Zheng wrote:
>> On Sat, Feb 9, 2019 at 12:36 AM Jake Grimmett <jog@xxxxxxxxxxxxxxxxx> wrote:
>>>
>>> Dear All,
>>>
>>> Unfortunately the MDS has crashed on our Mimic cluster...
>>>
>>> First symptoms were rsync giving:
>>> "No space left on device (28)"
>>> when trying to rename or delete
>>>
>>> This prompted me to try restarting the MDS, as it was reported as laggy.
>>>
>>> Restarting the MDS shows this error in the log before the crash:
>>>
>>> elist.h: 39: FAILED assert(!is_on_list())
>>>
>>> A full MDS log showing the crash is here:
>>>
>>> http://p.ip.fi/iWlz
>>>
>>> I've tried upgrading the cluster to 13.2.4, but the MDS still crashes...
>>>
>>> The cluster has 10 nodes and 254 OSDs, and uses EC for the data and 3x
>>> replication for MDS. We have a single active MDS, with two failover MDSs.
>>>
>>> We have ~2PB of cephfs data here, all of which is currently
>>> inaccessible. Any and all advice gratefully received :)
>>>
>>
>> Add mds_cache_size and mds_cache_memory_limit to ceph.conf and set
>> them to very large values before starting the MDS. If the MDS does not
>> crash, restore mds_cache_size and mds_cache_memory_limit to their
>> original values (via the admin socket) after the MDS has been active
>> for 10 seconds.
>>
>> If the MDS still crashes, try compiling ceph-mds with the following patch:
>>
>> diff --git a/src/mds/CDir.cc b/src/mds/CDir.cc
>> index d3461fba2e..c2731e824c 100644
>> --- a/src/mds/CDir.cc
>> +++ b/src/mds/CDir.cc
>> @@ -508,6 +508,8 @@ void CDir::remove_dentry(CDentry *dn)
>>    // clean?
>>    if (dn->is_dirty())
>>      dn->mark_clean();
>> +  if (inode->is_stray())
>> +    dn->item_stray.remove_myself();
>>
>>    if (dn->state_test(CDentry::STATE_BOTTOMLRU))
>>      cache->bottom_lru.lru_remove(dn);
>>
>>
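For readers unfamiliar with the assert in the subject line, here is a minimal sketch of the pattern involved, not the Ceph source. It assumes an intrusive list hook whose destructor asserts that the hook has already been unlinked, which is the kind of check that `FAILED assert(!is_on_list())` reports. The names `ListItem`, `Dentry`, `insert_before` and the free function `remove_dentry` are hypothetical stand-ins; only `is_on_list`, `remove_myself` and `item_stray` are taken from the thread. Unlike the patch, which unlinks only when `inode->is_stray()`, this sketch unlinks unconditionally.

```cpp
#include <cassert>

// Simplified stand-in for an intrusive list hook (hypothetical name).
// An unlinked hook points at itself in both directions.
struct ListItem {
  ListItem* prev;
  ListItem* next;

  ListItem() : prev(this), next(this) {}
  ListItem(const ListItem&) = delete;
  ListItem& operator=(const ListItem&) = delete;

  // Destroying a hook that is still linked would leave dangling pointers
  // in its neighbours, so the destructor insists it was unlinked first.
  ~ListItem() { assert(!is_on_list()); }

  bool is_on_list() const { return next != this; }

  // Unlink from whatever list this hook is on.
  // Returns false if the hook was not linked.
  bool remove_myself() {
    if (next == this) return false;
    next->prev = prev;
    prev->next = next;
    prev = next = this;
    return true;
  }

  // Link this hook immediately before `head`, i.e. at the tail of its list.
  void insert_before(ListItem& head) {
    assert(!is_on_list());
    prev = head.prev;
    next = &head;
    head.prev->next = this;
    head.prev = this;
  }
};

// Hypothetical dentry that owns a hook for a "stray" list.
struct Dentry {
  ListItem item_stray;
};

// Mirrors the idea of the patch: unlink the hook before the object goes away.
// Returns true if the dentry was still linked when removal started.
bool remove_dentry(Dentry* dn) {
  bool was_linked = dn->item_stray.remove_myself();
  delete dn;  // the hook is unlinked now, so the destructor's assert holds
  return was_linked;
}
```

Without the `remove_myself()` call, deleting a `Dentry` that is still linked would trip the destructor's assert, which is the failure mode the patch appears to guard against.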
>>> best regards,
>>>
>>> Jake
>>> _______________________________________________
>>> ceph-users mailing list
>>> ceph-users@xxxxxxxxxxxxxx
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


