Cluster with 4 nodes
node 1: 2 HDDs
node 2: 3 HDDs
node 3: 3 HDDs
node 4: 2 HDDs
After a problem with the upgrade from 13.2.1 to 13.2.2 (I restarted the
nodes one at a time):
I upgraded with Ubuntu's apt-get upgrade and had only one active MDS at
a time while doing the upgrade.
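
For reference, the per-node steps were roughly as follows (a sketch;
package and systemd unit names assumed for a stock Ubuntu/Ceph install):

apt-get update && apt-get upgrade   # pulls the 13.2.2 packages
systemctl restart ceph-mon.target   # then restart daemons on this node
systemctl restart ceph-osd.target
systemctl restart ceph-mds.target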
All MDSs stopped working.
Status shows one crashed and none in standby.
If I restart an MDS, status shows replay and then it crashes with this
log output:
ceph version 13.2.2 (02899bfda814146b021136e9d8e80eba494e1126) mimic
(stable)
1: (()+0x3f5480) [0x555de8a51480]
2: (()+0x12890) [0x7f6e4cb41890]
3: (gsignal()+0xc7) [0x7f6e4bc39e97]
4: (abort()+0x141) [0x7f6e4bc3b801]
5: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x250) [0x7f6e4d22a710]
6: (()+0x26c787) [0x7f6e4d22a787]
7: (EMetaBlob::replay(MDSRank*, LogSegment*, MDSlaveUpdate*)+0x5f4b)
[0x555de8a3c83b]
8: (EUpdate::replay(MDSRank*)+0x39) [0x555de8a3dd79]
9: (MDLog::_replay_thread()+0x864) [0x555de89e6e04]
10: (MDLog::ReplayThread::entry()+0xd) [0x555de8784ebd]
11: (()+0x76db) [0x7f6e4cb366db]
12: (clone()+0x3f) [0x7f6e4bd1c88f]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed
to interpret this
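
Per that NOTE, the frame offsets can be resolved against the matching
binary; a sketch, assuming the Ubuntu debug-symbol package and the
default binary path:

apt-get install ceph-mds-dbg                # debug symbols (package name assumed)
addr2line -Cfe /usr/bin/ceph-mds 0x3f5480   # resolve e.g. frame 1 to file:line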
The journal reports OK.
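
(i.e. cephfs-journal-tool finds no damage; the check I mean, with the
filesystem name as a placeholder:)

cephfs-journal-tool --rank=<fsname>:0 journal inspect
# healthy output ends with: Overall journal integrity: OK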
Now I'm trying:
cephfs-data-scan scan_extents cephfs_data
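
If that's the right path, the disaster-recovery docs describe
scan_extents as the first of three passes (run with all MDSs stopped;
pool name as above; scan_extents can take a very long time and can be
parallelised with --worker_n/--worker_m):

cephfs-data-scan scan_extents cephfs_data
cephfs-data-scan scan_inodes cephfs_data
cephfs-data-scan scan_links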