Help with seemingly damaged MDS rank(?)

Skaag Argonius <skaag@xxxxxxxxx> · Sat, 14 May 2016 09:00:54 -0700

We have a rather urgent situation, and I need help doing one of two things:

1. Fix the MDS and regain a working cluster (ideal)
2. Find a way to extract the contents so I can move them to a new cluster (need to do this anyway)

We have 4 physical storage machines: stor1, stor2, stor3, stor4
The MDS is on stor2.

I'm including as much info as I can below.

-------8<-----------------------------------------------------------------------------

Cluster status:

[root@stor1 /]# ceph -s
    cluster 926a1578-48e8-4559-b1b5-1f1a9b8e7566
     health HEALTH_ERR
            2 pgs backfill
            24 pgs backfilling
            122 pgs degraded
            14 pgs stale
            122 pgs stuck degraded
            14 pgs stuck stale
            122 pgs stuck unclean
            122 pgs stuck undersized
            122 pgs undersized
            recovery 26146044/127710264 objects degraded (20.473%)
            recovery 21267236/127710264 objects misplaced (16.653%)
            mds ranks 0,2 have failed
            mds cluster is degraded
            mds gentoo-backup is laggy
     monmap e15: 1 mons at {0=192.168.202.101:6789/0}
            election epoch 1, quorum 0 0
     mdsmap e11373: 1/2/2 up {1=gentoo-backup=up:replay(laggy or crashed)}, 2 failed
     osdmap e12755: 9 osds: 6 up, 6 in; 26 remapped pgs
      pgmap v22065162: 224 pgs, 4 pools, 101 TB data, 62358 kobjects
            169 TB used, 50604 GB / 218 TB avail
            26146044/127710264 objects degraded (20.473%)
            21267236/127710264 objects misplaced (16.653%)
                  96 active+undersized+degraded
                  87 active+clean
                  24 active+undersized+degraded+remapped+backfilling
                  14 stale+active+clean
                   2 active+undersized+degraded+remapped+wait_backfill
                   1 active+clean+scrubbing+deep
recovery io 100993 kB/s, 68 objects/s
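
As a sanity check on the status output above, here is a minimal sketch that re-adds the numbers. The counts are copied by hand from the output; the helper name count_with_flag is just something made up for this illustration. The per-state counts do sum to the 224 PGs reported, and the states containing "degraded" account for the 122 degraded PGs.

```python
# PG state counts copied by hand from the `ceph -s` output above.
pg_states = {
    "active+undersized+degraded": 96,
    "active+clean": 87,
    "active+undersized+degraded+remapped+backfilling": 24,
    "stale+active+clean": 14,
    "active+undersized+degraded+remapped+wait_backfill": 2,
    "active+clean+scrubbing+deep": 1,
}


def count_with_flag(states, flag):
    """Sum the PGs whose '+'-separated state string contains the given flag."""
    return sum(n for state, n in states.items() if flag in state.split("+"))


total_pgs = sum(pg_states.values())
degraded_pgs = count_with_flag(pg_states, "degraded")
stale_pgs = count_with_flag(pg_states, "stale")

# Object counts from the two "recovery" lines of the same output.
total_objects = 127710264
degraded_pct = round(100 * 26146044 / total_objects, 3)
misplaced_pct = round(100 * 21267236 / total_objects, 3)

print("total PGs:", total_pgs)
print("degraded PGs:", degraded_pgs, "stale PGs:", stale_pgs)
print("degraded %:", degraded_pct, "misplaced %:", misplaced_pct)
```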

-------8<-----------------------------------------------------------------------------

History:

We had a single MDS, and things were working fine. Then one of our sysadmins apparently changed permissions on some sensitive folder (I don't have all the details).

As a result, the MDS went down. He then tried to bring a backup MDS (called 'gentoo-backup') into the mix, but I think that MDS was an exact replica of the first one, so it was never going to work well...

We tried to kick the backup MDS out, mark it as failed, make it inactive, and reduce max_mds to 1. Nothing helped.

-------8<-----------------------------------------------------------------------------

MDS Crash logs:

2016-05-13 15:19:27.447753 7f7a82580700 -1 log_channel(cluster) log [ERR] : corrupt journal event at 4194304~472 / 10426264
2016-05-13 15:19:27.448901 7f7a82580700 -1 mds/Beacon.cc: In function 'void Beacon::notify_health(const MDS*)' thread 7f7a82580700 time 2016-05-13 15:19:27.447829
mds/Beacon.cc: 291: FAILED assert(mds->mds_lock.is_locked_by_me())

 ceph version 9.0.2 (be422c8f5b494c77ebcf0f7b95e5d728ecacb7f0)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x85) [0x98b645]
 2: (Beacon::notify_health(MDS const*)+0x74) [0x5d7d54]
 3: (MDS::damaged()+0x42) [0x5a9bd2]
 4: (MDLog::_replay_thread()+0x1c02) [0x803e62]
 5: (MDLog::ReplayThread::entry()+0xd) [0x5c88bd]
 6: (()+0x7df5) [0x7f7a8c564df5]
 7: (clone()+0x6d) [0x7f7a8b24b1ad]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
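
To make the "corrupt journal event" line easier to reason about, here is a rough sketch that pulls the numbers out of it. I am assuming the notation is offset~length and that the last number is the journal's end (write) position; I am also assuming 4 MiB journal objects, which I have not confirmed for this cluster. Under those assumptions the bad event starts exactly on an object boundary.

```python
import re

# The error line copied from the MDS log above.
line = (
    "2016-05-13 15:19:27.447753 7f7a82580700 -1 log_channel(cluster) "
    "log [ERR] : corrupt journal event at 4194304~472 / 10426264"
)

# ASSUMPTION: 4 MiB journal objects (not confirmed for this cluster).
ASSUMED_OBJECT_SIZE = 4 * 1024 * 1024


def parse_corrupt_event(text):
    """Return (offset, length, end) from a 'corrupt journal event' line, or None."""
    m = re.search(r"corrupt journal event at (\d+)~(\d+) / (\d+)", text)
    if m is None:
        return None
    return tuple(int(g) for g in m.groups())


offset, length, end = parse_corrupt_event(line)

# Which assumed-size object the event falls in, and where inside that object.
object_index, offset_in_object = divmod(offset, ASSUMED_OBJECT_SIZE)

print(f"event at byte {offset}, {length} bytes long, journal position {end}")
print(f"assumed object index {object_index}, offset within object {offset_in_object}")
```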

-------8<-----------------------------------------------------------------------------

More from MDS log:

2016-05-13 15:19:27.451896 7f7a82580700 -1 *** Caught signal (Aborted) **
 in thread 7f7a82580700

 ceph version 9.0.2 (be422c8f5b494c77ebcf0f7b95e5d728ecacb7f0)
 1: /usr/bin/ceph-mds() [0x8916b2]
 2: (()+0xf130) [0x7f7a8c56c130]
 3: (gsignal()+0x37) [0x7f7a8b18a5d7]
 4: (abort()+0x148) [0x7f7a8b18bcc8]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x165) [0x7f7a8ba8e9b5]
 6: (()+0x5e926) [0x7f7a8ba8c926]
 7: (()+0x5e953) [0x7f7a8ba8c953]
 8: (()+0x5eb73) [0x7f7a8ba8cb73]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x27a) [0x98b83a]
 10: (Beacon::notify_health(MDS const*)+0x74) [0x5d7d54]
 11: (MDS::damaged()+0x42) [0x5a9bd2]
 12: (MDLog::_replay_thread()+0x1c02) [0x803e62]
 13: (MDLog::ReplayThread::entry()+0xd) [0x5c88bd]
 14: (()+0x7df5) [0x7f7a8c564df5]
 15: (clone()+0x6d) [0x7f7a8b24b1ad]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

...
...
...

  log_file /var/log/ceph/ceph-mds.stor2.log
--- end dump of recent events ---
2016-05-13 15:54:53.300449 7f1e1f47c7c0  0 ceph version 9.0.2 (be422c8f5b494c77ebcf0f7b95e5d728ecacb7f0), process ceph-mds, pid 3935832
2016-05-13 15:54:53.324083 7f1e1f47c7c0 -1 mds.-1.0 log_to_monitors {default=true}
2016-05-13 15:54:53.572405 7f1e19aaf700  1 mds.-1.0 handle_mds_map standby
2016-05-13 15:56:17.247033 7f1e151a5700 -1 mds.-1.0 *** got signal Terminated ***
2016-05-13 15:56:17.250635 7f1e151a5700  1 mds.-1.0 suicide.  wanted down:dne, now up:standby
2016-05-13 15:56:18.405716 7f6ba594c7c0  0 ceph version 9.0.2 (be422c8f5b494c77ebcf0f7b95e5d728ecacb7f0), process ceph-mds, pid 3940158
2016-05-13 15:56:18.428399 7f6ba594c7c0 -1 mds.-1.0 log_to_monitors {default=true}
2016-05-13 15:56:18.984429 7f6b9ff7f700  1 mds.-1.0 handle_mds_map standby
2016-05-13 15:56:59.516357 7f6b9ff7f700  1 mds.-1.0 handle_mds_map standby
2016-05-13 15:57:02.637735 7f6b9ff7f700  1 mds.-1.0 handle_mds_map standby
2016-05-13 15:57:05.774427 7f6b9ff7f700  1 mds.-1.0 handle_mds_map standby

-------8<-----------------------------------------------------------------------------

Last portion from MDS log on stor2 (from right now):

2016-05-13 19:10:15.609077 7f0e77eab780  0 ceph version 9.1.0 (3be81ae6cf17fcf689cd6f187c4615249fea4f61), process ceph-mds, pid 4028429
2016-05-13 19:10:16.464975 7f0e726d8700  1 mds.stor2 handle_mds_map standby
2016-05-13 19:10:16.467750 7f0e726d8700  1 mds.0.267 handle_mds_map i am now mds.0.267
2016-05-13 19:10:16.467757 7f0e726d8700  1 mds.0.267 handle_mds_map state change up:boot --> up:replay
2016-05-13 19:10:16.467778 7f0e726d8700  1 mds.0.267 replay_start
2016-05-13 19:10:16.467784 7f0e726d8700  1 mds.0.267  recovery set is 1
2016-05-13 19:10:16.467789 7f0e726d8700  1 mds.0.267  waiting for osdmap 12618 (which blacklists prior instance)
2016-05-13 19:10:16.470750 7f0e6d4cc700  0 mds.0.cache creating system inode with ino:100
2016-05-13 19:10:16.470880 7f0e6d4cc700  0 mds.0.cache creating system inode with ino:1
2016-05-13 19:10:16.878435 7f0e6c1c7700  1 mds.0.267 replay_done
2016-05-13 19:10:16.878454 7f0e6c1c7700  1 mds.0.267 making mds journal writeable
2016-05-13 19:10:17.681414 7f0e726d8700  1 mds.0.267 handle_mds_map i am now mds.0.267
2016-05-13 19:10:17.681419 7f0e726d8700  1 mds.0.267 handle_mds_map state change up:replay --> up:resolve
2016-05-13 19:10:17.681431 7f0e726d8700  1 mds.0.267 resolve_start
2016-05-13 19:10:17.681432 7f0e726d8700  1 mds.0.267 reopen_log
2016-05-13 19:11:45.822580 7f0e726d8700  1 mds.0.cache handle_mds_failure mds.1 : recovery peers are 1
2016-05-13 19:12:17.462003 7f0e726d8700  1 mds.0.cache handle_mds_failure mds.1 : recovery peers are 1
2016-05-13 19:12:31.705710 7f0e726d8700  1 mds.0.cache handle_mds_failure mds.1 : recovery peers are 1
2016-05-13 19:12:51.942145 7f0e726d8700  1 mds.0.cache handle_mds_failure mds.1 : recovery peers are 1
2016-05-13 19:13:59.720073 7f0e6edd0700 -1 mds.stor2 *** got signal Terminated ***
2016-05-13 19:13:59.720130 7f0e6edd0700  1 mds.stor2 suicide.  wanted state up:resolve
2016-05-13 19:13:59.722837 7f0e6edd0700  1 mds.0.267 shutdown: shutting down rank 0
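
For anyone skimming the log above, here is a small sketch that extracts just the state transitions. The function name state_transitions and the regex are my own, written against the lines as pasted here, not taken from any Ceph tool. It shows rank 0 going up:boot, up:replay, up:resolve and then never leaving up:resolve before it was terminated, while it kept logging handle_mds_failure for mds.1.

```python
import re

# A few lines copied from the stor2 MDS log above.
log_lines = [
    "2016-05-13 19:10:16.467757 7f0e726d8700  1 mds.0.267 handle_mds_map state change up:boot --> up:replay",
    "2016-05-13 19:10:16.878435 7f0e6c1c7700  1 mds.0.267 replay_done",
    "2016-05-13 19:10:17.681419 7f0e726d8700  1 mds.0.267 handle_mds_map state change up:replay --> up:resolve",
    "2016-05-13 19:11:45.822580 7f0e726d8700  1 mds.0.cache handle_mds_failure mds.1 : recovery peers are 1",
    "2016-05-13 19:13:59.720130 7f0e6edd0700  1 mds.stor2 suicide.  wanted state up:resolve",
]

TRANSITION = re.compile(r"state change (\S+) --> (\S+)")


def state_transitions(lines):
    """Return (timestamp, old_state, new_state) for each 'state change' line."""
    found = []
    for entry in lines:
        m = TRANSITION.search(entry)
        if m:
            # The first two whitespace-separated fields are the date and time.
            timestamp = " ".join(entry.split()[:2])
            found.append((timestamp, m.group(1), m.group(2)))
    return found


transitions = state_transitions(log_lines)
for timestamp, old, new in transitions:
    print(f"{timestamp}: {old} -> {new}")

# Count how often the failed peer is reported in this sample.
peer_failures = sum("handle_mds_failure" in entry for entry in log_lines)
print("handle_mds_failure lines in sample:", peer_failures)
```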

-------8<-----------------------------------------------------------------------------

My own experience with storage systems is limited to NFS, GlusterFS and various exotic FUSE-based filesystems. I came into this startup with Ceph already installed the way it is (I inherited it), so please be gentle with me :-)

Thanks in advance for any help/advice you can give!

Skaag

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com