On Tue, May 28, 2024 at 8:54 AM Noe P. <ml@am-rand.berlin> wrote:
>
> Hi,
>
> we ran into a bigger problem today with our ceph cluster (Quincy,
> Alma8.9).
> We have 4 filesystems and a total of 6 MDS daemons, the largest fs
> having two ranks assigned (i.e. one standby).
>
> Since we often have the problem of MDSs lagging behind, we restart
> the MDSs occasionally. That usually helps, with the standby taking over.

Please do not routinely restart MDS. Starting MDS recovery may only
multiply your problems (as it has).

> Today however, the restart didn't work and the rank 1 MDS started to
> crash for unclear reasons. Rank 0 seemed ok.

Figure out why! You might have tried increasing debugging on the mds:

    ceph config set mds.X debug_mds 20
    ceph config set mds.X debug_ms 1

> We decided at some point to go back to one rank by setting max_mds to 1.

Doing this will have no positive effect. I've made a tracker ticket so
that folks don't do this:

https://tracker.ceph.com/issues/66301

> Due to the permanent crashing, the rank 1 MDS didn't stop however, and
> at some point we set it to failed and the fs not joinable.

The monitors will not stop rank 1 until the cluster is healthy again.
What do you mean "set it to failed"? Setting the fs as not joinable will
mean it never becomes healthy again. Please do not flail around with
administration commands without understanding the effects.

> At this point it looked like this:
>
> fs_cluster - 716 clients
> ==========
> RANK  STATE   MDS       ACTIVITY    DNS    INOS   DIRS   CAPS
>  0    active  cephmd6a  Reqs: 0 /s  13.1M  13.1M  1419k  79.2k
>  1    failed
>       POOL          TYPE      USED   AVAIL
> fs_cluster_meta    metadata   1791G  54.2T
> fs_cluster_data    data        421T  54.2T
>
> with rank 1 still being listed.
>
> The next attempt was to remove that failed rank:
>
>     ceph mds rmfailed fs_cluster:1 --yes-i-really-mean-it
>
> which, after a short while, brought down 3 out of 5 MONs.
> They keep crashing shortly after restart with stack traces like this:
>
>  ceph version 17.2.7 (b12291d110049b2f35e32e0de30d70e9a4c060d2) quincy (stable)
>  1: /lib64/libpthread.so.0(+0x12cf0) [0x7ff8813adcf0]
>  2: gsignal()
>  3: abort()
>  4: /lib64/libstdc++.so.6(+0x9009b) [0x7ff8809bf09b]
>  5: /lib64/libstdc++.so.6(+0x9654c) [0x7ff8809c554c]
>  6: /lib64/libstdc++.so.6(+0x965a7) [0x7ff8809c55a7]
>  7: /lib64/libstdc++.so.6(+0x96808) [0x7ff8809c5808]
>  8: /lib64/libstdc++.so.6(+0x92045) [0x7ff8809c1045]
>  9: (MDSMonitor::maybe_resize_cluster(FSMap&, int)+0xa9e) [0x55f05d9a5e8e]
>  10: (MDSMonitor::tick()+0x18a) [0x55f05d9b18da]
>  11: (MDSMonitor::on_active()+0x2c) [0x55f05d99a17c]
>  12: (Context::complete(int)+0xd) [0x55f05d76c56d]
>  13: (void finish_contexts<std::__cxx11::list<Context*, std::allocator<Context*> > >(ceph::common::CephContext*, std::__cxx11::list<Context*, std::allocator<Context*> >&, int)+0x9d) [0x55f05d799d7d]
>  14: (Paxos::finish_round()+0x74) [0x55f05d8c5c24]
>  15: (Paxos::dispatch(boost::intrusive_ptr<MonOpRequest>)+0x41b) [0x55f05d8c7e5b]
>  16: (Monitor::dispatch_op(boost::intrusive_ptr<MonOpRequest>)+0x123e) [0x55f05d76a2ae]
>  17: (Monitor::_ms_dispatch(Message*)+0x406) [0x55f05d76a976]
>  18: (Dispatcher::ms_dispatch2(boost::intrusive_ptr<Message> const&)+0x5d) [0x55f05d79b3ed]
>  19: (Messenger::ms_deliver_dispatch(boost::intrusive_ptr<Message> const&)+0x478) [0x7ff88367fed8]
>  20: (DispatchQueue::entry()+0x50f) [0x7ff88367d31f]
>  21: (DispatchQueue::DispatchThread::entry()+0x11) [0x7ff883747381]
>  22: /lib64/libpthread.so.0(+0x81ca) [0x7ff8813a31ca]
>  23: clone()
>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>
> The MDSMonitor::maybe_resize_cluster somehow suggests a connection to
> the above MDS operation.

Yes, you've made a mess of things. I assume you ignored this warning:

"WARNING: this can make your filesystem inaccessible! Add
--yes-i-really-mean-it if you are sure you wish to continue."

:(

> Does anyone have an idea how to get this cluster back together again?
> Like manually fixing the MDS ranks?

You will probably need to bring the file system down but you've clearly
caused the mons to hit an assert where this will be difficult. You need
to increase debugging on the mons (in their /etc/ceph/ceph.conf):

    [mon]
      debug mon = 20
      debug ms = 1

and share the logs on this list or via ceph-post-file.
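If you haven't used ceph-post-file before, something along these lines
should do it (just a sketch, assuming the default /var/log/ceph/ log
location and a mon id of "a"; substitute your actual mon ids and paths,
ideally for each of the crashing mons):

    ceph-post-file /var/log/ceph/ceph-mon.a.log

It will print an upload id that you can paste into your reply here so
we can find the logs.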
--
Patrick Donnelly, Ph.D.
He / Him / His
Red Hat Partner Engineer
IBM, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx