On Tue, May 28, 2024 at 8:54 AM Noe P. <ml@am-rand.berlin> wrote:
>
> Hi,
>
> we ran into a bigger problem today with our ceph cluster (Quincy,
> Alma8.9).
> We have 4 filesystems and a total of 6 MDS daemons, the largest fs
> having two ranks assigned (i.e. one standby).
>
> Since we often have the problem of MDSs lagging behind, we restart
> the MDSs occasionally. That usually helps, with the standby taking over.

Please do not routinely restart MDS. Starting MDS recovery may only
multiply your problems (as it has).

> Today however, the restart didn't work and the rank 1 MDS started to
> crash for unclear reasons. Rank 0 seemed ok.

Figure out why! You might have tried increasing debugging on the mds:

    ceph config set mds.X debug_mds 20
    ceph config set mds.X debug_ms 1

> We decided at some point to go back to one rank by setting max_mds to 1.

Doing this will have no positive effect. I've made a tracker ticket so
that folks don't do this:

https://tracker.ceph.com/issues/66301

> Due to the permanent crashing, the rank 1 MDS didn't stop however, and
> at some point we set it to failed and the fs not joinable.

The monitors will not stop rank 1 until the cluster is healthy again.
What do you mean "set it to failed"? Setting the fs as not joinable will
mean it never becomes healthy again. Please do not flail around with
administration commands without understanding the effects.

> At this point it looked like this:
>
> fs_cluster - 716 clients
> ==========
> RANK  STATE   MDS       ACTIVITY    DNS    INOS   DIRS   CAPS
>  0    active  cephmd6a  Reqs: 0 /s  13.1M  13.1M  1419k  79.2k
>  1    failed
>       POOL          TYPE      USED   AVAIL
> fs_cluster_meta    metadata   1791G  54.2T
> fs_cluster_data    data        421T  54.2T
>
> with rank 1 still being listed.
>
> The next attempt was to remove that failed rank:
>
>     ceph mds rmfailed fs_cluster:1 --yes-i-really-mean-it
>
> which, after a short while, brought down 3 out of 5 MONs.
> They keep crashing shortly after restart with stack traces like this:
>
>  ceph version 17.2.7 (b12291d110049b2f35e32e0de30d70e9a4c060d2) quincy (stable)
>  1: /lib64/libpthread.so.0(+0x12cf0) [0x7ff8813adcf0]
>  2: gsignal()
>  3: abort()
>  4: /lib64/libstdc++.so.6(+0x9009b) [0x7ff8809bf09b]
>  5: /lib64/libstdc++.so.6(+0x9654c) [0x7ff8809c554c]
>  6: /lib64/libstdc++.so.6(+0x965a7) [0x7ff8809c55a7]
>  7: /lib64/libstdc++.so.6(+0x96808) [0x7ff8809c5808]
>  8: /lib64/libstdc++.so.6(+0x92045) [0x7ff8809c1045]
>  9: (MDSMonitor::maybe_resize_cluster(FSMap&, int)+0xa9e) [0x55f05d9a5e8e]
>  10: (MDSMonitor::tick()+0x18a) [0x55f05d9b18da]
>  11: (MDSMonitor::on_active()+0x2c) [0x55f05d99a17c]
>  12: (Context::complete(int)+0xd) [0x55f05d76c56d]
>  13: (void finish_contexts<std::__cxx11::list<Context*, std::allocator<Context*> > >(ceph::common::CephContext*, std::__cxx11::list<Context*, std::allocator<Context*> >&, int)+0x9d) [0x55f05d799d7d]
>  14: (Paxos::finish_round()+0x74) [0x55f05d8c5c24]
>  15: (Paxos::dispatch(boost::intrusive_ptr<MonOpRequest>)+0x41b) [0x55f05d8c7e5b]
>  16: (Monitor::dispatch_op(boost::intrusive_ptr<MonOpRequest>)+0x123e) [0x55f05d76a2ae]
>  17: (Monitor::_ms_dispatch(Message*)+0x406) [0x55f05d76a976]
>  18: (Dispatcher::ms_dispatch2(boost::intrusive_ptr<Message> const&)+0x5d) [0x55f05d79b3ed]
>  19: (Messenger::ms_deliver_dispatch(boost::intrusive_ptr<Message> const&)+0x478) [0x7ff88367fed8]
>  20: (DispatchQueue::entry()+0x50f) [0x7ff88367d31f]
>  21: (DispatchQueue::DispatchThread::entry()+0x11) [0x7ff883747381]
>  22: /lib64/libpthread.so.0(+0x81ca) [0x7ff8813a31ca]
>  23: clone()
>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>
> The MDSMonitor::maybe_resize_cluster somehow suggests a connection to
> the above MDS operation.

Yes, you've made a mess of things. I assume you ignored this warning:

"WARNING: this can make your filesystem inaccessible! Add
--yes-i-really-mean-it if you are sure you wish to continue."

:(

> Does anyone have an idea how to get this cluster back together again?
> Like manually fixing the MDS ranks?

You will probably need to bring the file system down but you've clearly
caused the mons to hit an assert where this will be difficult. You need
to increase debugging on the mons (in their /etc/ceph/ceph.conf):

    [mon]
      debug mon = 20
      debug ms = 1

and share the logs on this list or via ceph-post-file.
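If you haven't used ceph-post-file before, something along these lines
should do it (just a sketch, assuming the default /var/log/ceph/ log
location and a mon id of "a"; substitute your actual mon ids and paths,
ideally for each of the crashing mons):

    ceph-post-file /var/log/ceph/ceph-mon.a.log

It will print an upload id that you can paste into your reply here so
we can find the logs.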
--
Patrick Donnelly, Ph.D.
He / Him / His
Red Hat Partner Engineer
IBM, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx