On Thu, Mar 29, 2018 at 8:16 AM, Zhang Qiang <dotslash.lu@xxxxxxxxx> wrote:
> Hi,
>
> Ceph version 10.2.3. After a power outage, I tried to start the MDS
> daemons, but they were stuck replaying journals forever. I had no idea
> why they were taking that long, because this is just a small cluster
> for testing purposes with only a few hundred MB of data. I restarted
> them, and the error below was encountered.

Usually if an MDS is stuck in replay, it's because it's waiting for the
OSDs to service the reads of the journal. Are all your PGs up and
healthy?

> Any chance I can restore them?
>
> Mar 28 14:20:30 node01 systemd: Started Ceph metadata server daemon.
> Mar 28 14:20:30 node01 systemd: Starting Ceph metadata server daemon...
> Mar 28 14:20:30 node01 ceph-mds: 2018-03-28 14:20:30.796255
> 7f0150c8c180 -1 deprecation warning: MDS id 'mds.0' is invalid and
> will be forbidden in a future version. MDS names may not start with a
> numeric digit.

If you're really using "0" as an MDS name, now would be a good time to
fix that -- most people use a hostname or something like that. The
reason that numeric MDS names are invalid is that it makes commands
like "ceph mds fail 0" ambiguous (do we mean the name 0 or the rank 0?).
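A rough sketch of what renaming the daemon might look like on a
manually deployed Jewel MDS, assuming the default cluster name "ceph"
and using the hostname "node01" as the new MDS name (adjust paths and
the name to your setup; the caps follow the standard manual-deployment
docs):

```shell
# Stop the badly named daemon and remove its cephx key
# (the name "0" is the one from the log above).
systemctl stop ceph-mds@0
ceph auth del mds.0

# Create a data directory and keyring for the new name, then start it.
# "node01" is an assumed hostname -- substitute your own.
mkdir -p /var/lib/ceph/mds/ceph-node01
ceph auth get-or-create mds.node01 mon 'allow profile mds' \
    osd 'allow rwx' mds 'allow' \
    > /var/lib/ceph/mds/ceph-node01/keyring
chown -R ceph:ceph /var/lib/ceph/mds/ceph-node01
systemctl start ceph-mds@node01
```

The daemon's name is just its cephx/systemd identity, so no filesystem
data needs to move.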
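On the stuck-in-replay point above: before digging into the MDS itself,
it is worth confirming that the PGs backing the metadata pool are all
active+clean. A quick check with the standard ceph CLI, run from any
admin node:

```shell
# Overall cluster health; replay will hang if PGs backing the
# metadata pool are down, incomplete, or otherwise unserviceable.
ceph -s
ceph health detail

# PG summary and per-pool PG activity.
ceph pg stat
ceph osd pool stats
```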
> Mar 28 14:20:30 node01 ceph-mds: starting mds.0 at :/0
> Mar 28 14:20:30 node01 ceph-mds: ./mds/MDSMap.h: In function 'const
> entity_inst_t MDSMap::get_inst(mds_rank_t)' thread 7f014ac6c700 time
> 2018-03-28 14:20:30.942480
> Mar 28 14:20:30 node01 ceph-mds: ./mds/MDSMap.h: 582: FAILED assert(up.count(m))
> Mar 28 14:20:30 node01 ceph-mds: ceph version 10.2.3
> (ecc23778eb545d8dd55e2e4735b53cc93f92e65b)
> Mar 28 14:20:30 node01 ceph-mds: 1: (ceph::__ceph_assert_fail(char
> const*, char const*, int, char const*)+0x85) [0x7f01512aba45]
> Mar 28 14:20:30 node01 ceph-mds: 2: (MDSMap::get_inst(int)+0x20f)
> [0x7f0150ee5e3f]
> Mar 28 14:20:30 node01 ceph-mds: 3:
> (MDSRankDispatcher::handle_mds_map(MMDSMap*, MDSMap*)+0x7b9)
> [0x7f0150ed6e49]

This is a weird assertion. I can't see how it could be reached :-/

John

> Mar 28 14:20:30 node01 ceph-mds: 4:
> (MDSDaemon::handle_mds_map(MMDSMap*)+0xe3d) [0x7f0150eb396d]
> Mar 28 14:20:30 node01 ceph-mds: 5:
> (MDSDaemon::handle_core_message(Message*)+0x7b3) [0x7f0150eb4eb3]
> Mar 28 14:20:30 node01 ceph-mds: 6:
> (MDSDaemon::ms_dispatch(Message*)+0xdb) [0x7f0150eb514b]
> Mar 28 14:20:30 node01 ceph-mds: 7: (DispatchQueue::entry()+0x78a)
> [0x7f01513ad4aa]
> Mar 28 14:20:30 node01 ceph-mds: 8:
> (DispatchQueue::DispatchThread::entry()+0xd) [0x7f015129098d]
> Mar 28 14:20:30 node01 ceph-mds: 9: (()+0x7dc5) [0x7f0150095dc5]
> Mar 28 14:20:30 node01 ceph-mds: 10: (clone()+0x6d) [0x7f014eb61ced]
> Mar 28 14:20:30 node01 ceph-mds: NOTE: a copy of the executable, or
> `objdump -rdS <executable>` is needed to interpret this.

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com