cephfs - cannot start MDS

I have a semi-corrupted CephFS filesystem (most directories are OK, but a few are broken). Trying to read or delete anything from the broken directories causes the MDS servers to crash. I have followed all of the disaster recovery steps, but I still cannot keep the MDS servers up, and there are still corrupt directories in the FS.

I can usually get the MDS to come back if I run "cephfs-data-scan scan_links" a couple of times, but this does not work consistently. Any suggestions on how to resolve this issue?


The MDS crashes with the following traces in the log:

  -401> 2019-09-11 13:43:24.768 7fa71112b700 -1 log_channel(cluster) log [ERR] : bad backtrace on directory inode 0x1001476c473
  -401> 2019-09-11 13:43:24.768 7fa71112b700  0 log_channel(cluster) do_log log to syslog
  -401> 2019-09-11 13:43:24.768 7fa71112b700 -1 log_channel(cluster) log [ERR] : bad backtrace on directory inode 0x100146dfe3b
  -401> 2019-09-11 13:43:24.768 7fa71112b700  0 log_channel(cluster) do_log log to syslog
  -401> 2019-09-11 13:43:24.768 7fa719aeb700  1 -- 10.10.30.115:6800/1442163404 <== osd.139 10.10.30.51:6800/142614 1 ==== osd_op_reply(84 100148d4cf3.00000000 [omap-get-header,omap-get-vals,getxattr (94)] v0'0 uv35566 _ondisk_ = 0) v8 ==== 248+0+5722 (1809603995 0 3125985462) 0x8b5c340 con 0x5cc7800
  -401> 2019-09-11 13:43:24.772 7fa71aaed700  1 -- 10.10.30.115:6800/1442163404 <== osd.76 10.10.30.55:6833/15548 1 ==== osd_op_reply(80 100146dfe3d.00000000 [omap-get-header,omap-get-vals,getxattr] v0'0 uv37154 _ondisk_ = 0) v8 ==== 248+0+3667 (486108846 0 420775557) 0x2d30700 con 0x5bb8300
  -401> 2019-09-11 13:43:24.772 7fa71112b700 -1 log_channel(cluster) log [ERR] : bad backtrace on directory inode 0x100146dfe3d

....

  -401> 2019-09-11 13:43:25.844 7fa71292e700 -1 /build/ceph-13.2.6/src/mds/Server.cc: In function 'void Server::_unlink_local(MDRequestRef&, CDentry*, CDentry*)' thread 7fa71292e700 time 2019-09-11 13:43:25.843472
/build/ceph-13.2.6/src/mds/Server.cc: 6599: FAILED assert(in->first <= straydn->first)

 ceph version 13.2.6 (7b695f835b03642f85998b2ae7b6dd093d9fbce4) mimic (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x14e) [0x7fa71f7a597e]
 2: (()+0x2fab07) [0x7fa71f7a5b07]
 3: (Server::_unlink_local(boost::intrusive_ptr<MDRequestImpl>&, CDentry*, CDentry*)+0x15e8) [0x548fa8]
 4: (Server::handle_client_unlink(boost::intrusive_ptr<MDRequestImpl>&)+0x961) [0x549991]
 5: (Server::handle_client_request(MClientRequest*)+0x49b) [0x563beb]
 6: (Server::dispatch(Message*)+0x2fb) [0x5678cb]
 7: (MDSRank::handle_deferrable_message(Message*)+0x434) [0x4da3c4]
 8: (MDSRank::_dispatch(Message*, bool)+0x89b) [0x4f17db]
 9: (MDSRank::retry_dispatch(Message*)+0x12) [0x4f1ec2]
 10: (MDSInternalContextBase::complete(int)+0x67) [0x74faf7]
 11: (MDSRank::_advance_queues()+0xf1) [0x4f0781]
 12: (MDSRank::ProgressThread::entry()+0x43) [0x4f0e03]
 13: (()+0x76ba) [0x7fa71f0216ba]
 14: (clone()+0x6d) [0x7fa71e84a41d]

  -401> 2019-09-11 13:43:25.844 7fa719aeb700  1 -- 10.10.30.115:6800/1442163404 <== osd.49 10.10.30.56:6838/15753 3 ==== osd_op_reply(90 600.00000000 [omap-get-header,omap-get-vals,getxattr (62)] v0'0 uv98420 _ondisk_ = 0) v8 ==== 240+0+437012 (2786733188 0 4243776564) 0x8b5e080 con 0x3addc00
  -401> 2019-09-11 13:43:25.848 7fa71292e700 -1 *** Caught signal (Aborted) **
 in thread 7fa71292e700 thread_name:mds_rank_progr

 ceph version 13.2.6 (7b695f835b03642f85998b2ae7b6dd093d9fbce4) mimic (stable)
 1: (()+0x11390) [0x7fa71f02b390]
 2: (gsignal()+0x38) [0x7fa71e778428]
 3: (abort()+0x16a) [0x7fa71e77a02a]
 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x256) [0x7fa71f7a5a86]
 5: (()+0x2fab07) [0x7fa71f7a5b07]
 6: (Server::_unlink_local(boost::intrusive_ptr<MDRequestImpl>&, CDentry*, CDentry*)+0x15e8) [0x548fa8]
 7: (Server::handle_client_unlink(boost::intrusive_ptr<MDRequestImpl>&)+0x961) [0x549991]
 8: (Server::handle_client_request(MClientRequest*)+0x49b) [0x563beb]
 9: (Server::dispatch(Message*)+0x2fb) [0x5678cb]
 10: (MDSRank::handle_deferrable_message(Message*)+0x434) [0x4da3c4]
 11: (MDSRank::_dispatch(Message*, bool)+0x89b) [0x4f17db]
 12: (MDSRank::retry_dispatch(Message*)+0x12) [0x4f1ec2]
 13: (MDSInternalContextBase::complete(int)+0x67) [0x74faf7]
 14: (MDSRank::_advance_queues()+0xf1) [0x4f0781]
 15: (MDSRank::ProgressThread::entry()+0x43) [0x4f0e03]
 16: (()+0x76ba) [0x7fa71f0216ba]
 17: (clone()+0x6d) [0x7fa71e84a41d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
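
To make it easier to see which directories are affected, the "bad backtrace on directory inode" lines above can be collected from the MDS log with a small script. This is only a minimal sketch: the function names are hypothetical, and the mapping from an inode number to a metadata-pool object name (`<inode hex>.00000000` for the first directory fragment) is an assumption based on the object names that appear in the osd_op_reply lines of this log, not something confirmed against the Ceph source.

```python
import re

# Matches the cluster-log error lines quoted in the trace, e.g.
#   ... log [ERR] : bad backtrace on directory inode 0x1001476c473
BAD_BACKTRACE = re.compile(r"bad backtrace on directory inode (0x[0-9a-fA-F]+)")


def bad_backtrace_inodes(lines):
    """Return the distinct inode numbers reported, in first-seen order."""
    seen = {}
    for line in lines:
        match = BAD_BACKTRACE.search(line)
        if match:
            # Dict keys keep insertion order and drop duplicates.
            seen.setdefault(match.group(1).lower(), None)
    return list(seen)


def first_dirfrag_object(inode_hex):
    """Guess the name of the first directory-fragment object for an inode.

    Assumption: the object is named '<inode in hex>.00000000', as in the
    '100146dfe3d.00000000' object seen in the osd_op_reply lines.
    """
    return "{:x}.00000000".format(int(inode_hex, 16))


def scan_log(path):
    """Read an MDS log file and return (inode, guessed object name) pairs."""
    with open(path, errors="replace") as handle:
        inodes = bad_backtrace_inodes(handle)
    return [(ino, first_dirfrag_object(ino)) for ino in inodes]
```

Calling `scan_log()` with the path to the MDS log file (the path depends on the installation) returns one pair per affected directory inode, which gives a concrete list to check or to quote in a follow-up.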
_______________________________________________
Dev mailing list -- dev@xxxxxxx