Hello,
We cannot start the MDS service after running some delete operations on large directories (100k+ files).
This is what the crash message looks like right after a start-up attempt:
-2> 2017-05-17 08:36:03.071272 7fcc87a61700 1 -- 10.103.213.182:6803/14366 <== osd.2 10.103.213.1:6811/3384506 1 ==== osd_op_reply(92 10007e5ca9f.00000000 [delete] v0'0 uv911507 ondisk = -2 ((2) No such file or directory)) v7 ==== 140+0+0 (1847967201 0 0) 0x55744151dc80 con 0x5574414e9d80
-1> 2017-05-17 08:36:03.071430 7fcc8765d700 1 -- 10.103.213.182:6803/14366 <== osd.21 10.103.213.5:6805/4030475 1 ==== osd_op_reply(90 10007e5cab8.00000000 [delete] v0'0 uv1270452 ondisk = -2 ((2) No such file or directory)) v7 ==== 140+0+0 (2193063204 0 0) 0x55744156a000 con 0x5574414e8700
0> 2017-05-17 08:36:03.081734 7fcc97235700 -1 mds/StrayManager.cc: In function 'void StrayManager::eval_remote_stray(CDentry*, CDentry*)' thread 7fcc97235700 time 2017-05-17 08:36:03.080128
mds/StrayManager.cc: 673: FAILED assert(stray_in->inode.nlink >= 1)
ceph version 10.2.3 (ecc23778eb545d8dd55e2e4735b53cc93f92e65b)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0x557434e58adb]
2: (StrayManager::eval_remote_stray(CDentry*, CDentry*)+0x466) [0x557434bcfdf6]
3: (StrayManager::__eval_stray(CDentry*, bool)+0x4cd) [0x557434bd47ad]
4: (StrayManager::eval_stray(CDentry*, bool)+0x1e) [0x557434bd509e]
5: (MDCache::scan_stray_dir(dirfrag_t)+0x14e) [0x557434b2bace]
6: (MDCache::populate_mydir()+0x807) [0x557434b994b7]
7: (MDCache::open_root()+0xdc) [0x557434b99e0c]
8: (MDSInternalContextBase::complete(int)+0x1db) [0x557434cc2acb]
9: (MDSRank::_advance_queues()+0x495) [0x557434a960c5]
10: (MDSRank::ProgressThread::entry()+0x4a) [0x557434a963ea]
11: (()+0x8182) [0x7fcca1536182]
12: (clone()+0x6d) [0x7fcc9fa8d47d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
--- logging levels ---
0/ 5 none
0/ 1 lockdep
0/ 1 context
1/ 1 crush
0/ 0 mds
1/ 5 mds_balancer
1/ 5 mds_locker
1/ 5 mds_log
1/ 5 mds_log_expire
1/ 5 mds_migrator
0/ 1 buffer
0/ 1 timer
0/ 1 filer
0/ 1 striper
0/ 1 objecter
0/ 5 rados
0/ 5 rbd
0/ 5 rbd_mirror
0/ 5 rbd_replay
0/ 5 journaler
0/ 5 objectcacher
0/ 5 client
0/ 5 osd
0/ 5 optracker
0/ 5 objclass
1/ 3 filestore
1/ 3 journal
0/ 5 ms
1/ 5 mon
0/10 monc
1/ 5 paxos
0/ 5 tp
1/ 5 auth
1/ 5 crypto
1/ 1 finisher
1/ 5 heartbeatmap
1/ 5 perfcounter
1/ 5 rgw
1/10 civetweb
1/ 5 javaclient
1/ 5 asok
1/ 1 throttle
0/ 0 refs
1/ 5 xio
1/ 5 compressor
1/ 5 newstore
1/ 5 bluestore
1/ 5 bluefs
1/ 3 bdev
1/ 5 kstore
4/ 5 rocksdb
4/ 5 leveldb
1/ 5 kinetic
1/ 5 fuse
-2/-2 (syslog threshold)
-1/-1 (stderr threshold)
max_recent 10000
max_new 1000
log_file /var/log/ceph/ceph2-mds.ceph2-mds-2.log
--- end dump of recent events ---
2017-05-17 08:36:03.087895 7fcc97235700 -1 *** Caught signal (Aborted) **
in thread 7fcc97235700 thread_name:mds_rank_progr
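In case more verbose logs would help with diagnosis, this is a sketch of how we could raise MDS debug levels before the next start attempt (assuming a standard ceph.conf deployment; the daemon id `ceph2-mds-2` is ours):

```ini
# ceph.conf on the MDS host -- raise mds/messenger debug before restarting
[mds]
debug mds = 20
debug ms = 1
```

Or, if the daemon stays up long enough, the same can be injected at runtime with `ceph daemon mds.ceph2-mds-2 config set debug_mds 20`.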
I would appreciate any hints about how to approach a recovery attempt.
Thank you,
Simion Marius Rad
Sr.SysAdmin
PropertyShark.com
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com