On Wed, May 17, 2017 at 1:44 PM, Simion Marius Rad <simarad@xxxxxxxxx> wrote:
> Hello,
>
> We cannot start the mds service after running some delete commands on
> large folders (100k+ files).

You've posted previously about damage detected on your MDS, and about
corrupted XFS filesystems on your OSDs -- is this the same
cluster/filesystem, or a fresh one?

John

> This is what the crash message looks like right after a start-up attempt:
>
>     -2> 2017-05-17 08:36:03.071272 7fcc87a61700  1 -- 10.103.213.182:6803/14366 <== osd.2 10.103.213.1:6811/3384506 1 ==== osd_op_reply(92 10007e5ca9f.00000000 [delete] v0'0 uv911507 ondisk = -2 ((2) No such file or directory)) v7 ==== 140+0+0 (1847967201 0 0) 0x55744151dc80 con 0x5574414e9d80
>     -1> 2017-05-17 08:36:03.071430 7fcc8765d700  1 -- 10.103.213.182:6803/14366 <== osd.21 10.103.213.5:6805/4030475 1 ==== osd_op_reply(90 10007e5cab8.00000000 [delete] v0'0 uv1270452 ondisk = -2 ((2) No such file or directory)) v7 ==== 140+0+0 (2193063204 0 0) 0x55744156a000 con 0x5574414e8700
>      0> 2017-05-17 08:36:03.081734 7fcc97235700 -1 mds/StrayManager.cc: In function 'void StrayManager::eval_remote_stray(CDentry*, CDentry*)' thread 7fcc97235700 time 2017-05-17 08:36:03.080128
> mds/StrayManager.cc: 673: FAILED assert(stray_in->inode.nlink >= 1)
>
> ceph version 10.2.3 (ecc23778eb545d8dd55e2e4735b53cc93f92e65b)
> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0x557434e58adb]
> 2: (StrayManager::eval_remote_stray(CDentry*, CDentry*)+0x466) [0x557434bcfdf6]
> 3: (StrayManager::__eval_stray(CDentry*, bool)+0x4cd) [0x557434bd47ad]
> 4: (StrayManager::eval_stray(CDentry*, bool)+0x1e) [0x557434bd509e]
> 5: (MDCache::scan_stray_dir(dirfrag_t)+0x14e) [0x557434b2bace]
> 6: (MDCache::populate_mydir()+0x807) [0x557434b994b7]
> 7: (MDCache::open_root()+0xdc) [0x557434b99e0c]
> 8: (MDSInternalContextBase::complete(int)+0x1db) [0x557434cc2acb]
> 9: (MDSRank::_advance_queues()+0x495) [0x557434a960c5]
> 10: (MDSRank::ProgressThread::entry()+0x4a) [0x557434a963ea]
> 11: (()+0x8182) [0x7fcca1536182]
> 12: (clone()+0x6d) [0x7fcc9fa8d47d]
> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed
> to interpret this.
>
> --- logging levels ---
>    0/ 5 none
>    0/ 1 lockdep
>    0/ 1 context
>    1/ 1 crush
>    0/ 0 mds
>    1/ 5 mds_balancer
>    1/ 5 mds_locker
>    1/ 5 mds_log
>    1/ 5 mds_log_expire
>    1/ 5 mds_migrator
>    0/ 1 buffer
>    0/ 1 timer
>    0/ 1 filer
>    0/ 1 striper
>    0/ 1 objecter
>    0/ 5 rados
>    0/ 5 rbd
>    0/ 5 rbd_mirror
>    0/ 5 rbd_replay
>    0/ 5 journaler
>    0/ 5 objectcacher
>    0/ 5 client
>    0/ 5 osd
>    0/ 5 optracker
>    0/ 5 objclass
>    1/ 3 filestore
>    1/ 3 journal
>    0/ 5 ms
>    1/ 5 mon
>    0/10 monc
>    1/ 5 paxos
>    0/ 5 tp
>    1/ 5 auth
>    1/ 5 crypto
>    1/ 1 finisher
>    1/ 5 heartbeatmap
>    1/ 5 perfcounter
>    1/ 5 rgw
>    1/10 civetweb
>    1/ 5 javaclient
>    1/ 5 asok
>    1/ 1 throttle
>    0/ 0 refs
>    1/ 5 xio
>    1/ 5 compressor
>    1/ 5 newstore
>    1/ 5 bluestore
>    1/ 5 bluefs
>    1/ 3 bdev
>    1/ 5 kstore
>    4/ 5 rocksdb
>    4/ 5 leveldb
>    1/ 5 kinetic
>    1/ 5 fuse
>   -2/-2 (syslog threshold)
>   -1/-1 (stderr threshold)
>   max_recent     10000
>   max_new         1000
>   log_file /var/log/ceph/ceph2-mds.ceph2-mds-2.log
> --- end dump of recent events ---
> 2017-05-17 08:36:03.087895 7fcc97235700 -1 *** Caught signal (Aborted) **
> in thread 7fcc97235700 thread_name:mds_rank_progr
>
> I would appreciate any hints about how to approach a recovery attempt.
>
> Thank you,
> Simion Marius Rad
> Sr. SysAdmin
> PropertyShark.com
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
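[For context: the generic CephFS disaster-recovery sequence documented for jewel-era (10.2.x) clusters looks roughly like the sketch below. Whether it is appropriate here depends on the answer to John's question about prior damage; the MDS id placeholder is an assumption, and these tools rewrite metadata, so take backups first and proceed only with guidance.]

```shell
# Hedged sketch of the documented jewel-era CephFS recovery steps.
# Run with the MDS stopped; <id> is a placeholder for your MDS daemon id.

# 1. Back up the MDS journal before modifying anything.
cephfs-journal-tool journal export backup.bin

# 2. Replay recoverable dentry updates from the journal into the metadata pool.
cephfs-journal-tool event recover_dentries summary

# 3. Truncate the journal so the MDS does not try to replay it again.
cephfs-journal-tool journal reset

# 4. Clear stale client sessions from the session table.
cephfs-table-tool all reset session

# 5. Restart the MDS and watch the log that crashed before.
systemctl start ceph-mds@<id>
```

Note that step 3 discards any journal events that step 2 could not recover, which is why the export in step 1 matters.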