Re: OSDs crashing after server reboot.

Thanks again Igor. Using ceph-bluestore-tool with
CEPH_ARGS="--bluestore_hybrid_alloc_mem_cap=0" set, I was able to detect two
OSDs returning IO errors.
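
For reference, this is roughly how the two fit together (the OSD path below is
just whichever OSD is being checked, and the OSD has to be stopped while the
tool runs):

  CEPH_ARGS="--bluestore_hybrid_alloc_mem_cap=0" \
    ceph-bluestore-tool repair --path /var/lib/ceph/osd/ceph-80 --log-level 10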

These OSD crashes triggered backfill operations, during which some OSDs marked
others down because of slow operations, starting a domino effect that has left
the pool in really bad shape.

I think the incomplete backfills left a lot of PGs with 0 objects, and when
these PGs started to backfill/recover, the OSDs crashed.

Now I have removed the failed OSDs and have some degraded/incomplete PGs and
unfound objects that I am inserting back into the cluster with
ceph-objectstore-tool.
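
In case it is useful to anyone else, what I am doing follows the usual
ceph-objectstore-tool export/import pattern, roughly like this (the PG id, OSD
ids and dump file name are only examples, and the OSDs involved have to be
stopped while the tool runs):

  # export a PG from the data store of the failed/removed OSD
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-80 \
    --pgid 0.23 --op export --file /tmp/pg-0.23.export

  # import it into a healthy OSD that should host the PG, then start that OSD
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-41 \
    --op import --file /tmp/pg-0.23.export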

On Thu, Mar 11, 2021 at 2:34 PM Cassiano Pilipavicius <cpilipav@xxxxxxxxx>
wrote:

> Hi, this error was actually only showing up when I tried to run
> ceph-bluestore-tool repair. On the 3 OSDs that keep crashing, they show the
> following log... please let me know if there is something I can do to get
> the pool back to a functioning state.
>
> Uptime(secs): 0.0 total, 0.0 interval
> Flush(GB): cumulative 0.000, interval 0.000
> AddFile(GB): cumulative 0.000, interval 0.000
> AddFile(Total Files): cumulative 0, interval 0
> AddFile(L0 Files): cumulative 0, interval 0
> AddFile(Keys): cumulative 0, interval 0
> Cumulative compaction: 0.00 GB write, 21.51 MB/s write, 0.00 GB read, 0.00
> MB/s read, 0.0 seconds
> Interval compaction: 0.00 GB write, 0.00 MB/s write, 0.00 GB read, 0.00
> MB/s read, 0.0 seconds
> Stalls(count): 0 level0_slowdown, 0 level0_slowdown_with_compaction, 0
> level0_numfiles, 0 level0_numfiles_with_compaction, 0 stop for
> pending_compaction_bytes, 0 slowdown for pending_compaction_bytes, 0
> memtable_compaction, 0 memtable_slowdown, interval 0 total count
>
> ** File Read Latency Histogram By Level [default] **
>
>    -32> 2021-03-11 14:25:55.812 7f50161b7700  5 prioritycache tune_memory
> target: 4294967296 mapped: 445022208 unmapped: 59588608 heap: 504610816 old
> mem: 134217728 new mem: 2564495564
>    -31> 2021-03-11 14:25:55.813 7f50161b7700  5 prioritycache tune_memory
> target: 4294967296 mapped: 445210624 unmapped: 59400192 heap: 504610816 old
> mem: 2564495564 new mem: 2816296009
>    -30> 2021-03-11 14:25:55.813 7f50161b7700  5
> bluestore.MempoolThread(0x558aa9a8ea98) _trim_shards cache_size: 2816296009
> kv_alloc: 956301312 kv_used: 6321600 meta_alloc: 956301312 meta_used: 11680
> data_alloc: 620756992 data_used: 151552
>    -29> 2021-03-11 14:25:55.838 7f502962ba80  0 <cls>
> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/gigantic/release/14.2.16/rpm/el7/BUILD/ceph-14.2.16/src/cls/cephfs/cls_cephfs.cc:197:
> loading cephfs
>    -28> 2021-03-11 14:25:55.839 7f502962ba80  0 <cls>
> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/gigantic/release/14.2.16/rpm/el7/BUILD/ceph-14.2.16/src/cls/hello/cls_hello.cc:296:
> loading cls_hello
>    -27> 2021-03-11 14:25:55.840 7f502962ba80  0 _get_class not permitted
> to load kvs
>    -26> 2021-03-11 14:25:55.840 7f502962ba80  0 _get_class not permitted
> to load lua
>    -25> 2021-03-11 14:25:55.852 7f502962ba80  0 _get_class not permitted
> to load sdk
>    -24> 2021-03-11 14:25:55.853 7f502962ba80  0 osd.80 697960 crush map
> has features 283675107524608, adjusting msgr requires for clients
>    -23> 2021-03-11 14:25:55.853 7f502962ba80  0 osd.80 697960 crush map
> has features 283675107524608 was 8705, adjusting msgr requires for mons
>    -22> 2021-03-11 14:25:55.853 7f502962ba80  0 osd.80 697960 crush map
> has features 3026702624700514304, adjusting msgr requires for osds
>    -21> 2021-03-11 14:25:56.814 7f50161b7700  5 prioritycache tune_memory
> target: 4294967296 mapped: 501989376 unmapped: 21192704 heap: 523182080 old
> mem: 2816296009 new mem: 2842012351
>    -20> 2021-03-11 14:25:57.816 7f50161b7700  5 prioritycache tune_memory
> target: 4294967296 mapped: 542867456 unmapped: 19210240 heap: 562077696 old
> mem: 2842012351 new mem: 2844985645
>    -19> 2021-03-11 14:25:58.327 7f502962ba80  0 osd.80 697960 load_pgs
>    -18> 2021-03-11 14:25:58.818 7f50161b7700  5 prioritycache tune_memory
> target: 4294967296 mapped: 596754432 unmapped: 229376 heap: 596983808 old
> mem: 2844985645 new mem: 2845356061
>    -17> 2021-03-11 14:25:59.818 7f50161b7700  5 prioritycache tune_memory
> target: 4294967296 mapped: 679067648 unmapped: 753664 heap: 679821312 old
> mem: 2845356061 new mem: 2845406382
>    -16> 2021-03-11 14:26:00.820 7f50161b7700  5 prioritycache tune_memory
> target: 4294967296 mapped: 729161728 unmapped: 991232 heap: 730152960 old
> mem: 2845406382 new mem: 2845414228
>    -15> 2021-03-11 14:26:00.820 7f50161b7700  5
> bluestore.MempoolThread(0x558aa9a8ea98) _trim_shards cache_size: 2845414228
> kv_alloc: 1174405120 kv_used: 166707616 meta_alloc: 1006632960 meta_used:
> 54703 data_alloc: 637534208 data_used: 745472
>    -14> 2021-03-11 14:26:01.735 7f502962ba80  0 osd.80 697960 load_pgs
> opened 63 pgs
>    -13> 2021-03-11 14:26:01.735 7f502962ba80  0 osd.80 697960 using
> weightedpriority op queue with priority op cut off at 64.
>    -12> 2021-03-11 14:26:01.736 7f502962ba80 -1 osd.80 697960
> log_to_monitors {default=true}
>    -11> 2021-03-11 14:26:01.743 7f502962ba80 -1 osd.80 697960
> mon_cmd_maybe_osd_create fail: 'osd.80 has already bound to class 'backup',
> can not reset class to 'hdd'; use 'ceph osd crush rm-device-class <id>' to
> remove old class first': (16) Device or resource busy
>    -10> 2021-03-11 14:26:01.746 7f502962ba80  0 osd.80 697960 done with
> init, starting boot process
>     -9> 2021-03-11 14:26:01.748 7f5013740700  4 mgrc handle_mgr_map Got
> map version 175738
>     -8> 2021-03-11 14:26:01.748 7f5013740700  4 mgrc handle_mgr_map Active
> mgr is now [v2:10.69.57.2:6802/3699587,v1:10.69.57.2:6803/3699587]
>     -7> 2021-03-11 14:26:01.748 7f5013740700  4 mgrc reconnect Starting
> new session with [v2:10.69.57.2:6802/3699587,v1:10.69.57.2:6803/3699587]
>     -6> 2021-03-11 14:26:01.752 7f5013740700  4 mgrc handle_mgr_configure
> stats_period=5
>     -5> 2021-03-11 14:26:01.752 7f5013740700  4 mgrc handle_mgr_configure
> updated stats threshold: 5
>     -4> 2021-03-11 14:26:01.822 7f50161b7700  5 prioritycache tune_memory
> target: 4294967296 mapped: 802258944 unmapped: 245760 heap: 802504704 old
> mem: 2845414228 new mem: 2845415533
>     -3> 2021-03-11 14:26:02.826 7f50161b7700  5 prioritycache tune_memory
> target: 4294967296 mapped: 815693824 unmapped: 6733824 heap: 822427648 old
> mem: 2845415533 new mem: 2845415776
>     -2> 2021-03-11 14:26:03.084 7f5005b9c700  0 log_channel(cluster) log
> [INF] : 0.23 continuing backfill to osd.41 from (31979'7979,561511'9571]
> MIN to 561511'9571
>     -1> 2021-03-11 14:26:03.088 7f5003b98700 -1
> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/gigantic/release/14.2.16/rpm/el7/BUILD/ceph-14.2.16/src/osd/PGLog.cc:
> In function 'void PGLog::merge_log(pg_info_t&, pg_log_t&, pg_shard_t,
> pg_info_t&, PGLog::LogEntryHandler*, bool&, bool&)' thread 7f5003b98700
> time 2021-03-11 14:26:03.080329
> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/gigantic/release/14.2.16/rpm/el7/BUILD/ceph-14.2.16/src/osd/PGLog.cc:
> 368: FAILED ceph_assert(log.head >= olog.tail && olog.head >= log.tail)
>
>  ceph version 14.2.16 (762032d6f509d5e7ee7dc008d80fe9c87086603c) nautilus
> (stable)
>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> const*)+0x14a) [0x558a9db62c7d]
>  2: (()+0x4d8e45) [0x558a9db62e45]
>  3: (PGLog::merge_log(pg_info_t&, pg_log_t&, pg_shard_t, pg_info_t&,
> PGLog::LogEntryHandler*, bool&, bool&)+0x1c22) [0x558a9dd76a42]
>  4: (PG::merge_log(ObjectStore::Transaction&, pg_info_t&, pg_log_t&,
> pg_shard_t)+0x64) [0x558a9dcd8234]
>  5: (PG::RecoveryState::Stray::react(MLogRec const&)+0x22b)
> [0x558a9dd1adeb]
>  6: (boost::statechart::simple_state<PG::RecoveryState::Stray,
> PG::RecoveryState::Started, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na,
> mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na,
> mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na,
> mpl_::na, mpl_::na, mpl_::na>,
> (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base
> const&, void const*)+0xa5) [0x558a9dd693f5]
>  7: (PG::do_peering_event(std::shared_ptr<PGPeeringEvent>,
> PG::RecoveryCtx*)+0x2dd) [0x558a9dd2d3ed]
>  8: (OSD::dequeue_peering_evt(OSDShard*, PG*,
> std::shared_ptr<PGPeeringEvent>, ThreadPool::TPHandle&)+0x1b4)
> [0x558a9dc69f34]
>  9: (PGPeeringItem::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&,
> ThreadPool::TPHandle&)+0x51) [0x558a9ded2291]
>  10: (OSD::ShardedOpWQ::_process(unsigned int,
> ceph::heartbeat_handle_d*)+0x90f) [0x558a9dc5ea4f]
>  11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x5b6)
> [0x558a9e216e56]
>  12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x558a9e219970]
>  13: (()+0x7ea5) [0x7f5026673ea5]
>  14: (clone()+0x6d) [0x7f50255369fd]
>
>      0> 2021-03-11 14:26:03.099 7f5003b98700 -1 *** Caught signal
> (Aborted) **
>  in thread 7f5003b98700 thread_name:tp_osd_tp
>
>  ceph version 14.2.16 (762032d6f509d5e7ee7dc008d80fe9c87086603c) nautilus
> (stable)
>  1: (()+0xf630) [0x7f502667b630]
>  2: (gsignal()+0x37) [0x7f502546e3d7]
>  3: (abort()+0x148) [0x7f502546fac8]
>  4: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> const*)+0x199) [0x558a9db62ccc]
>  5: (()+0x4d8e45) [0x558a9db62e45]
>  6: (PGLog::merge_log(pg_info_t&, pg_log_t&, pg_shard_t, pg_info_t&,
> PGLog::LogEntryHandler*, bool&, bool&)+0x1c22) [0x558a9dd76a42]
>  7: (PG::merge_log(ObjectStore::Transaction&, pg_info_t&, pg_log_t&,
> pg_shard_t)+0x64) [0x558a9dcd8234]
>  8: (PG::RecoveryState::Stray::react(MLogRec const&)+0x22b)
> [0x558a9dd1adeb]
>  9: (boost::statechart::simple_state<PG::RecoveryState::Stray,
> PG::RecoveryState::Started, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na,
> mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na,
> mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na,
> mpl_::na, mpl_::na, mpl_::na>,
> (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base
> const&, void const*)+0xa5) [0x558a9dd693f5]
>  10: (PG::do_peering_event(std::shared_ptr<PGPeeringEvent>,
> PG::RecoveryCtx*)+0x2dd) [0x558a9dd2d3ed]
>  11: (OSD::dequeue_peering_evt(OSDShard*, PG*,
> std::shared_ptr<PGPeeringEvent>, ThreadPool::TPHandle&)+0x1b4)
> [0x558a9dc69f34]
>  12: (PGPeeringItem::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&,
> ThreadPool::TPHandle&)+0x51) [0x558a9ded2291]
>  13: (OSD::ShardedOpWQ::_process(unsigned int,
> ceph::heartbeat_handle_d*)+0x90f) [0x558a9dc5ea4f]
>  14: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x5b6)
> [0x558a9e216e56]
>  15: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x558a9e219970]
>  16: (()+0x7ea5) [0x7f5026673ea5]
>  17: (clone()+0x6d) [0x7f50255369fd]
>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed
> to interpret this.
>
> --- logging levels ---
>    0/ 5 none
>    0/ 0 lockdep
>    0/ 0 context
>    0/ 0 crush
>    0/ 0 mds
>    0/ 0 mds_balancer
>    0/ 0 mds_locker
>    0/ 0 mds_log
>    0/ 0 mds_log_expire
>    0/ 0 mds_migrator
>    0/ 0 buffer
>    0/ 0 timer
>    0/ 0 filer
>    0/ 1 striper
>    0/ 0 objecter
>    0/ 0 rados
>    0/ 0 rbd
>    0/ 5 rbd_mirror
>    0/ 5 rbd_replay
>    0/ 0 journaler
>    0/ 0 objectcacher
>    0/ 0 client
>    0/ 0 osd
>    0/ 0 optracker
>    0/ 0 objclass
>    0/ 0 filestore
>    0/ 0 journal
>    0/ 0 ms
>    0/ 0 mon
>    0/ 0 monc
>    0/ 0 paxos
>    0/ 0 tp
>    0/ 0 auth
>    1/ 5 crypto
>    0/ 0 finisher
>    1/ 1 reserver
>    0/ 0 heartbeatmap
>    0/ 0 perfcounter
>    0/ 0 rgw
>    1/ 5 rgw_sync
>    1/10 civetweb
>    1/ 5 javaclient
>    0/ 0 asok
>    0/ 0 throttle
>    0/ 0 refs
>    1/ 5 xio
>    1/ 5 compressor
>    1/ 5 bluestore
>    1/ 5 bluefs
>    1/ 3 bdev
>    1/ 5 kstore
>    4/ 5 rocksdb
>    4/ 5 leveldb
>    4/ 5 memdb
>    1/ 5 kinetic
>    1/ 5 fuse
>    1/ 5 mgr
>    1/ 5 mgrc
>    1/ 5 dpdk
>    1/ 5 eventtrace
>    1/ 5 prioritycache
>   -2/-2 (syslog threshold)
>   -1/-1 (stderr threshold)
>   max_recent     10000
>   max_new         1000
>   log_file /var/log/ceph/ceph-osd.80.log
> --- end dump of recent events ---
>
> On Thu, Mar 11, 2021 at 1:23 PM Igor Fedotov <ifedotov@xxxxxxx> wrote:
>
>> Hi Cassiano,
>>
>> the backtrace you've provided relates to the bug fixed by:
>> https://github.com/ceph/ceph/pull/37793
>>
>> This fix is going to be released with the upcoming v14.2.17.
>>
>>
>> But I doubt that your original crashes have the same root cause - this
>> issue appears during shutdown only.
>>
>> Anyway, you can work around it by using a different allocator: avl or bitmap.
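>>
>> For example, something like this should do it (adjust to however you manage
>> your config; the change takes effect after an OSD restart):
>>
>>   ceph config set osd bluestore_allocator bitmap
>>
>> or the equivalent in ceph.conf under [osd]:
>>
>>   bluestore_allocator = bitmap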
>>
>>
>> Thanks,
>>
>> Igor
>>
>>
>> On 3/11/2021 6:21 PM, Cassiano Pilipavicius wrote:
>> > Hi, please, if someone knows how to help: I have an HDD pool in my
>> > cluster, and after rebooting one server my OSDs have started to crash.
>> >
>> > This pool is a backup pool and has OSD as the failure domain, with a
>> > size of 2.
>> >
>> > After rebooting one server, my OSDs started to crash, and things are
>> > only getting worse. I have since tried to run ceph-bluestore-tool repair
>> > and I receive what I think is the same error that shows in the OSD logs:
>> >
>> > [root@cwvh13 ~]# ceph-bluestore-tool repair --path
>> > /var/lib/ceph/osd/ceph-81 --log-level 10
>> >
>> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/gigantic/release/14.2.16/rpm/el7/BUILD/ceph-14.2.16/src/os/bluestore/Allocator.cc:
>> > In function 'virtual Allocator::SocketHook::~SocketHook()' thread
>> > 7f6467ffcec0 time 2021-03-11 12:13:12.121766
>> >
>> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/gigantic/release/14.2.16/rpm/el7/BUILD/ceph-14.2.16/src/os/bluestore/Allocator.cc:
>> > 53: FAILED ceph_assert(r == 0)
>> >   ceph version 14.2.16 (762032d6f509d5e7ee7dc008d80fe9c87086603c)
>> nautilus
>> > (stable)
>> >   1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>> > const*)+0x14a) [0x7f645e1a7b27]
>> >   2: (()+0x25ccef) [0x7f645e1a7cef]
>> >   3: (()+0x3cd57f) [0x5642e85c457f]
>> >   4: (HybridAllocator::~HybridAllocator()+0x17) [0x5642e85f3f37]
>> >   5: (BlueStore::_close_alloc()+0x42) [0x5642e84379d2]
>> >   6: (BlueStore::_close_db_and_around(bool)+0x2f8) [0x5642e84bbac8]
>> >   7: (BlueStore::_fsck(BlueStore::FSCKDepth, bool)+0x293)
>> [0x5642e84bbf13]
>> >   8: (main()+0x13cc) [0x5642e83caaec]
>> >   9: (__libc_start_main()+0xf5) [0x7f645ae24555]
>> >   10: (()+0x1fae9f) [0x5642e83f1e9f]
>>
>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


