Monitoring and the logs looked OK; the OSD nodes have plenty of available CPU and RAM. The previous pg_num was 256.

________________________________
From: Eugen Block <eblock@xxxxxx>
Sent: Friday, October 23, 2020 2:06:27 PM
To: ceph-users@xxxxxxx
Subject: Re: OSD Failures after pg_num increase on one of the pools

Hi,

do you see any peaks on the OSD nodes, like the OOM killer etc.?
Instead of the norecover flag I would try the nodown and noout flags to
prevent flapping OSDs.
What was the previous pg_num before you increased it to 512?

Regards,
Eugen
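For what it's worth, the flag handling Eugen suggests would look roughly like the following. This is only a minimal sketch to be run from an admin/mon node; whether it is safe to unset norecover before the ECBackend assert quoted below is understood is a separate question.

# keep flapping OSDs from being marked down/out while investigating
ceph osd set nodown
ceph osd set noout

# then allow recovery again instead of blocking it cluster-wide
ceph osd unset norecover

# optionally throttle recovery on the HDD OSDs rather than stopping it
# (osd_recovery_sleep_hdd defaults to 0.1 s on 14.2.x)
ceph config set osd osd_recovery_sleep_hdd 1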
Quoting Артём Григорьев <artemmiet@xxxxxxxxx>:

> Hello everyone,
>
> I recently created a new Ceph 14.2.7 (Nautilus) cluster. The cluster
> consists of 3 racks with 2 OSD nodes per rack and 12 new HDDs in each
> node. The HDD model is TOSHIBA MG07ACA14TE (14 TB). All data pools are
> EC pools.
> Yesterday I decided to increase the pg number of one of the pools with
> the command "ceph osd pool set photo.buckets.data pg_num 512". After
> that, many OSDs started to crash and were marked "down" and "out". I
> tried increasing recovery_sleep to 1 s, but the OSDs still crashed. The
> OSDs only started working properly once I set the "norecover" flag, but
> scrub errors appeared after that.
>
> In the OSD logs I found this during the crashes:
> ---
> Oct 21 15:12:11 ceph-osd-201 ceph-osd[58159]: /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.2.7/rpm/el7/BUILD/ceph-14.2.7/src/osd/ECBackend.cc: In function 'void ECBackend::continue_recovery_op(ECBackend::RecoveryOp&, RecoveryMessages*)' thread 7f8af535d700 time 2020-10-21 15:12:11.460092
> Oct 21 15:12:11 ceph-osd-201 ceph-osd[58159]: /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.2.7/rpm/el7/BUILD/ceph-14.2.7/src/osd/ECBackend.cc: 648: FAILED ceph_assert(pop.data.length() == sinfo.aligned_logical_offset_to_chunk_offset(after_progress.data_recovered_to - op.recovery_progress.data_recovered_to))
> Oct 21 15:12:11 ceph-osd-201 ceph-osd[58159]: ceph version 14.2.7 (3d58626ebeec02d8385a4cefb92c6cbc3a45bfe8) nautilus (stable)
> Oct 21 15:12:11 ceph-osd-201 ceph-osd[58159]: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x14a) [0x55fc694d6c0f]
> Oct 21 15:12:11 ceph-osd-201 ceph-osd[58159]: 2: (()+0x4dddd7) [0x55fc694d6dd7]
> Oct 21 15:12:11 ceph-osd-201 ceph-osd[58159]: 3: (ECBackend::continue_recovery_op(ECBackend::RecoveryOp&, RecoveryMessages*)+0x1740) [0x55fc698cafa0]
> Oct 21 15:12:11 ceph-osd-201 ceph-osd[58159]: 4: (ECBackend::handle_recovery_read_complete(hobject_t const&, boost::tuples::tuple<unsigned long, unsigned long, std::map<pg_shard_t, ceph::buffer::v14_2_0::list, std::less<pg_shard_t>, std::allocator<std::pair<pg_shard_t const, ceph::buffer::v14_2_0::list> > >, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type>&, boost::optional<std::map<std::string, ceph::buffer::v14_2_0::list, std::less<std::string>, std::allocator<std::pair<std::string const, ceph::buffer::v14_2_0::list> > > >, RecoveryMessages*)+0x734) [0x55fc698cb804]
> Oct 21 15:12:11 ceph-osd-201 ceph-osd[58159]: 5: (OnRecoveryReadComplete::finish(std::pair<RecoveryMessages*, ECBackend::read_result_t&>&)+0x94) [0x55fc698ebbe4]
> Oct 21 15:12:11 ceph-osd-201 ceph-osd[58159]: 6: (ECBackend::complete_read_op(ECBackend::ReadOp&, RecoveryMessages*)+0x8c) [0x55fc698bfdcc]
> Oct 21 15:12:11 ceph-osd-201 ceph-osd[58159]: 7: (ECBackend::handle_sub_read_reply(pg_shard_t, ECSubReadReply&, RecoveryMessages*, ZTracer::Trace const&)+0x109c) [0x55fc698d6b8c]
> Oct 21 15:12:11 ceph-osd-201 ceph-osd[58159]: 8: (ECBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x17f) [0x55fc698d718f]
> Oct 21 15:12:11 ceph-osd-201 ceph-osd[58159]: 9: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x4a) [0x55fc697c18ea]
> Oct 21 15:12:11 ceph-osd-201 ceph-osd[58159]: 10: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x5b3) [0x55fc697676b3]
> Oct 21 15:12:11 ceph-osd-201 ceph-osd[58159]: 11: (OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x362) [0x55fc695b3d72]
> Oct 21 15:12:11 ceph-osd-201 ceph-osd[58159]: 12: (PGOpItem::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x62) [0x55fc698415c2]
> Oct 21 15:12:11 ceph-osd-201 ceph-osd[58159]: 13: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x90f) [0x55fc695cebbf]
> Oct 21 15:12:11 ceph-osd-201 ceph-osd[58159]: 14: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x5b6) [0x55fc69b6f976]
> Oct 21 15:12:11 ceph-osd-201 ceph-osd[58159]: 15: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55fc69b72490]
> Oct 21 15:12:11 ceph-osd-201 ceph-osd[58159]: 16: (()+0x7e65) [0x7f8b1ddede65]
> Oct 21 15:12:11 ceph-osd-201 ceph-osd[58159]: 17: (clone()+0x6d) [0x7f8b1ccb188d]
> Oct 21 15:12:11 ceph-osd-201 ceph-osd[58159]: *** Caught signal (Aborted) **
> Oct 21 15:12:11 ceph-osd-201 ceph-osd[58159]: in thread 7f8af535d700 thread_name:tp_osd_tp
> ---
>
> Current EC profile and pool info below:
>
> # ceph osd erasure-code-profile get EC42
> crush-device-class=hdd
> crush-failure-domain=host
> crush-root=main
> jerasure-per-chunk-alignment=false
> k=4
> m=2
> plugin=jerasure
> technique=reed_sol_van
> w=8
>
> pool 25 'photo.buckets.data' erasure size 6 min_size 4 crush_rule 6 object_hash rjenkins pg_num 512 pgp_num 280 pgp_num_target 512 autoscale_mode warn last_change 43418 lfor 0/0/42223 flags hashpspool stripe_width 1048576 application rgw
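Worth noting in the pool line just quoted: pgp_num is still 280 while pgp_num_target is 512, so Nautilus is still ramping the placement-group split in the background, which adds misplaced objects on top of the degraded ones. A quick way to watch that catch-up (only the pool name from above is assumed):

# pg_num is already 512; pgp_num creeps towards pgp_num_target over time
ceph osd pool get photo.buckets.data pg_num
ceph osd pool get photo.buckets.data pgp_num

# same pgp_num / pgp_num_target fields as in the pool dump above
ceph osd pool ls detail | grep photo.buckets.data

Once recovery is healthy again, the split should converge on its own; there should be no need to touch pgp_num by hand.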
> Current ceph status:
>
> # ceph -s
>   cluster:
>     id:     9ec8d309-a620-4ad8-93fa-c2d111e5256e
>     health: HEALTH_ERR
>             norecover flag(s) set
>             1 pools have many more objects per pg than average
>             4542629 scrub errors
>             Possible data damage: 6 pgs inconsistent
>             Degraded data redundancy: 1207268/578535561 objects degraded (0.209%), 51 pgs degraded, 35 pgs undersized
>             85 pgs not deep-scrubbed in time
>
>   services:
>     mon: 3 daemons, quorum ceph-osd-101,ceph-osd-201,ceph-osd-301 (age 2w)
>     mgr: ceph-osd-101(active, since 3w), standbys: ceph-osd-301, ceph-osd-201
>     osd: 72 osds: 72 up (since 11h), 72 in (since 21h); 48 remapped pgs
>          flags norecover
>     rgw: 6 daemons active (ceph-osd-101.rgw0, ceph-osd-102.rgw0, ceph-osd-201.rgw0, ceph-osd-202.rgw0, ceph-osd-301.rgw0, ceph-osd-302.rgw0)
>
>   data:
>     pools:   26 pools, 15680 pgs
>     objects: 96.46M objects, 124 TiB
>     usage:   303 TiB used, 613 TiB / 917 TiB avail
>     pgs:     1207268/578535561 objects degraded (0.209%)
>              14068769/578535561 objects misplaced (2.432%)
>              15290 active+clean
>              312   active+recovering
>              30    active+undersized+degraded+remapped+backfilling
>              21    active+recovering+degraded
>              13    active+remapped+backfilling
>              6     active+clean+inconsistent
>              5     active+recovering+undersized+remapped
>              3     active+clean+scrubbing+deep
>
> So now my cluster is stuck and cannot recover properly. Can someone shed
> some light on this problem? Is it a bug?

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
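Regarding the 4542629 scrub errors and the six inconsistent PGs in the status above: once the OSDs stay up and recovery is allowed to finish, the usual first step would be to look at what scrub actually flagged before attempting any repair. A rough sketch, where 25.xx is only a placeholder for whichever PG ids "ceph health detail" really reports:

# which PGs are inconsistent
ceph health detail | grep inconsistent
rados list-inconsistent-pg photo.buckets.data

# inspect one of them first (25.xx is a placeholder PG id)
rados list-inconsistent-obj 25.xx --format=json-pretty

# only after understanding the errors, ask the primary OSD to repair it
ceph pg repair 25.xx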