Monitoring and the logs looked OK; the OSD nodes have plenty of available CPU and RAM. The previous pg_num was 256.

________________________________
From: Eugen Block <eblock@xxxxxx>
Sent: Friday, October 23, 2020 2:06:27 PM
To: ceph-users@xxxxxxx
Subject: Re: OSD Failures after pg_num increase on one of the pools

Hi,

do you see any peaks on the OSD nodes, like the OOM killer etc.?
Instead of the norecover flag I would try the nodown and noout flags to
prevent flapping OSDs.
What was the previous pg_num before you increased it to 512?

Regards,
Eugen
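For what it's worth, the flag handling Eugen suggests would look roughly like the following. This is only a minimal sketch to be run from an admin/mon node; whether it is safe to unset norecover before the ECBackend assert quoted below is understood is a separate question.

# keep flapping OSDs from being marked down/out while investigating
ceph osd set nodown
ceph osd set noout

# then allow recovery again instead of blocking it cluster-wide
ceph osd unset norecover

# optionally throttle recovery on the HDD OSDs rather than stopping it
# (osd_recovery_sleep_hdd defaults to 0.1 s on 14.2.x)
ceph config set osd osd_recovery_sleep_hdd 1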
Quoting Артём Григорьев <artemmiet@xxxxxxxxx>:

> Hello everyone,
>
> I recently created a new Ceph 14.2.7 (Nautilus) cluster. The cluster
> consists of 3 racks with 2 OSD nodes per rack and 12 new HDDs in each
> node. The HDD model is TOSHIBA MG07ACA14TE (14 TB). All data pools are
> EC pools.
> Yesterday I decided to increase the pg number of one of the pools with
> the command "ceph osd pool set photo.buckets.data pg_num 512". After
> that, many OSDs started to crash and were marked "down" and "out". I
> tried increasing recovery_sleep to 1 s, but the OSDs still crashed. The
> OSDs only started working properly once I set the "norecover" flag, but
> scrub errors appeared after that.
>
> In the OSD logs I found this during the crashes:
> ---
> Oct 21 15:12:11 ceph-osd-201 ceph-osd[58159]: /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.2.7/rpm/el7/BUILD/ceph-14.2.7/src/osd/ECBackend.cc: In function 'void ECBackend::continue_recovery_op(ECBackend::RecoveryOp&, RecoveryMessages*)' thread 7f8af535d700 time 2020-10-21 15:12:11.460092
> Oct 21 15:12:11 ceph-osd-201 ceph-osd[58159]: /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.2.7/rpm/el7/BUILD/ceph-14.2.7/src/osd/ECBackend.cc: 648: FAILED ceph_assert(pop.data.length() == sinfo.aligned_logical_offset_to_chunk_offset(after_progress.data_recovered_to - op.recovery_progress.data_recovered_to))
> Oct 21 15:12:11 ceph-osd-201 ceph-osd[58159]: ceph version 14.2.7 (3d58626ebeec02d8385a4cefb92c6cbc3a45bfe8) nautilus (stable)
> Oct 21 15:12:11 ceph-osd-201 ceph-osd[58159]: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x14a) [0x55fc694d6c0f]
> Oct 21 15:12:11 ceph-osd-201 ceph-osd[58159]: 2: (()+0x4dddd7) [0x55fc694d6dd7]
> Oct 21 15:12:11 ceph-osd-201 ceph-osd[58159]: 3: (ECBackend::continue_recovery_op(ECBackend::RecoveryOp&, RecoveryMessages*)+0x1740) [0x55fc698cafa0]
> Oct 21 15:12:11 ceph-osd-201 ceph-osd[58159]: 4: (ECBackend::handle_recovery_read_complete(hobject_t const&, boost::tuples::tuple<unsigned long, unsigned long, std::map<pg_shard_t, ceph::buffer::v14_2_0::list, std::less<pg_shard_t>, std::allocator<std::pair<pg_shard_t const, ceph::buffer::v14_2_0::list> > >, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type>&, boost::optional<std::map<std::string, ceph::buffer::v14_2_0::list, std::less<std::string>, std::allocator<std::pair<std::string const, ceph::buffer::v14_2_0::list> > > >, RecoveryMessages*)+0x734) [0x55fc698cb804]
> Oct 21 15:12:11 ceph-osd-201 ceph-osd[58159]: 5: (OnRecoveryReadComplete::finish(std::pair<RecoveryMessages*, ECBackend::read_result_t&>&)+0x94) [0x55fc698ebbe4]
> Oct 21 15:12:11 ceph-osd-201 ceph-osd[58159]: 6: (ECBackend::complete_read_op(ECBackend::ReadOp&, RecoveryMessages*)+0x8c) [0x55fc698bfdcc]
> Oct 21 15:12:11 ceph-osd-201 ceph-osd[58159]: 7: (ECBackend::handle_sub_read_reply(pg_shard_t, ECSubReadReply&, RecoveryMessages*, ZTracer::Trace const&)+0x109c) [0x55fc698d6b8c]
> Oct 21 15:12:11 ceph-osd-201 ceph-osd[58159]: 8: (ECBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x17f) [0x55fc698d718f]
> Oct 21 15:12:11 ceph-osd-201 ceph-osd[58159]: 9: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x4a) [0x55fc697c18ea]
> Oct 21 15:12:11 ceph-osd-201 ceph-osd[58159]: 10: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x5b3) [0x55fc697676b3]
> Oct 21 15:12:11 ceph-osd-201 ceph-osd[58159]: 11: (OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x362) [0x55fc695b3d72]
> Oct 21 15:12:11 ceph-osd-201 ceph-osd[58159]: 12: (PGOpItem::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x62) [0x55fc698415c2]
> Oct 21 15:12:11 ceph-osd-201 ceph-osd[58159]: 13: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x90f) [0x55fc695cebbf]
> Oct 21 15:12:11 ceph-osd-201 ceph-osd[58159]: 14: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x5b6) [0x55fc69b6f976]
> Oct 21 15:12:11 ceph-osd-201 ceph-osd[58159]: 15: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55fc69b72490]
> Oct 21 15:12:11 ceph-osd-201 ceph-osd[58159]: 16: (()+0x7e65) [0x7f8b1ddede65]
> Oct 21 15:12:11 ceph-osd-201 ceph-osd[58159]: 17: (clone()+0x6d) [0x7f8b1ccb188d]
> Oct 21 15:12:11 ceph-osd-201 ceph-osd[58159]: *** Caught signal (Aborted) **
> Oct 21 15:12:11 ceph-osd-201 ceph-osd[58159]: in thread 7f8af535d700 thread_name:tp_osd_tp
> ---
>
> Current EC profile and pool info below:
>
> # ceph osd erasure-code-profile get EC42
> crush-device-class=hdd
> crush-failure-domain=host
> crush-root=main
> jerasure-per-chunk-alignment=false
> k=4
> m=2
> plugin=jerasure
> technique=reed_sol_van
> w=8
>
> pool 25 'photo.buckets.data' erasure size 6 min_size 4 crush_rule 6 object_hash rjenkins pg_num 512 pgp_num 280 pgp_num_target 512 autoscale_mode warn last_change 43418 lfor 0/0/42223 flags hashpspool stripe_width 1048576 application rgw
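Worth noting in the pool line just quoted: pgp_num is still 280 while pgp_num_target is 512, so Nautilus is still ramping the placement-group split in the background, which adds misplaced objects on top of the degraded ones. A quick way to watch that catch-up (only the pool name from above is assumed):

# pg_num is already 512; pgp_num creeps towards pgp_num_target over time
ceph osd pool get photo.buckets.data pg_num
ceph osd pool get photo.buckets.data pgp_num

# same pgp_num / pgp_num_target fields as in the pool dump above
ceph osd pool ls detail | grep photo.buckets.data

Once recovery is healthy again, the split should converge on its own; there should be no need to touch pgp_num by hand.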
> Current ceph status:
>
> # ceph -s
>   cluster:
>     id:     9ec8d309-a620-4ad8-93fa-c2d111e5256e
>     health: HEALTH_ERR
>             norecover flag(s) set
>             1 pools have many more objects per pg than average
>             4542629 scrub errors
>             Possible data damage: 6 pgs inconsistent
>             Degraded data redundancy: 1207268/578535561 objects degraded (0.209%), 51 pgs degraded, 35 pgs undersized
>             85 pgs not deep-scrubbed in time
>
>   services:
>     mon: 3 daemons, quorum ceph-osd-101,ceph-osd-201,ceph-osd-301 (age 2w)
>     mgr: ceph-osd-101(active, since 3w), standbys: ceph-osd-301, ceph-osd-201
>     osd: 72 osds: 72 up (since 11h), 72 in (since 21h); 48 remapped pgs
>          flags norecover
>     rgw: 6 daemons active (ceph-osd-101.rgw0, ceph-osd-102.rgw0, ceph-osd-201.rgw0, ceph-osd-202.rgw0, ceph-osd-301.rgw0, ceph-osd-302.rgw0)
>
>   data:
>     pools:   26 pools, 15680 pgs
>     objects: 96.46M objects, 124 TiB
>     usage:   303 TiB used, 613 TiB / 917 TiB avail
>     pgs:     1207268/578535561 objects degraded (0.209%)
>              14068769/578535561 objects misplaced (2.432%)
>              15290 active+clean
>              312   active+recovering
>              30    active+undersized+degraded+remapped+backfilling
>              21    active+recovering+degraded
>              13    active+remapped+backfilling
>              6     active+clean+inconsistent
>              5     active+recovering+undersized+remapped
>              3     active+clean+scrubbing+deep
>
> So now my cluster is stuck and cannot recover properly. Can someone shed
> some light on this problem? Is it a bug?

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
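Regarding the 4542629 scrub errors and the six inconsistent PGs in the status above: once the OSDs stay up and recovery is allowed to finish, the usual first step would be to look at what scrub actually flagged before attempting any repair. A rough sketch, where 25.xx is only a placeholder for whichever PG ids "ceph health detail" really reports:

# which PGs are inconsistent
ceph health detail | grep inconsistent
rados list-inconsistent-pg photo.buckets.data

# inspect one of them first (25.xx is a placeholder PG id)
rados list-inconsistent-obj 25.xx --format=json-pretty

# only after understanding the errors, ask the primary OSD to repair it
ceph pg repair 25.xx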