Hello everyone,
I recently created a new Ceph 14.2.7 Nautilus cluster. The cluster consists
of 3 racks with 2 OSD nodes in each rack and 12 new HDDs in each node
(72 OSDs in total). The HDD model is TOSHIBA MG07ACA14TE, 14 TB.
All data pools are EC pools.
Yesterday I decided to increase the PG count on one of the pools with the
command "ceph osd pool set photo.buckets.data pg_num 512". After that, many
OSDs started crashing and were being marked "down" and "out". I tried
increasing osd_recovery_sleep to 1 s, but the OSDs kept crashing. The OSDs
only stayed up once I set the "norecover" flag, but scrub errors appeared
after that.
In the OSD logs during the crashes I found this:
---
Oct 21 15:12:11 ceph-osd-201 ceph-osd[58159]:
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.2.7/rpm/el7/BUILD/ceph-14.2.7/src/osd/ECBackend.cc:
In function 'void ECBackend::continue_recovery_op(ECBackend::RecoveryOp&,
RecoveryMessages*)'
thread 7f8af535d700 time 2020-10-21 15:12:11.460092
Oct 21 15:12:11 ceph-osd-201 ceph-osd[58159]:
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.2.7/rpm/el7/BUILD/ceph-14.2.7/src/osd/ECBackend.cc:
648: FAILED ceph_assert(pop.data.length() ==
sinfo.aligned_logical_offset_to_chunk_offset(
after_progress.data_recovered_to - op.recovery_progress.data_recovered_to))
Oct 21 15:12:11 ceph-osd-201 ceph-osd[58159]: ceph version 14.2.7
(3d58626ebeec02d8385a4cefb92c6cbc3a45bfe8) nautilus (stable)
Oct 21 15:12:11 ceph-osd-201 ceph-osd[58159]: 1:
(ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x14a) [0x55fc694d6c0f]
Oct 21 15:12:11 ceph-osd-201 ceph-osd[58159]: 2: (()+0x4dddd7)
[0x55fc694d6dd7]
Oct 21 15:12:11 ceph-osd-201 ceph-osd[58159]: 3:
(ECBackend::continue_recovery_op(ECBackend::RecoveryOp&,
RecoveryMessages*)+0x1740) [0x55fc698cafa0]
Oct 21 15:12:11 ceph-osd-201 ceph-osd[58159]: 4:
(ECBackend::handle_recovery_read_complete(hobject_t const&,
boost::tuples::tuple<unsigned long, unsigned long, std::map<pg_shard_t,
ceph::buffer::v14_2_0::list, std::less<pg_shard_t>,
std::allocator<std::pair<pg_shard_t const, ceph::buffer::v14_2_0::list> >
, boost::tuples::null_type, boost::tuples::null_type,
boost::tuples::null_type, boost::tuples::null_type,
boost::tuples::null_type, boost::tuples::null_type,
boost::tuples::null_type>&, boost::optional<std::map<std::string,
ceph::buffer::v14_2_0::list, std::less<std::string>,
std::allocator<std::pair<std::string const, ceph::buffer::v14_2_0::list> >
>, RecoveryMessages*)+0x734) [0x55fc698cb804]
Oct 21 15:12:11 ceph-osd-201 ceph-osd[58159]: 5:
(OnRecoveryReadComplete::finish(std::pair<RecoveryMessages*,
ECBackend::read_result_t&>&)+0x94) [0x55fc698ebbe4]
Oct 21 15:12:11 ceph-osd-201 ceph-osd[58159]: 6:
(ECBackend::complete_read_op(ECBackend::ReadOp&, RecoveryMessages*)+0x8c)
[0x55fc698bfdcc]
Oct 21 15:12:11 ceph-osd-201 ceph-osd[58159]: 7:
(ECBackend::handle_sub_read_reply(pg_shard_t, ECSubReadReply&,
RecoveryMessages*, ZTracer::Trace const&)+0x109c) [0x55fc698d6b8c]
Oct 21 15:12:11 ceph-osd-201 ceph-osd[58159]: 8:
(ECBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x17f)
[0x55fc698d718f]
Oct 21 15:12:11 ceph-osd-201 ceph-osd[58159]: 9:
(PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x4a)
[0x55fc697c18ea]
Oct 21 15:12:11 ceph-osd-201 ceph-osd[58159]: 10:
(PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
ThreadPool::TPHandle&)+0x5b3) [0x55fc697676b3]
Oct 21 15:12:11 ceph-osd-201 ceph-osd[58159]: 11:
(OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>,
ThreadPool::TPHandle&)+0x362) [0x55fc695b3d72]
Oct 21 15:12:11 ceph-osd-201 ceph-osd[58159]: 12: (PGOpItem::run(OSD*,
OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x62)
[0x55fc698415c2]
Oct 21 15:12:11 ceph-osd-201 ceph-osd[58159]: 13:
(OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x90f)
[0x55fc695cebbf]
Oct 21 15:12:11 ceph-osd-201 ceph-osd[58159]: 14:
(ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x5b6)
[0x55fc69b6f976]
Oct 21 15:12:11 ceph-osd-201 ceph-osd[58159]: 15:
(ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55fc69b72490]
Oct 21 15:12:11 ceph-osd-201 ceph-osd[58159]: 16: (()+0x7e65)
[0x7f8b1ddede65]
Oct 21 15:12:11 ceph-osd-201 ceph-osd[58159]: 17: (clone()+0x6d)
[0x7f8b1ccb188d]
Oct 21 15:12:11 ceph-osd-201 ceph-osd[58159]: *** Caught signal (Aborted) **
Oct 21 15:12:11 ceph-osd-201 ceph-osd[58159]: in thread 7f8af535d700
thread_name:tp_osd_tp
---
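The same backtrace is also recorded by the mgr crash module, so if more
detail is needed for a tracker ticket I can pull the full reports with
something like the following (the crash id is a placeholder):
---
# list recent daemon crashes recorded by the crash module
ceph crash ls

# dump the full metadata and backtrace for one crash
ceph crash info <crash-id>
---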
Current EC profile and pool info below:
# ceph osd erasure-code-profile get EC42
crush-device-class=hdd
crush-failure-domain=host
crush-root=main
jerasure-per-chunk-alignment=false
k=4
m=2
plugin=jerasure
technique=reed_sol_van
w=8
pool 25 'photo.buckets.data' erasure size 6 min_size 4 crush_rule 6
object_hash rjenkins pg_num 512 pgp_num 280 pgp_num_target 512
autoscale_mode warn last_change 43418 lfor 0/0/42223 flags hashpspool
stripe_width 1048576 application rgw
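Note that the pool shows pg_num 512 but pgp_num 280 (pgp_num_target 512),
so the placement change triggered by the pg_num increase is still only
partially applied. I am watching its progress with roughly the following
(grep pattern is just for readability):
---
# pgp_num should gradually converge towards pgp_num_target (512)
ceph osd pool get photo.buckets.data pg_num
ceph osd pool get photo.buckets.data pgp_num

# full pool line, same output as quoted above
ceph osd pool ls detail | grep photo.buckets.data
---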
Current ceph status:
# ceph -s
cluster:
id: 9ec8d309-a620-4ad8-93fa-c2d111e5256e
health: HEALTH_ERR
norecover flag(s) set
1 pools have many more objects per pg than average
4542629 scrub errors
Possible data damage: 6 pgs inconsistent
Degraded data redundancy: 1207268/578535561 objects degraded
(0.209%), 51 pgs degraded, 35 pgs undersized
85 pgs not deep-scrubbed in time
services:
mon: 3 daemons, quorum ceph-osd-101,ceph-osd-201,ceph-osd-301 (age 2w)
mgr: ceph-osd-101(active, since 3w), standbys: ceph-osd-301,
ceph-osd-201
osd: 72 osds: 72 up (since 11h), 72 in (since 21h); 48 remapped pgs
flags norecover
rgw: 6 daemons active (ceph-osd-101.rgw0, ceph-osd-102.rgw0,
ceph-osd-201.rgw0, ceph-osd-202.rgw0, ceph-osd-301.rgw0, ceph-osd-302.rgw0)
data:
pools: 26 pools, 15680 pgs
objects: 96.46M objects, 124 TiB
usage: 303 TiB used, 613 TiB / 917 TiB avail
pgs: 1207268/578535561 objects degraded (0.209%)
14068769/578535561 objects misplaced (2.432%)
15290 active+clean
312 active+recovering
30 active+undersized+degraded+remapped+backfilling
21 active+recovering+degraded
13 active+remapped+backfilling
6 active+clean+inconsistent
5 active+recovering+undersized+remapped
3 active+clean+scrubbing+deep
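For the scrub errors and the 6 inconsistent PGs, the usual inspection
commands would be something like the following (the PG id is a
placeholder; I have not attempted any repair so far):
---
# show which PGs are reported as inconsistent
ceph health detail

# list inconsistent PGs in the affected pool and the objects inside one PG
rados list-inconsistent-pg photo.buckets.data
rados list-inconsistent-obj <pgid> --format=json-pretty

# per-PG repair could be attempted later with:
# ceph pg repair <pgid>
---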
So now my cluster is stuck and can't recover properly. Can someone give me
more information about this problem? Is it a bug?
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx