Re: PG Stuck EC Pool


 



From this extract of the pg query output:

 

"up": [
    11,
    10,
    84,
    83,
    22,
    26,
    69,
    72,
    53,
    59,
    8,
    4,
    46
],
"acting": [
    2147483647,
    2147483647,
    84,
    83,
    22,
    26,
    69,
    72,
    53,
    59,
    8,
    4,
    46
]

 

I am wondering if an issue on OSDs 11 and 10 is causing the current acting primary ("acting_primary": 84) to crash, but I can't see anything that could be causing it.
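For anyone comparing the two sets by hand, the difference can be listed with a short script. This is only a sketch: the `up` and `acting` values are copied from the extract above, the helper names are made up for illustration, and it relies on 2147483647 (0x7fffffff) being the placeholder Ceph prints when no OSD is mapped to a shard. Taking the first real OSD in the acting set as primary is an assumption, though it matches the "acting_primary": 84 reported by the query.

```python
# Placeholder Ceph prints for "no OSD mapped to this shard" (0x7fffffff).
CRUSH_ITEM_NONE = 0x7FFFFFFF

# Values copied from the pg query extract in this thread.
up = [11, 10, 84, 83, 22, 26, 69, 72, 53, 59, 8, 4, 46]
acting = [CRUSH_ITEM_NONE, CRUSH_ITEM_NONE, 84, 83, 22, 26, 69, 72, 53, 59, 8, 4, 46]


def shard_differences(up, acting):
    """Return (shard_index, up_osd, acting_osd) wherever up and acting differ.

    acting_osd is None when the shard has no OSD in the acting set.
    """
    diffs = []
    for shard, (u, a) in enumerate(zip(up, acting)):
        if u != a:
            diffs.append((shard, u, None if a == CRUSH_ITEM_NONE else a))
    return diffs


def first_real_osd(acting):
    """Return the first acting entry that is a real OSD, or None if there is none."""
    return next((osd for osd in acting if osd != CRUSH_ITEM_NONE), None)


print(shard_differences(up, acting))  # shards 0 and 1: up on 11 and 10, nothing acting
print(first_real_osd(acting))         # 84
```

So shards 0 and 1 are the ones waiting to be backfilled onto OSDs 11 and 10, and every other shard is where CRUSH wants it.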

 

Ashley

 

From: Ashley Merrick
Sent: 01 June 2017 23:39
To: ceph-users@xxxxxxxx
Subject: RE: PG Stuck EC Pool

 

I have attached the full pg query for the affected PG in case it shows anything of interest.

 

Thanks

 

From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of Ashley Merrick
Sent: 01 June 2017 17:19
To: ceph-users@xxxxxxxx
Subject: PG Stuck EC Pool

 


I have a PG which is stuck in the following state (it is in an EC pool with K=10, M=3):

 

 

pg 6.14 is active+undersized+degraded+remapped+inconsistent+backfilling, acting [2147483647,2147483647,84,83,22,26,69,72,53,59,8,4,46]
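As a rough worked example of what that acting set means for a K=10, M=3 pool, the shard arithmetic can be sketched as below. The function name is hypothetical, and the calculation deliberately ignores the pool's min_size setting and the inconsistent flag, both of which also affect whether the PG can serve I/O.

```python
# Placeholder Ceph prints when a shard has no OSD in the acting set.
CRUSH_ITEM_NONE = 2147483647


def shard_budget(k, m, acting):
    """Summarise how many EC shards are present versus how many are needed.

    k data shards are enough to reconstruct an object; m is the number of
    coding shards, so k + m shards exist in total.
    """
    present = sum(1 for osd in acting if osd != CRUSH_ITEM_NONE)
    return {
        "total_shards": k + m,
        "present": present,
        "missing": len(acting) - present,
        "reconstructable": present >= k,
        "further_losses_tolerated": max(present - k, 0),
    }


# Acting set copied from the health line above.
acting = [2147483647, 2147483647, 84, 83, 22, 26, 69, 72, 53, 59, 8, 4, 46]
print(shard_budget(10, 3, acting))
```

With 11 of 13 shards present, the data can still be reconstructed, but only one more shard can be lost before that stops being true.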

 

I currently have the norecover flag set. If I unset norecover, OSDs 83 and 84 both start to flap (going up and down), and I see the following in the OSD logs:

 

*****
    -5> 2017-06-01 10:08:29.658593 7f430ec97700  1 -- 172.16.3.14:6806/5204 <== osd.17 172.16.3.3:6806/2006016 57 ==== MOSDECSubOpWriteReply(6.31as0 71513 ECSubWriteReply(tid=152, last_complete=0'0, committed=0, applied=1)) v1 ==== 67+0+0 (245959818 0 0) 0x563c9db7be00 con 0x563c9cfca480
    -4> 2017-06-01 10:08:29.658620 7f430ec97700  5 -- op tracker -- seq: 2367, time: 2017-06-01 10:08:29.658620, event: queued_for_pg, op: MOSDECSubOpWriteReply(6.31as0 71513 ECSubWriteReply(tid=152, last_complete=0'0, committed=0, applied=1))
    -3> 2017-06-01 10:08:29.658649 7f4319e11700  5 -- op tracker -- seq: 2367, time: 2017-06-01 10:08:29.658649, event: reached_pg, op: MOSDECSubOpWriteReply(6.31as0 71513 ECSubWriteReply(tid=152, last_complete=0'0, committed=0, applied=1))
    -2> 2017-06-01 10:08:29.658661 7f4319e11700  5 -- op tracker -- seq: 2367, time: 2017-06-01 10:08:29.658660, event: done, op: MOSDECSubOpWriteReply(6.31as0 71513 ECSubWriteReply(tid=152, last_complete=0'0, committed=0, applied=1))
    -1> 2017-06-01 10:08:29.663107 7f43320ec700  5 -- op tracker -- seq: 2317, time: 2017-06-01 10:08:29.663107, event: sub_op_applied, op: osd_op(osd.79.66617:8675008 6.82058b1a rbd_data.e5208a238e1f29.0000000000025f3e [copy-from ver 4678410] snapc 0=[] ondisk+write+ignore_overlay+enforce_snapc+known_if_redirected e71513)
     0> 2017-06-01 10:08:29.663474 7f4319610700 -1 *** Caught signal (Aborted) **
 in thread 7f4319610700 thread_name:tp_osd_recov

 ceph version 10.2.7 (50e863e0f4bc8f4b9e31156de690d765af245185)
 1: (()+0x9564a7) [0x563c6a6f24a7]
 2: (()+0xf890) [0x7f4342308890]
 3: (gsignal()+0x37) [0x7f434034f067]
 4: (abort()+0x148) [0x7f4340350448]
 5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x256) [0x563c6a7f83d6]
 6: (ReplicatedPG::recover_replicas(int, ThreadPool::TPHandle&)+0x62f) [0x563c6a2850ff]
 7: (ReplicatedPG::start_recovery_ops(int, ThreadPool::TPHandle&, int*)+0xa8a) [0x563c6a2b878a]
 8: (OSD::do_recovery(PG*, ThreadPool::TPHandle&)+0x36d) [0x563c6a131bbd]
 9: (ThreadPool::WorkQueue<PG>::_void_process(void*, ThreadPool::TPHandle&)+0x1d) [0x563c6a17c88d]
 10: (ThreadPool::worker(ThreadPool::WorkThread*)+0xa9f) [0x563c6a7e8e3f]
 11: (ThreadPool::WorkThread::entry()+0x10) [0x563c6a7e9d70]
 12: (()+0x8064) [0x7f4342301064]
 13: (clone()+0x6d) [0x7f434040262d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
*****
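To make the backtrace easier to read at a glance: the abort is raised by a failed assertion on the tp_osd_recov (recovery) thread, which fits with the OSDs only crashing once norecover is unset. A small sketch like the following pulls out the function that called the assert. The regular expression and helper names are my own and only assume frames formatted like the ones above; the sample contains frames 3 to 8 from this log.

```python
import re

# Matches frames such as " 6: (Symbol(args)+0x62f) [0x563c6a2850ff]".
FRAME_RE = re.compile(r"(\d+): \((.*)\+0x[0-9a-f]+\) \[0x[0-9a-f]+\]")

# Frames 3-8 copied from the OSD log in this thread.
BACKTRACE = """\
 3: (gsignal()+0x37) [0x7f434034f067]
 4: (abort()+0x148) [0x7f4340350448]
 5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x256) [0x563c6a7f83d6]
 6: (ReplicatedPG::recover_replicas(int, ThreadPool::TPHandle&)+0x62f) [0x563c6a2850ff]
 7: (ReplicatedPG::start_recovery_ops(int, ThreadPool::TPHandle&, int*)+0xa8a) [0x563c6a2b878a]
 8: (OSD::do_recovery(PG*, ThreadPool::TPHandle&)+0x36d) [0x563c6a131bbd]
"""


def parse_frames(text):
    """Return a list of (frame_number, symbol) tuples for each parsable line."""
    frames = []
    for line in text.splitlines():
        match = FRAME_RE.fullmatch(line.strip())
        if match:
            frames.append((int(match.group(1)), match.group(2)))
    return frames


def asserting_caller(frames):
    """Return the symbol of the frame directly below __ceph_assert_fail, if any."""
    for i, (_, symbol) in enumerate(frames):
        if "__ceph_assert_fail" in symbol and i + 1 < len(frames):
            return frames[i + 1][1]
    return None


print(asserting_caller(parse_frames(BACKTRACE)))
# ReplicatedPG::recover_replicas(int, ThreadPool::TPHandle&)
```

The backtrace alone does not say which condition failed; that text normally appears on a "FAILED assert" line slightly earlier in the same OSD log, which is not included in the excerpt above.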

 

 

What should my next steps be?

 

Thanks!
