Re: PG Stuck EC Pool

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Attaching with logging to level 20.

 

After repeat attempts by removing nobackfill I have got it down to:

 

 

            recovery 31892/272325586 objects degraded (0.012%)

            recovery 2/272325586 objects misplaced (0.000%)

 

However any further attempts after removing nobackfill just causes an instant crash on 83 & 84, at this point I feel there is some corruption on the remaining 11 OSD’s of the PG however the error’s aren’t directly saying that, however always end the crash with:

 

-1 *** Caught signal (Aborted) ** in thread 7f716e862700 thread_name:tp_osd_recov

 

,Ashley

 

From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of Ashley Merrick
Sent: 03 June 2017 17:14
To: ceph-users@xxxxxxxx
Subject: Re: PG Stuck EC Pool

 

This sender failed our fraud detection checks and may not be who they appear to be. Learn about spoofing

Feedback

I have now done some further testing and seeing these errors on 84 / 83 the OSD’s that crash while backfilling to 10,11

 

   -60> 2017-06-03 10:08:56.651768 7f6f76714700  1 -- 172.16.3.14:6823/2694 <== osd.3 172.16.2.101:0/25361 10 ==== osd_ping(ping e71688 stamp 2017-06-03 10:08:56.652035) v2 ==== 47+0+0 (1097709006 0 0) 0x5569ea88d400 con 0x5569e900e300

   -59> 2017-06-03 10:08:56.651804 7f6f76714700  1 -- 172.16.3.14:6823/2694 --> 172.16.2.101:0/25361 -- osd_ping(ping_reply e71688 stamp 2017-06-03 10:08:56.652035) v2 -- ?+0 0x5569e985fc00 con 0x5569e900e300

    -6> 2017-06-03 10:08:56.937156 7f6f5ee4d700  1 -- 172.16.3.14:6822/2694 <== osd.53 172.16.3.7:6816/15230 13 ==== MOSDECSubOpReadReply(6.14s3 71688 ECSubReadReply(tid=83, attrs_read=0)) v1 ==== 148+0+0 (2355392791 0 0) 0x5569e8b22080 con 0x5569e9538f00

    -5> 2017-06-03 10:08:56.937193 7f6f5ee4d700  5 -- op tracker -- seq: 2409, time: 2017-06-03 10:08:56.937193, event: queued_for_pg, op: MOSDECSubOpReadReply(6.14s3 71688 ECSubReadReply(tid=83, attrs_read=0))

    -4> 2017-06-03 10:08:56.937241 7f6f8ef8a700  5 -- op tracker -- seq: 2409, time: 2017-06-03 10:08:56.937240, event: reached_pg, op: MOSDECSubOpReadReply(6.14s3 71688 ECSubReadReply(tid=83, attrs_read=0))

    -3> 2017-06-03 10:08:56.937266 7f6f8ef8a700  0 osd.83 pg_epoch: 71688 pg[6.14s3( v 71685'35512 (68694'30812,71685'35512] local-les=71688 n=15928 ec=31534 les/c/f 71688/69510/67943 71687/71687/71687) [11,10,2147483647,83,22,26,69,72,53,59,8,4,46]/[2147483647,2147483647,2147483647,83,22,26,69,72,53,59,8,4,46] r=3 lpr=71687 pi=47065-71686/711 rops=1 bft=10(1),11(0) crt=71629'35509 mlcod 0'0 active+undersized+degraded+remapped+inconsistent+backfilling NIBBLEWISE] failed_push 6:28170432:::rbd_data.e3d8852ae8944a.0000000000047d28:head from shard 53(8), reps on  unfound? 0

    -2> 2017-06-03 10:08:56.937346 7f6f8ef8a700  5 -- op tracker -- seq: 2409, time: 2017-06-03 10:08:56.937345, event: done, op: MOSDECSubOpReadReply(6.14s3 71688 ECSubReadReply(tid=83, attrs_read=0))

    -1> 2017-06-03 10:08:56.937351 7f6f89f80700 -1 osd.83 pg_epoch: 71688 pg[6.14s3( v 71685'35512 (68694'30812,71685'35512] local-les=71688 n=15928 ec=31534 les/c/f 71688/69510/67943 71687/71687/71687) [11,10,2147483647,83,22,26,69,72,53,59,8,4,46]/[2147483647,2147483647,2147483647,83,22,26,69,72,53,59,8,4,46] r=3 lpr=71687 pi=47065-71686/711 bft=10(1),11(0) crt=71629'35509 mlcod 0'0 active+undersized+degraded+remapped+inconsistent+backfilling NIBBLEWISE] recover_replicas: object added to missing set for backfill, but is not in recovering, error!

   -42> 2017-06-03 10:08:56.968433 7f6f5f04f700  1 -- 172.16.2.114:6822/2694 <== client.22857445 172.16.2.212:0/2238053329 56 ==== osd_op(client.22857445.1:759236283 2.e732321d rbd_data.61b4c6238e1f29.000000000001ea27 [set-alloc-hint object_size 4194304 write_size 4194304,write 126976~45056] snapc 0=[] ondisk+write e71688) v4 ==== 217+0+45056 (2626314663 0 3883338397) 0x5569ea886b00 con 0x5569ea99c880

 

From: Ashley Merrick
Sent: 03 June 2017 14:27
To: 'ceph-users@xxxxxxxx' <ceph-users@xxxxxxxx>
Subject: RE: PG Stuck EC Pool

 

From this extract from pg query:

 

"up": [

                    11,

                    10,

                    84,

                    83,

                    22,

                    26,

                    69,

                    72,

                    53,

                    59,

                    8,

                    4,

                    46

                ],

                "acting": [

                    2147483647,

                    2147483647,

                    84,

                    83,

                    22,

                    26,

                    69,

                    72,

                    53,

                    59,

                    8,

                    4,

                    46

 

I am wondering if there is an issue on 11 , 10 causing the current active primary “acting_primar": 84” to crash.

 

But can’t see anything that could be causing it.

 

,Ashley

 

From: Ashley Merrick
Sent: 01 June 2017 23:39
To: ceph-users@xxxxxxxx
Subject: RE: PG Stuck EC Pool

 

Have attached the full pg query for the effected PG encase this shows anything of interest.

 

Thanks

 

From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of Ashley Merrick
Sent: 01 June 2017 17:19
To: ceph-users@xxxxxxxx
Subject: PG Stuck EC Pool

 

This sender failed our fraud detection checks and may not be who they appear to be. Learn about spoofing

Feedback

Have a PG which is stuck in this state (Is an EC with K=10 M=3)

 

 

pg 6.14 is active+undersized+degraded+remapped+inconsistent+backfilling, acting [2147483647,2147483647,84,83,22,26,69,72,53,59,8,4,46]

 

Currently have no-recover set, if I unset no recover both OSD 83 + 84 start to flap and go up and down, I see the following in the log's of the OSD.

 

*****

    -5> 2017-06-01 10:08:29.658593 7f430ec97700  1 -- 172.16.3.14:6806/5204 <== osd.17 172.16.3.3:6806/2006016 57 ==== MOSDECSubOpWriteReply(6.31as0 71513 ECSubWriteReply(tid=152, last_complete=0'0, committed=0, applied=1)) v1 ==== 67+0+0 (245959818 0 0) 0x563c9db7be00 con 0x563c9cfca480

    -4> 2017-06-01 10:08:29.658620 7f430ec97700  5 -- op tracker -- seq: 2367, time: 2017-06-01 10:08:29.658620, event: queued_for_pg, op: MOSDECSubOpWriteReply(6.31as0 71513 ECSubWriteReply(tid=152, last_complete=0'0, committed=0, applied=1))

    -3> 2017-06-01 10:08:29.658649 7f4319e11700  5 -- op tracker -- seq: 2367, time: 2017-06-01 10:08:29.658649, event: reached_pg, op: MOSDECSubOpWriteReply(6.31as0 71513 ECSubWriteReply(tid=152, last_complete=0'0, committed=0, applied=1))

    -2> 2017-06-01 10:08:29.658661 7f4319e11700  5 -- op tracker -- seq: 2367, time: 2017-06-01 10:08:29.658660, event: done, op: MOSDECSubOpWriteReply(6.31as0 71513 ECSubWriteReply(tid=152, last_complete=0'0, committed=0, applied=1))

    -1> 2017-06-01 10:08:29.663107 7f43320ec700  5 -- op tracker -- seq: 2317, time: 2017-06-01 10:08:29.663107, event: sub_op_applied, op: osd_op(osd.79.66617:8675008 6.82058b1a rbd_data.e5208a238e1f29.0000000000025f3e [copy-from ver 4678410] snapc 0=[] ondisk+write+ignore_overlay+enforce_snapc+known_if_redirected e71513)

     0> 2017-06-01 10:08:29.663474 7f4319610700 -1 *** Caught signal (Aborted) **

 in thread 7f4319610700 thread_name:tp_osd_recov

 

 ceph version 10.2.7 (50e863e0f4bc8f4b9e31156de690d765af245185)

 1: (()+0x9564a7) [0x563c6a6f24a7]

 2: (()+0xf890) [0x7f4342308890]

 3: (gsignal()+0x37) [0x7f434034f067]

 4: (abort()+0x148) [0x7f4340350448]

 5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x256) [0x563c6a7f83d6]

 6: (ReplicatedPG::recover_replicas(int, ThreadPool::TPHandle&)+0x62f) [0x563c6a2850ff]

 7: (ReplicatedPG::start_recovery_ops(int, ThreadPool::TPHandle&, int*)+0xa8a) [0x563c6a2b878a]

 8: (OSD::do_recovery(PG*, ThreadPool::TPHandle&)+0x36d) [0x563c6a131bbd]

 9: (ThreadPool::WorkQueue<PG>::_void_process(void*, ThreadPool::TPHandle&)+0x1d) [0x563c6a17c88d]

 10: (ThreadPool::worker(ThreadPool::WorkThread*)+0xa9f) [0x563c6a7e8e3f]

 11: (ThreadPool::WorkThread::entry()+0x10) [0x563c6a7e9d70]

 12: (()+0x8064) [0x7f4342301064]

 13: (clone()+0x6d) [0x7f434040262d]

 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

*****

 

 

What should my next steps be?

 

Thanks!

   -86> 2017-06-03 10:42:27.680419 7f7172069700  5 -- op tracker -- seq: 14015, time: 2017-06-03 10:42:27.680418, event: started, op: osd_op(client.22857445.1:759980367 2.1884fbf0 rbd_data.61b4c6238e1f29.0000000000001c8b [set-alloc-hint object_size 4194304 write_size 4194304,write 3616768~65536] snapc 0=[] ondisk+write e71740)
   -85> 2017-06-03 10:42:27.680423 7f7172069700 10 osd.84 pg_epoch: 71740 pg[2.3f0( v 71739'1378126 (71527'1375122,71739'1378126] local-les=71729 n=6271 ec=130 les/c/f 71729/71729/67943 71727/71728/71728) [84,11] r=0 lpr=71728 crt=71739'1378124 lcod 71739'1378125 mlcod 71739'1378125 active+clean NIBBLEWISE] execute_ctx 0x56146b613c00
   -84> 2017-06-03 10:42:27.680441 7f7172069700 10 osd.84 pg_epoch: 71740 pg[2.3f0( v 71739'1378126 (71527'1375122,71739'1378126] local-les=71729 n=6271 ec=130 les/c/f 71729/71729/67943 71727/71728/71728) [84,11] r=0 lpr=71728 crt=71739'1378124 lcod 71739'1378125 mlcod 71739'1378125 active+clean NIBBLEWISE] do_op 2:0fdf2118:::rbd_data.61b4c6238e1f29.0000000000001c8b:head [set-alloc-hint object_size 4194304 write_size 4194304,write 3616768~65536] ov 71739'1378125 av 71740'1378127 snapc 0=[] snapset 0=[]:[]+head
   -83> 2017-06-03 10:42:27.680450 7f7172069700 10 osd.84 pg_epoch: 71740 pg[2.3f0( v 71739'1378126 (71527'1375122,71739'1378126] local-les=71729 n=6271 ec=130 les/c/f 71729/71729/67943 71727/71728/71728) [84,11] r=0 lpr=71728 crt=71739'1378124 lcod 71739'1378125 mlcod 71739'1378125 active+clean NIBBLEWISE] do_osd_op 2:0fdf2118:::rbd_data.61b4c6238e1f29.0000000000001c8b:head [set-alloc-hint object_size 4194304 write_size 4194304,write 3616768~65536]
   -82> 2017-06-03 10:42:27.680457 7f7172069700 10 osd.84 pg_epoch: 71740 pg[2.3f0( v 71739'1378126 (71527'1375122,71739'1378126] local-les=71729 n=6271 ec=130 les/c/f 71729/71729/67943 71727/71728/71728) [84,11] r=0 lpr=71728 crt=71739'1378124 lcod 71739'1378125 mlcod 71739'1378125 active+clean NIBBLEWISE] do_osd_op  set-alloc-hint object_size 4194304 write_size 4194304
   -81> 2017-06-03 10:42:27.680466 7f7172069700 10 osd.84 pg_epoch: 71740 pg[2.3f0( v 71739'1378126 (71527'1375122,71739'1378126] local-les=71729 n=6271 ec=130 les/c/f 71729/71729/67943 71727/71728/71728) [84,11] r=0 lpr=71728 crt=71739'1378124 lcod 71739'1378125 mlcod 71739'1378125 active+clean NIBBLEWISE] do_osd_op  write 3616768~65536
   -80> 2017-06-03 10:42:27.680474 7f7172069700 20 osd.84 pg_epoch: 71740 pg[2.3f0( v 71739'1378126 (71527'1375122,71739'1378126] local-les=71729 n=6271 ec=130 les/c/f 71729/71729/67943 71727/71728/71728) [84,11] r=0 lpr=71728 crt=71739'1378124 lcod 71739'1378125 mlcod 71739'1378125 active+clean NIBBLEWISE] make_writeable 2:0fdf2118:::rbd_data.61b4c6238e1f29.0000000000001c8b:head snapset=0x56146baf81f8  snapc=0=[]
   -79> 2017-06-03 10:42:27.680481 7f7172069700 20 osd.84 pg_epoch: 71740 pg[2.3f0( v 71739'1378126 (71527'1375122,71739'1378126] local-les=71729 n=6271 ec=130 les/c/f 71729/71729/67943 71727/71728/71728) [84,11] r=0 lpr=71728 crt=71739'1378124 lcod 71739'1378125 mlcod 71739'1378125 active+clean NIBBLEWISE] make_writeable 2:0fdf2118:::rbd_data.61b4c6238e1f29.0000000000001c8b:head done, snapset=0=[]:[]+head
   -78> 2017-06-03 10:42:27.680488 7f7172069700 20 osd.84 pg_epoch: 71740 pg[2.3f0( v 71739'1378126 (71527'1375122,71739'1378126] local-les=71729 n=6271 ec=130 les/c/f 71729/71729/67943 71727/71728/71728) [84,11] r=0 lpr=71728 crt=71739'1378124 lcod 71739'1378125 mlcod 71739'1378125 active+clean NIBBLEWISE] finish_ctx 2:0fdf2118:::rbd_data.61b4c6238e1f29.0000000000001c8b:head 0x56146b613c00 op modify
   -77> 2017-06-03 10:42:27.680500 7f7172069700 10 osd.84 pg_epoch: 71740 pg[2.3f0( v 71739'1378126 (71527'1375122,71739'1378126] local-les=71729 n=6271 ec=130 les/c/f 71729/71729/67943 71727/71728/71728) [84,11] r=0 lpr=71728 crt=71739'1378124 lcod 71739'1378125 mlcod 71739'1378125 active+clean NIBBLEWISE]  set mtime to 2017-06-03 10:42:27.670488
   -76> 2017-06-03 10:42:27.680511 7f7172069700 10 osd.84 pg_epoch: 71740 pg[2.3f0( v 71739'1378126 (71527'1375122,71739'1378126] local-les=71729 n=6271 ec=130 les/c/f 71729/71729/67943 71727/71728/71728) [84,11] r=0 lpr=71728 crt=71739'1378124 lcod 71739'1378125 mlcod 71739'1378125 active+clean NIBBLEWISE]  final snapset 0=[]:[]+head in 2:0fdf2118:::rbd_data.61b4c6238e1f29.0000000000001c8b:head
   -75> 2017-06-03 10:42:27.680524 7f7172069700 20 osd.84 pg_epoch: 71740 pg[2.3f0( v 71739'1378126 (71527'1375122,71739'1378126] local-les=71729 n=6271 ec=130 les/c/f 71729/71729/67943 71727/71728/71728) [84,11] r=0 lpr=71728 crt=71739'1378124 lcod 71739'1378125 mlcod 71739'1378125 active+clean NIBBLEWISE]  zeroing write result code 0
   -74> 2017-06-03 10:42:27.680531 7f7172069700 10 osd.84 pg_epoch: 71740 pg[2.3f0( v 71739'1378126 (71527'1375122,71739'1378126] local-les=71729 n=6271 ec=130 les/c/f 71729/71729/67943 71727/71728/71728) [84,11] r=0 lpr=71728 crt=71739'1378124 lcod 71739'1378125 mlcod 71739'1378125 active+clean NIBBLEWISE] new_repop rep_tid 1391 on osd_op(client.22857445.1:759980367 2.1884fbf0 rbd_data.61b4c6238e1f29.0000000000001c8b [set-alloc-hint object_size 4194304 write_size 4194304,write 3616768~65536] snapc 0=[] ondisk+write e71740) v4
   -73> 2017-06-03 10:42:27.680538 7f7172069700  7 osd.84 pg_epoch: 71740 pg[2.3f0( v 71739'1378126 (71527'1375122,71739'1378126] local-les=71729 n=6271 ec=130 les/c/f 71729/71729/67943 71727/71728/71728) [84,11] r=0 lpr=71728 crt=71739'1378124 lcod 71739'1378125 mlcod 71739'1378125 active+clean NIBBLEWISE] issue_repop rep_tid 1391 o 2:0fdf2118:::rbd_data.61b4c6238e1f29.0000000000001c8b:head
   -72> 2017-06-03 10:42:27.680580 7f7172069700  5 -- op tracker -- seq: 14015, time: 2017-06-03 10:42:27.680580, event: waiting for subops from 11, op: osd_op(client.22857445.1:759980367 2.1884fbf0 rbd_data.61b4c6238e1f29.0000000000001c8b [set-alloc-hint object_size 4194304 write_size 4194304,write 3616768~65536] snapc 0=[] ondisk+write e71740)
   -71> 2017-06-03 10:42:27.680597 7f7172069700 20 osd.84 71740 share_map_peer 0x561465ddda80 already has epoch 71740
   -70> 2017-06-03 10:42:27.680599 7f7172069700  1 -- 172.16.3.14:6806/11590 --> 172.16.3.3:6818/14977 -- osd_repop(client.22857445.1:759980367 2.3f0 2:0fdf2118:::rbd_data.61b4c6238e1f29.0000000000001c8b:head v 71740'1378127) v1 -- ?+66267 0x56146bba9600 con 0x561465ddda80
   -69> 2017-06-03 10:42:27.680613 7f7172069700 10 osd.84 pg_epoch: 71740 pg[2.3f0( v 71739'1378126 (71527'1375122,71739'1378126] local-les=71729 n=6271 ec=130 les/c/f 71729/71729/67943 71727/71728/71728) [84,11] r=0 lpr=71728 crt=71739'1378124 lcod 71739'1378125 mlcod 71739'1378125 active+clean NIBBLEWISE] append_log log((71527'1375122,71739'1378126], crt=71739'1378124) [71740'1378127 (71739'1378125) modify   2:0fdf2118:::rbd_data.61b4c6238e1f29.0000000000001c8b:head by client.22857445.1:759980367 2017-06-03 10:42:27.670488]
   -68> 2017-06-03 10:42:27.680626 7f7172069700 10 osd.84 pg_epoch: 71740 pg[2.3f0( v 71740'1378127 (71527'1375122,71740'1378127] local-les=71729 n=6271 ec=130 les/c/f 71729/71729/67943 71727/71728/71728) [84,11] r=0 lpr=71728 luod=71739'1378126 lua=71739'1378126 crt=71739'1378124 lcod 71739'1378125 mlcod 71739'1378125 active+clean NIBBLEWISE] add_log_entry 71740'1378127 (71739'1378125) modify   2:0fdf2118:::rbd_data.61b4c6238e1f29.0000000000001c8b:head by client.22857445.1:759980367 2017-06-03 10:42:27.670488
   -67> 2017-06-03 10:42:27.680636 7f7172069700 10 osd.84 pg_epoch: 71740 pg[2.3f0( v 71740'1378127 (71527'1375122,71740'1378127] local-les=71729 n=6271 ec=130 les/c/f 71729/71729/67943 71727/71728/71728) [84,11] r=0 lpr=71728 luod=71739'1378126 lua=71739'1378126 crt=71739'1378125 lcod 71739'1378125 mlcod 71739'1378125 active+clean NIBBLEWISE] append_log: trimming to 71739'1378125 entries 71739'1378125 (71722'1378109) modify   2:0fdf2118:::rbd_data.61b4c6238e1f29.0000000000001c8b:head by client.22857445.1:759960105 2017-06-03 10:41:29.573977
   -66> 2017-06-03 10:42:27.680653 7f7172069700  5 write_log with: dirty_to: 0'0, dirty_from: 4294967295'18446744073709551615, dirty_divergent_priors: false, divergent_priors: 0, writeout_from: 71740'1378127, trimmed:
   -65> 2017-06-03 10:42:27.680711 7f7172069700  5 -- op tracker -- seq: 14015, time: 2017-06-03 10:42:27.680711, event: commit_queued_for_journal_write, op: osd_op(client.22857445.1:759980367 2.1884fbf0 rbd_data.61b4c6238e1f29.0000000000001c8b [set-alloc-hint object_size 4194304 write_size 4194304,write 3616768~65536] snapc 0=[] ondisk+write e71740)
   -64> 2017-06-03 10:42:27.680722 7f7172069700 10 osd.84 pg_epoch: 71740 pg[2.3f0( v 71740'1378127 (71527'1375122,71740'1378127] local-les=71729 n=6271 ec=130 les/c/f 71729/71729/67943 71727/71728/71728) [84,11] r=0 lpr=71728 luod=71739'1378126 lua=71739'1378126 crt=71739'1378125 lcod 71739'1378125 mlcod 71739'1378125 active+clean NIBBLEWISE] eval_repop repgather(0x56144dec5c20 71740'1378127 rep_tid=1391 committed?=0 applied?=0) wants=d
   -63> 2017-06-03 10:42:27.680731 7f7172069700 10 osd.84 71740 dequeue_op 0x56146d3bf700 finish
   -62> 2017-06-03 10:42:27.680732 7f71891ef700  5 -- op tracker -- seq: 14015, time: 2017-06-03 10:42:27.680731, event: write_thread_in_journal_buffer, op: osd_op(client.22857445.1:759980367 2.1884fbf0 rbd_data.61b4c6238e1f29.0000000000001c8b [set-alloc-hint object_size 4194304 write_size 4194304,write 3616768~65536] snapc 0=[] ondisk+write e71740)
   -61> 2017-06-03 10:42:27.680968 7f71889ee700  5 -- op tracker -- seq: 14015, time: 2017-06-03 10:42:27.680968, event: journaled_completion_queued, op: osd_op(client.22857445.1:759980367 2.1884fbf0 rbd_data.61b4c6238e1f29.0000000000001c8b [set-alloc-hint object_size 4194304 write_size 4194304,write 3616768~65536] snapc 0=[] ondisk+write e71740)
   -60> 2017-06-03 10:42:27.680983 7f71869ea700 10 osd.84 pg_epoch: 71740 pg[2.3f0( v 71740'1378127 (71527'1375122,71740'1378127] local-les=71729 n=6271 ec=130 les/c/f 71729/71729/67943 71727/71728/71728) [84,11] r=0 lpr=71728 luod=71739'1378126 lua=71739'1378126 crt=71739'1378125 lcod 71739'1378125 mlcod 71739'1378125 active+clean NIBBLEWISE] op_commit: 1391
   -59> 2017-06-03 10:42:27.680998 7f71869ea700  5 -- op tracker -- seq: 14015, time: 2017-06-03 10:42:27.680997, event: op_commit, op: osd_op(client.22857445.1:759980367 2.1884fbf0 rbd_data.61b4c6238e1f29.0000000000001c8b [set-alloc-hint object_size 4194304 write_size 4194304,write 3616768~65536] snapc 0=[] ondisk+write e71740)
   -58> 2017-06-03 10:42:27.681335 7f71861e9700 10 osd.84 pg_epoch: 71740 pg[2.3f0( v 71740'1378127 (71527'1375122,71740'1378127] local-les=71729 n=6271 ec=130 les/c/f 71729/71729/67943 71727/71728/71728) [84,11] r=0 lpr=71728 luod=71739'1378126 lua=71739'1378126 crt=71739'1378125 lcod 71739'1378125 mlcod 71739'1378125 active+clean NIBBLEWISE] op_applied: 1391
   -57> 2017-06-03 10:42:27.681353 7f71861e9700  5 -- op tracker -- seq: 14015, time: 2017-06-03 10:42:27.681353, event: op_applied, op: osd_op(client.22857445.1:759980367 2.1884fbf0 rbd_data.61b4c6238e1f29.0000000000001c8b [set-alloc-hint object_size 4194304 write_size 4194304,write 3616768~65536] snapc 0=[] ondisk+write e71740)
   -56> 2017-06-03 10:42:27.681364 7f71861e9700 10 osd.84 pg_epoch: 71740 pg[2.3f0( v 71740'1378127 (71527'1375122,71740'1378127] local-les=71729 n=6271 ec=130 les/c/f 71729/71729/67943 71727/71728/71728) [84,11] r=0 lpr=71728 luod=71739'1378126 lua=71739'1378126 crt=71739'1378125 lcod 71739'1378125 mlcod 71739'1378125 active+clean NIBBLEWISE] op_applied version 71740'1378127
   -55> 2017-06-03 10:42:27.683398 7f7160fd0700  1 -- 172.16.3.14:6806/11590 <== osd.11 172.16.3.3:6818/14977 147 ==== osd_repop_reply(client.22857445.1:759980367 2.3f0) v1 ==== 83+0+0 (3329426065 0 0) 0x56146914d180 con 0x561465ddda80
   -54> 2017-06-03 10:42:27.683413 7f7160fd0700 10 osd.84 71740 handle_replica_op osd_repop_reply(client.22857445.1:759980367 2.3f0) v1 epoch 71740
   -53> 2017-06-03 10:42:27.683417 7f7160fd0700 20 osd.84 71740 should_share_map osd.11 172.16.3.3:6818/14977 71740
   -52> 2017-06-03 10:42:27.683421 7f7160fd0700 15 osd.84 71740 enqueue_op 0x56146738aa00 prio 196 cost 0 latency 0.000046 osd_repop_reply(client.22857445.1:759980367 2.3f0) v1
   -51> 2017-06-03 10:42:27.683424 7f7160fd0700  5 -- op tracker -- seq: 14016, time: 2017-06-03 10:42:27.683424, event: queued_for_pg, op: osd_repop_reply(client.22857445.1:759980367 2.3f0)
   -50> 2017-06-03 10:42:27.683443 7f716f864700 10 osd.84 71740 dequeue_op 0x56146738aa00 prio 196 cost 0 latency 0.000067 osd_repop_reply(client.22857445.1:759980367 2.3f0) v1 pg pg[2.3f0( v 71740'1378127 (71527'1375122,71740'1378127] local-les=71729 n=6271 ec=130 les/c/f 71729/71729/67943 71727/71728/71728) [84,11] r=0 lpr=71728 luod=71739'1378126 crt=71739'1378125 lcod 71739'1378125 mlcod 71739'1378125 active+clean NIBBLEWISE]
   -49> 2017-06-03 10:42:27.683457 7f716f864700  5 -- op tracker -- seq: 14016, time: 2017-06-03 10:42:27.683457, event: reached_pg, op: osd_repop_reply(client.22857445.1:759980367 2.3f0)
   -48> 2017-06-03 10:42:27.683462 7f716f864700 10 osd.84 pg_epoch: 71740 pg[2.3f0( v 71740'1378127 (71527'1375122,71740'1378127] local-les=71729 n=6271 ec=130 les/c/f 71729/71729/67943 71727/71728/71728) [84,11] r=0 lpr=71728 luod=71739'1378126 crt=71739'1378125 lcod 71739'1378125 mlcod 71739'1378125 active+clean NIBBLEWISE] handle_message: 0x56146738aa00
   -47> 2017-06-03 10:42:27.683471 7f716f864700  5 -- op tracker -- seq: 14016, time: 2017-06-03 10:42:27.683470, event: started, op: osd_repop_reply(client.22857445.1:759980367 2.3f0 ondisk, result = 0)
   -46> 2017-06-03 10:42:27.683473 7f716f864700  7 osd.84 pg_epoch: 71740 pg[2.3f0( v 71740'1378127 (71527'1375122,71740'1378127] local-les=71729 n=6271 ec=130 les/c/f 71729/71729/67943 71727/71728/71728) [84,11] r=0 lpr=71728 luod=71739'1378126 crt=71739'1378125 lcod 71739'1378125 mlcod 71739'1378125 active+clean NIBBLEWISE] sub_op_modify_reply: tid 1391 op  ack_type 4 from 11
   -45> 2017-06-03 10:42:27.683481 7f716f864700  5 -- op tracker -- seq: 14015, time: 2017-06-03 10:42:27.683481, event: sub_op_commit_rec from 11, op: osd_op(client.22857445.1:759980367 2.1884fbf0 rbd_data.61b4c6238e1f29.0000000000001c8b [set-alloc-hint object_size 4194304 write_size 4194304,write 3616768~65536] snapc 0=[] ondisk+write e71740)
   -44> 2017-06-03 10:42:27.683490 7f716f864700 10 osd.84 pg_epoch: 71740 pg[2.3f0( v 71740'1378127 (71527'1375122,71740'1378127] local-les=71729 n=6271 ec=130 les/c/f 71729/71729/67943 71727/71728/71728) [84,11] r=0 lpr=71728 luod=71739'1378126 crt=71739'1378125 lcod 71739'1378125 mlcod 71739'1378125 active+clean NIBBLEWISE] repop_all_applied: repop tid 1391 all applied
   -43> 2017-06-03 10:42:27.683496 7f716f864700 10 osd.84 pg_epoch: 71740 pg[2.3f0( v 71740'1378127 (71527'1375122,71740'1378127] local-les=71729 n=6271 ec=130 les/c/f 71729/71729/67943 71727/71728/71728) [84,11] r=0 lpr=71728 luod=71739'1378126 crt=71739'1378125 lcod 71739'1378125 mlcod 71739'1378125 active+clean NIBBLEWISE] eval_repop repgather(0x56144dec5c20 71740'1378127 rep_tid=1391 committed?=0 applied?=1) wants=d
   -42> 2017-06-03 10:42:27.683502 7f716f864700 10 osd.84 pg_epoch: 71740 pg[2.3f0( v 71740'1378127 (71527'1375122,71740'1378127] local-les=71729 n=6271 ec=130 les/c/f 71729/71729/67943 71727/71728/71728) [84,11] r=0 lpr=71728 luod=71739'1378126 crt=71739'1378125 lcod 71739'1378125 mlcod 71739'1378125 active+clean NIBBLEWISE]  applied: repgather(0x56144dec5c20 71740'1378127 rep_tid=1391 committed?=0 applied?=1)
   -41> 2017-06-03 10:42:27.683508 7f716f864700 10 osd.84 pg_epoch: 71740 pg[2.3f0( v 71740'1378127 (71527'1375122,71740'1378127] local-les=71729 n=6271 ec=130 les/c/f 71729/71729/67943 71727/71728/71728) [84,11] r=0 lpr=71728 luod=71739'1378126 crt=71739'1378125 lcod 71739'1378125 mlcod 71739'1378125 active+clean NIBBLEWISE] repop_all_committed: repop tid 1391 all committed
   -40> 2017-06-03 10:42:27.683513 7f716f864700 10 osd.84 pg_epoch: 71740 pg[2.3f0( v 71740'1378127 (71527'1375122,71740'1378127] local-les=71729 n=6271 ec=130 les/c/f 71729/71729/67943 71727/71728/71728) [84,11] r=0 lpr=71728 crt=71739'1378125 lcod 71739'1378126 mlcod 71739'1378125 active+clean NIBBLEWISE] eval_repop repgather(0x56144dec5c20 71740'1378127 rep_tid=1391 committed?=1 applied?=1) wants=d
   -39> 2017-06-03 10:42:27.683519 7f716f864700 10 osd.84 pg_epoch: 71740 pg[2.3f0( v 71740'1378127 (71527'1375122,71740'1378127] local-les=71729 n=6271 ec=130 les/c/f 71729/71729/67943 71727/71728/71728) [84,11] r=0 lpr=71728 crt=71739'1378125 lcod 71739'1378126 mlcod 71739'1378125 active+clean NIBBLEWISE]  commit: repgather(0x56144dec5c20 71740'1378127 rep_tid=1391 committed?=1 applied?=1)
   -38> 2017-06-03 10:42:27.683525 7f716f864700 15 osd.84 pg_epoch: 71740 pg[2.3f0( v 71740'1378127 (71527'1375122,71740'1378127] local-les=71729 n=6271 ec=130 les/c/f 71729/71729/67943 71727/71728/71728) [84,11] r=0 lpr=71728 crt=71739'1378125 lcod 71739'1378126 mlcod 71739'1378125 active+clean NIBBLEWISE] log_op_stats osd_op(client.22857445.1:759980367 2.1884fbf0 rbd_data.61b4c6238e1f29.0000000000001c8b [set-alloc-hint object_size 4194304 write_size 4194304,write 3616768~65536] snapc 0=[] ondisk+write e71740) v4 inb 65536 outb 0 rlat 0.003504 lat 0.003521
   -37> 2017-06-03 10:42:27.683533 7f716f864700 10 osd.84 pg_epoch: 71740 pg[2.3f0( v 71740'1378127 (71527'1375122,71740'1378127] local-les=71729 n=6271 ec=130 les/c/f 71729/71729/67943 71727/71728/71728) [84,11] r=0 lpr=71728 crt=71739'1378125 lcod 71739'1378126 mlcod 71739'1378125 active+clean NIBBLEWISE]  sending commit on osd_op(client.22857445.1:759980367 2.1884fbf0 rbd_data.61b4c6238e1f29.0000000000001c8b [set-alloc-hint object_size 4194304 write_size 4194304,write 3616768~65536] snapc 0=[] ondisk+write e71740) v4 0x561469257e40
   -36> 2017-06-03 10:42:27.683540 7f716f864700  1 -- 172.16.2.114:6806/11590 --> 172.16.2.212:0/2238053329 -- osd_op_reply(759980367 rbd_data.61b4c6238e1f29.0000000000001c8b [set-alloc-hint object_size 4194304 write_size 4194304,write 3616768~65536] v71740'1378127 uv6406885 ondisk = 0) v7 -- ?+0 0x561469257e40 con 0x5614678fb480
   -35> 2017-06-03 10:42:27.683553 7f716f864700  5 -- op tracker -- seq: 14015, time: 2017-06-03 10:42:27.683553, event: commit_sent, op: osd_op(client.22857445.1:759980367 2.1884fbf0 rbd_data.61b4c6238e1f29.0000000000001c8b [set-alloc-hint object_size 4194304 write_size 4194304,write 3616768~65536] snapc 0=[] ondisk+write e71740)
   -34> 2017-06-03 10:42:27.683561 7f716f864700 10 osd.84 pg_epoch: 71740 pg[2.3f0( v 71740'1378127 (71527'1375122,71740'1378127] local-les=71729 n=6271 ec=130 les/c/f 71729/71729/67943 71727/71728/71728) [84,11] r=0 lpr=71728 crt=71739'1378125 lcod 71739'1378126 mlcod 71739'1378125 active+clean NIBBLEWISE]  applied: repgather(0x56144dec5c20 71740'1378127 rep_tid=1391 committed?=1 applied?=1)
   -33> 2017-06-03 10:42:27.683573 7f716f864700 15 osd.84 pg_epoch: 71740 pg[2.3f0( v 71740'1378127 (71527'1375122,71740'1378127] local-les=71729 n=6271 ec=130 les/c/f 71729/71729/67943 71727/71728/71728) [84,11] r=0 lpr=71728 crt=71739'1378125 lcod 71739'1378126 mlcod 71739'1378125 active+clean NIBBLEWISE] publish_stats_to_osd 71740:1137643
   -32> 2017-06-03 10:42:27.683582 7f716f864700 15 osd.84 pg_epoch: 71740 pg[2.3f0( v 71740'1378127 (71527'1375122,71740'1378127] local-les=71729 n=6271 ec=130 les/c/f 71729/71729/67943 71727/71728/71728) [84,11] r=0 lpr=71728 crt=71739'1378125 lcod 71739'1378126 mlcod 71739'1378126 active+clean NIBBLEWISE] do_osd_op_effects client.22857445 con 0x5614678fb480
   -31> 2017-06-03 10:42:27.683592 7f716f864700 10 osd.84 pg_epoch: 71740 pg[2.3f0( v 71740'1378127 (71527'1375122,71740'1378127] local-les=71729 n=6271 ec=130 les/c/f 71729/71729/67943 71727/71728/71728) [84,11] r=0 lpr=71728 crt=71739'1378125 lcod 71739'1378126 mlcod 71739'1378126 active+clean NIBBLEWISE]  removing repgather(0x56144dec5c20 71740'1378127 rep_tid=1391 committed?=1 applied?=1)
   -30> 2017-06-03 10:42:27.683601 7f716f864700 20 osd.84 pg_epoch: 71740 pg[2.3f0( v 71740'1378127 (71527'1375122,71740'1378127] local-les=71729 n=6271 ec=130 les/c/f 71729/71729/67943 71727/71728/71728) [84,11] r=0 lpr=71728 crt=71739'1378125 lcod 71739'1378126 mlcod 71739'1378126 active+clean NIBBLEWISE]    q front is repgather(0x56144dec5c20 71740'1378127 rep_tid=1391 committed?=1 applied?=1)
   -29> 2017-06-03 10:42:27.683610 7f716f864700 20 osd.84 pg_epoch: 71740 pg[2.3f0( v 71740'1378127 (71527'1375122,71740'1378127] local-les=71729 n=6271 ec=130 les/c/f 71729/71729/67943 71727/71728/71728) [84,11] r=0 lpr=71728 crt=71739'1378125 lcod 71739'1378126 mlcod 71739'1378126 active+clean NIBBLEWISE] remove_repop repgather(0x56144dec5c20 71740'1378127 rep_tid=1391 committed?=1 applied?=1)
   -28> 2017-06-03 10:42:27.683623 7f716f864700 15 osd.84 pg_epoch: 71740 pg[2.3f0( v 71740'1378127 (71527'1375122,71740'1378127] local-les=71729 n=6271 ec=130 les/c/f 71729/71729/67943 71727/71728/71728) [84,11] r=0 lpr=71728 crt=71739'1378125 lcod 71739'1378126 mlcod 71739'1378126 active+clean NIBBLEWISE]  requeue_ops
   -27> 2017-06-03 10:42:27.683634 7f716f864700  5 -- op tracker -- seq: 14015, time: 2017-06-03 10:42:27.683633, event: done, op: osd_op(client.22857445.1:759980367 2.1884fbf0 rbd_data.61b4c6238e1f29.0000000000001c8b [set-alloc-hint object_size 4194304 write_size 4194304,write 3616768~65536] snapc 0=[] ondisk+write e71740)
   -26> 2017-06-03 10:42:27.683654 7f716f864700 10 osd.84 71740 dequeue_op 0x56146738aa00 finish
   -25> 2017-06-03 10:42:27.683658 7f716f864700  5 -- op tracker -- seq: 14016, time: 2017-06-03 10:42:27.683657, event: done, op: osd_repop_reply(client.22857445.1:759980367 2.3f0 ondisk, result = 0)
   -24> 2017-06-03 10:42:27.690237 7f7140a75700  1 -- 172.16.2.114:6807/11590 <== osd.35 172.16.2.105:0/4630 74 ==== osd_ping(ping e71740 stamp 2017-06-03 10:42:27.691894) v2 ==== 47+0+0 (1336756362 0 0) 0x56146804e800 con 0x561467283900
   -23> 2017-06-03 10:42:27.690252 7f7140a75700  1 -- 172.16.2.114:6807/11590 --> 172.16.2.105:0/4630 -- osd_ping(ping_reply e71740 stamp 2017-06-03 10:42:27.691894) v2 -- ?+0 0x561468151e00 con 0x561467283900
   -22> 2017-06-03 10:42:27.690265 7f7140a75700 20 osd.84 71740 share_map_peer 0x561442ea1000 already has epoch 71740
   -21> 2017-06-03 10:42:27.690258 7f7140b76700  1 -- 172.16.3.14:6807/11590 <== osd.35 172.16.2.105:0/4630 74 ==== osd_ping(ping e71740 stamp 2017-06-03 10:42:27.691894) v2 ==== 47+0+0 (1336756362 0 0) 0x56146804ee00 con 0x5614678f8300
   -20> 2017-06-03 10:42:27.690275 7f7140b76700  1 -- 172.16.3.14:6807/11590 --> 172.16.2.105:0/4630 -- osd_ping(ping_reply e71740 stamp 2017-06-03 10:42:27.691894) v2 -- ?+0 0x56146d350000 con 0x5614678f8300
   -19> 2017-06-03 10:42:27.690287 7f7140b76700 20 osd.84 71740 share_map_peer 0x561442ea1000 already has epoch 71740
   -18> 2017-06-03 10:42:27.713217 7f713ee59700  1 -- 172.16.3.14:6807/11590 <== osd.38 172.16.2.105:0/4910 78 ==== osd_ping(ping e71740 stamp 2017-06-03 10:42:27.714244) v2 ==== 47+0+0 (140188268 0 0) 0x561468170600 con 0x5614678f8d80
   -17> 2017-06-03 10:42:27.713238 7f713ee59700  1 -- 172.16.3.14:6807/11590 --> 172.16.2.105:0/4910 -- osd_ping(ping_reply e71740 stamp 2017-06-03 10:42:27.714244) v2 -- ?+0 0x561469cb1c00 con 0x5614678f8d80
   -16> 2017-06-03 10:42:27.713254 7f713ee59700 20 osd.84 71740 share_map_peer 0x56146631cb80 already has epoch 71740
   -15> 2017-06-03 10:42:27.713463 7f713ef5a700  1 -- 172.16.2.114:6807/11590 <== osd.38 172.16.2.105:0/4910 78 ==== osd_ping(ping e71740 stamp 2017-06-03 10:42:27.714244) v2 ==== 47+0+0 (140188268 0 0) 0x561468170800 con 0x5614678f9500
   -14> 2017-06-03 10:42:27.713486 7f713ef5a700  1 -- 172.16.2.114:6807/11590 --> 172.16.2.105:0/4910 -- osd_ping(ping_reply e71740 stamp 2017-06-03 10:42:27.714244) v2 -- ?+0 0x561469cb1200 con 0x5614678f9500
   -13> 2017-06-03 10:42:27.713498 7f713ef5a700 20 osd.84 71740 share_map_peer 0x56146631cb80 already has epoch 71740
   -12> 2017-06-03 10:42:27.722765 7f7148bf6700  1 -- 172.16.3.14:6807/11590 <== osd.15 172.16.3.3:0/5388 79 ==== osd_ping(ping e71740 stamp 2017-06-03 10:42:27.713200) v2 ==== 47+0+0 (1176829029 0 0) 0x5614671ed000 con 0x56146824cd80
   -11> 2017-06-03 10:42:27.722794 7f7148bf6700  1 -- 172.16.3.14:6807/11590 --> 172.16.3.3:0/5388 -- osd_ping(ping_reply e71740 stamp 2017-06-03 10:42:27.713200) v2 -- ?+0 0x56146cc2bc00 con 0x56146824cd80
   -10> 2017-06-03 10:42:27.722808 7f7148bf6700 20 osd.84 71740 share_map_peer 0x561466029a80 already has epoch 71740
    -9> 2017-06-03 10:42:27.722960 7f7148af5700  1 -- 172.16.2.114:6807/11590 <== osd.15 172.16.3.3:0/5388 79 ==== osd_ping(ping e71740 stamp 2017-06-03 10:42:27.713200) v2 ==== 47+0+0 (1176829029 0 0) 0x5614671ed200 con 0x56146824d980
    -8> 2017-06-03 10:42:27.722977 7f7148af5700  1 -- 172.16.2.114:6807/11590 --> 172.16.3.3:0/5388 -- osd_ping(ping_reply e71740 stamp 2017-06-03 10:42:27.713200) v2 -- ?+0 0x56146645ae00 con 0x56146824d980
    -7> 2017-06-03 10:42:27.722991 7f7148af5700 20 osd.84 71740 share_map_peer 0x561466029a80 already has epoch 71740
    -6> 2017-06-03 10:42:27.727100 7f713fa65700  1 -- 172.16.2.114:6807/11590 <== osd.57 172.16.2.107:0/3819 74 ==== osd_ping(ping e71740 stamp 2017-06-03 10:42:27.727036) v2 ==== 47+0+0 (2829453915 0 0) 0x56146751c600 con 0x5614678f9080
    -5> 2017-06-03 10:42:27.727114 7f713fa65700  1 -- 172.16.2.114:6807/11590 --> 172.16.2.107:0/3819 -- osd_ping(ping_reply e71740 stamp 2017-06-03 10:42:27.727036) v2 -- ?+0 0x56146986f600 con 0x5614678f9080
    -4> 2017-06-03 10:42:27.727126 7f713fa65700 20 osd.84 71740 share_map_peer 0x561466746300 already has epoch 71740
    -3> 2017-06-03 10:42:27.727895 7f713fb66700  1 -- 172.16.3.14:6807/11590 <== osd.57 172.16.2.107:0/3819 74 ==== osd_ping(ping e71740 stamp 2017-06-03 10:42:27.727036) v2 ==== 47+0+0 (2829453915 0 0) 0x56146751c000 con 0x5614678f8900
    -2> 2017-06-03 10:42:27.727914 7f713fb66700  1 -- 172.16.3.14:6807/11590 --> 172.16.2.107:0/3819 -- osd_ping(ping_reply e71740 stamp 2017-06-03 10:42:27.727036) v2 -- ?+0 0x56146c891000 con 0x5614678f8900
    -1> 2017-06-03 10:42:27.727927 7f713fb66700 20 osd.84 71740 share_map_peer 0x561466746300 already has epoch 71740
     0> 2017-06-03 10:42:27.743760 7f716e862700 -1 *** Caught signal (Aborted) **
 in thread 7f716e862700 thread_name:tp_osd_recov

 ceph version 10.2.7 (50e863e0f4bc8f4b9e31156de690d765af245185)
 1: (()+0x9564a7) [0x5614333db4a7]
 2: (()+0xf890) [0x7f719633e890]
 3: (gsignal()+0x37) [0x7f7194385067]
 4: (abort()+0x148) [0x7f7194386448]
 5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x256) [0x5614334e13d6]
 6: (ReplicatedPG::recover_replicas(int, ThreadPool::TPHandle&)+0x62f) [0x561432f6e0ff]
 7: (ReplicatedPG::start_recovery_ops(int, ThreadPool::TPHandle&, int*)+0xa8a) [0x561432fa178a]
 8: (OSD::do_recovery(PG*, ThreadPool::TPHandle&)+0x36d) [0x561432e1abbd]
 9: (ThreadPool::WorkQueue<PG>::_void_process(void*, ThreadPool::TPHandle&)+0x1d) [0x561432e6588d]
 10: (ThreadPool::worker(ThreadPool::WorkThread*)+0xa9f) [0x5614334d1e3f]
 11: (ThreadPool::WorkThread::entry()+0x10) [0x5614334d2d70]
 12: (()+0x8064) [0x7f7196337064]
 13: (clone()+0x6d) [0x7f719443862d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

[Index of Archives]     [Information on CEPH]     [Linux Filesystem Development]     [Ceph Development]     [Ceph Large]     [Linux USB Development]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [xfs]


  Powered by Linux