Attaching with logging set to level 20. After repeated attempts at removing nobackfill I have got it down to:

recovery 31892/272325586 objects degraded (0.012%)
recovery 2/272325586 objects misplaced (0.000%)

However, any further attempt after removing nobackfill causes an instant crash on OSDs 83 and 84. At this point I feel there is some corruption on the remaining 11 OSDs of the PG, though the errors don't directly say that; the crash always ends with:

-1 *** Caught signal (Aborted) ** in thread 7f716e862700 thread_name:tp_osd_recov

,Ashley

From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx]
On Behalf Of Ashley Merrick
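As a quick cross-check of the percentages quoted above: they are plain object-count ratios (affected object instances over total object instances in the cluster), which is worth confirming when deciding how much data is actually at risk. A minimal sketch:

```python
# Sanity-check the degraded/misplaced percentages reported by "ceph -s":
# they are simply (affected object instances) / (total object instances).
degraded = 31892
misplaced = 2
total = 272325586

print(f"degraded:  {100 * degraded / total:.3f}%")   # matches the reported 0.012%
print(f"misplaced: {100 * misplaced / total:.3f}%")  # matches the reported 0.000%
```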
I have now done some further testing, and I am seeing these errors on 84 / 83, the OSDs that crash while backfilling to 10,11:

-60> 2017-06-03 10:08:56.651768 7f6f76714700 1 -- 172.16.3.14:6823/2694 <== osd.3 172.16.2.101:0/25361 10 ==== osd_ping(ping e71688 stamp 2017-06-03 10:08:56.652035) v2 ==== 47+0+0 (1097709006 0 0) 0x5569ea88d400 con 0x5569e900e300
-59> 2017-06-03 10:08:56.651804 7f6f76714700 1 -- 172.16.3.14:6823/2694 --> 172.16.2.101:0/25361 -- osd_ping(ping_reply e71688 stamp 2017-06-03 10:08:56.652035) v2 -- ?+0 0x5569e985fc00 con 0x5569e900e300
-6> 2017-06-03 10:08:56.937156 7f6f5ee4d700 1 -- 172.16.3.14:6822/2694 <== osd.53 172.16.3.7:6816/15230 13 ==== MOSDECSubOpReadReply(6.14s3 71688 ECSubReadReply(tid=83, attrs_read=0)) v1 ==== 148+0+0 (2355392791 0 0) 0x5569e8b22080 con 0x5569e9538f00
-5> 2017-06-03 10:08:56.937193 7f6f5ee4d700 5 -- op tracker -- seq: 2409, time: 2017-06-03 10:08:56.937193, event: queued_for_pg, op: MOSDECSubOpReadReply(6.14s3 71688 ECSubReadReply(tid=83, attrs_read=0))
-4> 2017-06-03 10:08:56.937241 7f6f8ef8a700 5 -- op tracker -- seq: 2409, time: 2017-06-03 10:08:56.937240, event: reached_pg, op: MOSDECSubOpReadReply(6.14s3 71688 ECSubReadReply(tid=83, attrs_read=0))
-3> 2017-06-03 10:08:56.937266 7f6f8ef8a700 0 osd.83 pg_epoch: 71688 pg[6.14s3( v 71685'35512 (68694'30812,71685'35512] local-les=71688 n=15928 ec=31534 les/c/f 71688/69510/67943 71687/71687/71687) [11,10,2147483647,83,22,26,69,72,53,59,8,4,46]/[2147483647,2147483647,2147483647,83,22,26,69,72,53,59,8,4,46] r=3 lpr=71687 pi=47065-71686/711 rops=1 bft=10(1),11(0) crt=71629'35509 mlcod 0'0 active+undersized+degraded+remapped+inconsistent+backfilling NIBBLEWISE] failed_push 6:28170432:::rbd_data.e3d8852ae8944a.0000000000047d28:head from shard 53(8), reps on unfound? 0
-2> 2017-06-03 10:08:56.937346 7f6f8ef8a700 5 -- op tracker -- seq: 2409, time: 2017-06-03 10:08:56.937345, event: done, op: MOSDECSubOpReadReply(6.14s3 71688 ECSubReadReply(tid=83, attrs_read=0))
-1> 2017-06-03 10:08:56.937351 7f6f89f80700 -1 osd.83 pg_epoch: 71688 pg[6.14s3( v 71685'35512 (68694'30812,71685'35512] local-les=71688 n=15928 ec=31534 les/c/f 71688/69510/67943 71687/71687/71687) [11,10,2147483647,83,22,26,69,72,53,59,8,4,46]/[2147483647,2147483647,2147483647,83,22,26,69,72,53,59,8,4,46] r=3 lpr=71687 pi=47065-71686/711 bft=10(1),11(0) crt=71629'35509 mlcod 0'0 active+undersized+degraded+remapped+inconsistent+backfilling NIBBLEWISE] recover_replicas: object added to missing set for backfill, but is not in recovering, error!
-42> 2017-06-03 10:08:56.968433 7f6f5f04f700 1 -- 172.16.2.114:6822/2694 <== client.22857445 172.16.2.212:0/2238053329 56 ==== osd_op(client.22857445.1:759236283 2.e732321d rbd_data.61b4c6238e1f29.000000000001ea27 [set-alloc-hint object_size 4194304 write_size 4194304,write 126976~45056] snapc 0=[] ondisk+write e71688) v4 ==== 217+0+45056 (2626314663 0 3883338397) 0x5569ea886b00 con 0x5569ea99c880

From: Ashley Merrick
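The failed_push line in the log above names one specific object the recovery cannot source: 6:28170432:::rbd_data.e3d8852ae8944a.0000000000047d28:head. A rough way to pull that identifier apart (the colon-separated field layout here is my reading of the printed object id, not an official parser):

```python
# Rough parse of the object id printed in the failed_push line.
# Assumed layout (my interpretation of the printed form, not a format spec):
#   pool : hash : namespace : key : object-name : snapshot
failed_push = "6:28170432:::rbd_data.e3d8852ae8944a.0000000000047d28:head"

pool, obj_hash, namespace, key, name, snap = failed_push.split(":")
print(f"pool={pool} name={name} snap={snap}")
```

The rbd_data.e3d8852ae8944a prefix can then be matched against the block_name_prefix shown by `rbd info` to identify which RBD image the damaged object belongs to.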
From this extract from the pg query:

"up": [ 11, 10, 84, 83, 22, 26, 69, 72, 53, 59, 8, 4, 46 ],
"acting": [ 2147483647, 2147483647, 84, 83, 22, 26, 69, 72, 53, 59, 8, 4, 46 ]

I am wondering if there is an issue on 11, 10 that is causing the current acting primary ("acting_primary": 84) to crash, but I can't see anything that could be causing it.

,Ashley

From: Ashley Merrick
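In the pg query extract above, 2147483647 is 0x7fffffff, Ceph's CRUSH_ITEM_NONE placeholder meaning "no OSD currently holds this shard". Comparing up and acting shard by shard identifies which shards are the backfill targets, which matches the bft=10(1),11(0) seen in the OSD log. A small sketch:

```python
# 2147483647 (0x7fffffff) is CRUSH_ITEM_NONE: no OSD holds this shard yet.
CRUSH_ITEM_NONE = 2147483647

up     = [11, 10, 84, 83, 22, 26, 69, 72, 53, 59, 8, 4, 46]
acting = [CRUSH_ITEM_NONE, CRUSH_ITEM_NONE, 84, 83, 22, 26, 69, 72, 53, 59, 8, 4, 46]

# Shards present in "up" but with no acting OSD: these are being backfilled.
backfill = [(shard, osd) for shard, (osd, act) in enumerate(zip(up, acting))
            if act == CRUSH_ITEM_NONE]
print(backfill)  # [(0, 11), (1, 10)] -> shard 0 on osd.11, shard 1 on osd.10
```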
I have attached the full pg query for the affected PG in case it shows anything of interest. Thanks

From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx]
On Behalf Of Ashley Merrick
I have a PG which is stuck in this state (it is EC with K=10, M=3):

pg 6.14 is active+undersized+degraded+remapped+inconsistent+backfilling, acting [2147483647,2147483647,84,83,22,26,69,72,53,59,8,4,46]

I currently have norecover set; if I unset norecover, both OSD 83 and 84 start to flap and go up and down, and I see the following in the logs of the OSD:

*****
-5> 2017-06-01 10:08:29.658593 7f430ec97700 1 -- 172.16.3.14:6806/5204 <== osd.17 172.16.3.3:6806/2006016 57 ==== MOSDECSubOpWriteReply(6.31as0 71513 ECSubWriteReply(tid=152, last_complete=0'0, committed=0, applied=1)) v1 ==== 67+0+0 (245959818 0 0) 0x563c9db7be00 con 0x563c9cfca480
-4> 2017-06-01 10:08:29.658620 7f430ec97700 5 -- op tracker -- seq: 2367, time: 2017-06-01 10:08:29.658620, event: queued_for_pg, op: MOSDECSubOpWriteReply(6.31as0 71513 ECSubWriteReply(tid=152, last_complete=0'0, committed=0, applied=1))
-3> 2017-06-01 10:08:29.658649 7f4319e11700 5 -- op tracker -- seq: 2367, time: 2017-06-01 10:08:29.658649, event: reached_pg, op: MOSDECSubOpWriteReply(6.31as0 71513 ECSubWriteReply(tid=152, last_complete=0'0, committed=0, applied=1))
-2> 2017-06-01 10:08:29.658661 7f4319e11700 5 -- op tracker -- seq: 2367, time: 2017-06-01 10:08:29.658660, event: done, op: MOSDECSubOpWriteReply(6.31as0 71513 ECSubWriteReply(tid=152, last_complete=0'0, committed=0, applied=1))
-1> 2017-06-01 10:08:29.663107 7f43320ec700 5 -- op tracker -- seq: 2317, time: 2017-06-01 10:08:29.663107, event: sub_op_applied, op: osd_op(osd.79.66617:8675008 6.82058b1a rbd_data.e5208a238e1f29.0000000000025f3e [copy-from ver 4678410] snapc 0=[] ondisk+write+ignore_overlay+enforce_snapc+known_if_redirected e71513)
0> 2017-06-01 10:08:29.663474 7f4319610700 -1 *** Caught signal (Aborted) ** in thread 7f4319610700 thread_name:tp_osd_recov

ceph version 10.2.7 (50e863e0f4bc8f4b9e31156de690d765af245185)
1: (()+0x9564a7) [0x563c6a6f24a7]
2: (()+0xf890) [0x7f4342308890]
3: (gsignal()+0x37) [0x7f434034f067]
4: (abort()+0x148) [0x7f4340350448]
5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x256) [0x563c6a7f83d6]
6: (ReplicatedPG::recover_replicas(int, ThreadPool::TPHandle&)+0x62f) [0x563c6a2850ff]
7: (ReplicatedPG::start_recovery_ops(int, ThreadPool::TPHandle&, int*)+0xa8a) [0x563c6a2b878a]
8: (OSD::do_recovery(PG*, ThreadPool::TPHandle&)+0x36d) [0x563c6a131bbd]
9: (ThreadPool::WorkQueue<PG>::_void_process(void*, ThreadPool::TPHandle&)+0x1d) [0x563c6a17c88d]
10: (ThreadPool::worker(ThreadPool::WorkThread*)+0xa9f) [0x563c6a7e8e3f]
11: (ThreadPool::WorkThread::entry()+0x10) [0x563c6a7e9d70]
12: (()+0x8064) [0x7f4342301064]
13: (clone()+0x6d) [0x7f434040262d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
*****

What should my next steps be? Thanks!
-86> 2017-06-03 10:42:27.680419 7f7172069700 5 -- op tracker -- seq: 14015, time: 2017-06-03 10:42:27.680418, event: started, op: osd_op(client.22857445.1:759980367 2.1884fbf0 rbd_data.61b4c6238e1f29.0000000000001c8b [set-alloc-hint object_size 4194304 write_size 4194304,write 3616768~65536] snapc 0=[] ondisk+write e71740) -85> 2017-06-03 10:42:27.680423 7f7172069700 10 osd.84 pg_epoch: 71740 pg[2.3f0( v 71739'1378126 (71527'1375122,71739'1378126] local-les=71729 n=6271 ec=130 les/c/f 71729/71729/67943 71727/71728/71728) [84,11] r=0 lpr=71728 crt=71739'1378124 lcod 71739'1378125 mlcod 71739'1378125 active+clean NIBBLEWISE] execute_ctx 0x56146b613c00 -84> 2017-06-03 10:42:27.680441 7f7172069700 10 osd.84 pg_epoch: 71740 pg[2.3f0( v 71739'1378126 (71527'1375122,71739'1378126] local-les=71729 n=6271 ec=130 les/c/f 71729/71729/67943 71727/71728/71728) [84,11] r=0 lpr=71728 crt=71739'1378124 lcod 71739'1378125 mlcod 71739'1378125 active+clean NIBBLEWISE] do_op 2:0fdf2118:::rbd_data.61b4c6238e1f29.0000000000001c8b:head [set-alloc-hint object_size 4194304 write_size 4194304,write 3616768~65536] ov 71739'1378125 av 71740'1378127 snapc 0=[] snapset 0=[]:[]+head -83> 2017-06-03 10:42:27.680450 7f7172069700 10 osd.84 pg_epoch: 71740 pg[2.3f0( v 71739'1378126 (71527'1375122,71739'1378126] local-les=71729 n=6271 ec=130 les/c/f 71729/71729/67943 71727/71728/71728) [84,11] r=0 lpr=71728 crt=71739'1378124 lcod 71739'1378125 mlcod 71739'1378125 active+clean NIBBLEWISE] do_osd_op 2:0fdf2118:::rbd_data.61b4c6238e1f29.0000000000001c8b:head [set-alloc-hint object_size 4194304 write_size 4194304,write 3616768~65536] -82> 2017-06-03 10:42:27.680457 7f7172069700 10 osd.84 pg_epoch: 71740 pg[2.3f0( v 71739'1378126 (71527'1375122,71739'1378126] local-les=71729 n=6271 ec=130 les/c/f 71729/71729/67943 71727/71728/71728) [84,11] r=0 lpr=71728 crt=71739'1378124 lcod 71739'1378125 mlcod 71739'1378125 active+clean NIBBLEWISE] do_osd_op set-alloc-hint object_size 4194304 write_size 4194304 -81> 
2017-06-03 10:42:27.680466 7f7172069700 10 osd.84 pg_epoch: 71740 pg[2.3f0( v 71739'1378126 (71527'1375122,71739'1378126] local-les=71729 n=6271 ec=130 les/c/f 71729/71729/67943 71727/71728/71728) [84,11] r=0 lpr=71728 crt=71739'1378124 lcod 71739'1378125 mlcod 71739'1378125 active+clean NIBBLEWISE] do_osd_op write 3616768~65536 -80> 2017-06-03 10:42:27.680474 7f7172069700 20 osd.84 pg_epoch: 71740 pg[2.3f0( v 71739'1378126 (71527'1375122,71739'1378126] local-les=71729 n=6271 ec=130 les/c/f 71729/71729/67943 71727/71728/71728) [84,11] r=0 lpr=71728 crt=71739'1378124 lcod 71739'1378125 mlcod 71739'1378125 active+clean NIBBLEWISE] make_writeable 2:0fdf2118:::rbd_data.61b4c6238e1f29.0000000000001c8b:head snapset=0x56146baf81f8 snapc=0=[] -79> 2017-06-03 10:42:27.680481 7f7172069700 20 osd.84 pg_epoch: 71740 pg[2.3f0( v 71739'1378126 (71527'1375122,71739'1378126] local-les=71729 n=6271 ec=130 les/c/f 71729/71729/67943 71727/71728/71728) [84,11] r=0 lpr=71728 crt=71739'1378124 lcod 71739'1378125 mlcod 71739'1378125 active+clean NIBBLEWISE] make_writeable 2:0fdf2118:::rbd_data.61b4c6238e1f29.0000000000001c8b:head done, snapset=0=[]:[]+head -78> 2017-06-03 10:42:27.680488 7f7172069700 20 osd.84 pg_epoch: 71740 pg[2.3f0( v 71739'1378126 (71527'1375122,71739'1378126] local-les=71729 n=6271 ec=130 les/c/f 71729/71729/67943 71727/71728/71728) [84,11] r=0 lpr=71728 crt=71739'1378124 lcod 71739'1378125 mlcod 71739'1378125 active+clean NIBBLEWISE] finish_ctx 2:0fdf2118:::rbd_data.61b4c6238e1f29.0000000000001c8b:head 0x56146b613c00 op modify -77> 2017-06-03 10:42:27.680500 7f7172069700 10 osd.84 pg_epoch: 71740 pg[2.3f0( v 71739'1378126 (71527'1375122,71739'1378126] local-les=71729 n=6271 ec=130 les/c/f 71729/71729/67943 71727/71728/71728) [84,11] r=0 lpr=71728 crt=71739'1378124 lcod 71739'1378125 mlcod 71739'1378125 active+clean NIBBLEWISE] set mtime to 2017-06-03 10:42:27.670488 -76> 2017-06-03 10:42:27.680511 7f7172069700 10 osd.84 pg_epoch: 71740 pg[2.3f0( v 71739'1378126 
(71527'1375122,71739'1378126] local-les=71729 n=6271 ec=130 les/c/f 71729/71729/67943 71727/71728/71728) [84,11] r=0 lpr=71728 crt=71739'1378124 lcod 71739'1378125 mlcod 71739'1378125 active+clean NIBBLEWISE] final snapset 0=[]:[]+head in 2:0fdf2118:::rbd_data.61b4c6238e1f29.0000000000001c8b:head -75> 2017-06-03 10:42:27.680524 7f7172069700 20 osd.84 pg_epoch: 71740 pg[2.3f0( v 71739'1378126 (71527'1375122,71739'1378126] local-les=71729 n=6271 ec=130 les/c/f 71729/71729/67943 71727/71728/71728) [84,11] r=0 lpr=71728 crt=71739'1378124 lcod 71739'1378125 mlcod 71739'1378125 active+clean NIBBLEWISE] zeroing write result code 0 -74> 2017-06-03 10:42:27.680531 7f7172069700 10 osd.84 pg_epoch: 71740 pg[2.3f0( v 71739'1378126 (71527'1375122,71739'1378126] local-les=71729 n=6271 ec=130 les/c/f 71729/71729/67943 71727/71728/71728) [84,11] r=0 lpr=71728 crt=71739'1378124 lcod 71739'1378125 mlcod 71739'1378125 active+clean NIBBLEWISE] new_repop rep_tid 1391 on osd_op(client.22857445.1:759980367 2.1884fbf0 rbd_data.61b4c6238e1f29.0000000000001c8b [set-alloc-hint object_size 4194304 write_size 4194304,write 3616768~65536] snapc 0=[] ondisk+write e71740) v4 -73> 2017-06-03 10:42:27.680538 7f7172069700 7 osd.84 pg_epoch: 71740 pg[2.3f0( v 71739'1378126 (71527'1375122,71739'1378126] local-les=71729 n=6271 ec=130 les/c/f 71729/71729/67943 71727/71728/71728) [84,11] r=0 lpr=71728 crt=71739'1378124 lcod 71739'1378125 mlcod 71739'1378125 active+clean NIBBLEWISE] issue_repop rep_tid 1391 o 2:0fdf2118:::rbd_data.61b4c6238e1f29.0000000000001c8b:head -72> 2017-06-03 10:42:27.680580 7f7172069700 5 -- op tracker -- seq: 14015, time: 2017-06-03 10:42:27.680580, event: waiting for subops from 11, op: osd_op(client.22857445.1:759980367 2.1884fbf0 rbd_data.61b4c6238e1f29.0000000000001c8b [set-alloc-hint object_size 4194304 write_size 4194304,write 3616768~65536] snapc 0=[] ondisk+write e71740) -71> 2017-06-03 10:42:27.680597 7f7172069700 20 osd.84 71740 share_map_peer 0x561465ddda80 already has 
epoch 71740 -70> 2017-06-03 10:42:27.680599 7f7172069700 1 -- 172.16.3.14:6806/11590 --> 172.16.3.3:6818/14977 -- osd_repop(client.22857445.1:759980367 2.3f0 2:0fdf2118:::rbd_data.61b4c6238e1f29.0000000000001c8b:head v 71740'1378127) v1 -- ?+66267 0x56146bba9600 con 0x561465ddda80 -69> 2017-06-03 10:42:27.680613 7f7172069700 10 osd.84 pg_epoch: 71740 pg[2.3f0( v 71739'1378126 (71527'1375122,71739'1378126] local-les=71729 n=6271 ec=130 les/c/f 71729/71729/67943 71727/71728/71728) [84,11] r=0 lpr=71728 crt=71739'1378124 lcod 71739'1378125 mlcod 71739'1378125 active+clean NIBBLEWISE] append_log log((71527'1375122,71739'1378126], crt=71739'1378124) [71740'1378127 (71739'1378125) modify 2:0fdf2118:::rbd_data.61b4c6238e1f29.0000000000001c8b:head by client.22857445.1:759980367 2017-06-03 10:42:27.670488] -68> 2017-06-03 10:42:27.680626 7f7172069700 10 osd.84 pg_epoch: 71740 pg[2.3f0( v 71740'1378127 (71527'1375122,71740'1378127] local-les=71729 n=6271 ec=130 les/c/f 71729/71729/67943 71727/71728/71728) [84,11] r=0 lpr=71728 luod=71739'1378126 lua=71739'1378126 crt=71739'1378124 lcod 71739'1378125 mlcod 71739'1378125 active+clean NIBBLEWISE] add_log_entry 71740'1378127 (71739'1378125) modify 2:0fdf2118:::rbd_data.61b4c6238e1f29.0000000000001c8b:head by client.22857445.1:759980367 2017-06-03 10:42:27.670488 -67> 2017-06-03 10:42:27.680636 7f7172069700 10 osd.84 pg_epoch: 71740 pg[2.3f0( v 71740'1378127 (71527'1375122,71740'1378127] local-les=71729 n=6271 ec=130 les/c/f 71729/71729/67943 71727/71728/71728) [84,11] r=0 lpr=71728 luod=71739'1378126 lua=71739'1378126 crt=71739'1378125 lcod 71739'1378125 mlcod 71739'1378125 active+clean NIBBLEWISE] append_log: trimming to 71739'1378125 entries 71739'1378125 (71722'1378109) modify 2:0fdf2118:::rbd_data.61b4c6238e1f29.0000000000001c8b:head by client.22857445.1:759960105 2017-06-03 10:41:29.573977 -66> 2017-06-03 10:42:27.680653 7f7172069700 5 write_log with: dirty_to: 0'0, dirty_from: 4294967295'18446744073709551615, 
dirty_divergent_priors: false, divergent_priors: 0, writeout_from: 71740'1378127, trimmed: -65> 2017-06-03 10:42:27.680711 7f7172069700 5 -- op tracker -- seq: 14015, time: 2017-06-03 10:42:27.680711, event: commit_queued_for_journal_write, op: osd_op(client.22857445.1:759980367 2.1884fbf0 rbd_data.61b4c6238e1f29.0000000000001c8b [set-alloc-hint object_size 4194304 write_size 4194304,write 3616768~65536] snapc 0=[] ondisk+write e71740) -64> 2017-06-03 10:42:27.680722 7f7172069700 10 osd.84 pg_epoch: 71740 pg[2.3f0( v 71740'1378127 (71527'1375122,71740'1378127] local-les=71729 n=6271 ec=130 les/c/f 71729/71729/67943 71727/71728/71728) [84,11] r=0 lpr=71728 luod=71739'1378126 lua=71739'1378126 crt=71739'1378125 lcod 71739'1378125 mlcod 71739'1378125 active+clean NIBBLEWISE] eval_repop repgather(0x56144dec5c20 71740'1378127 rep_tid=1391 committed?=0 applied?=0) wants=d -63> 2017-06-03 10:42:27.680731 7f7172069700 10 osd.84 71740 dequeue_op 0x56146d3bf700 finish -62> 2017-06-03 10:42:27.680732 7f71891ef700 5 -- op tracker -- seq: 14015, time: 2017-06-03 10:42:27.680731, event: write_thread_in_journal_buffer, op: osd_op(client.22857445.1:759980367 2.1884fbf0 rbd_data.61b4c6238e1f29.0000000000001c8b [set-alloc-hint object_size 4194304 write_size 4194304,write 3616768~65536] snapc 0=[] ondisk+write e71740) -61> 2017-06-03 10:42:27.680968 7f71889ee700 5 -- op tracker -- seq: 14015, time: 2017-06-03 10:42:27.680968, event: journaled_completion_queued, op: osd_op(client.22857445.1:759980367 2.1884fbf0 rbd_data.61b4c6238e1f29.0000000000001c8b [set-alloc-hint object_size 4194304 write_size 4194304,write 3616768~65536] snapc 0=[] ondisk+write e71740) -60> 2017-06-03 10:42:27.680983 7f71869ea700 10 osd.84 pg_epoch: 71740 pg[2.3f0( v 71740'1378127 (71527'1375122,71740'1378127] local-les=71729 n=6271 ec=130 les/c/f 71729/71729/67943 71727/71728/71728) [84,11] r=0 lpr=71728 luod=71739'1378126 lua=71739'1378126 crt=71739'1378125 lcod 71739'1378125 mlcod 71739'1378125 active+clean 
NIBBLEWISE] op_commit: 1391 -59> 2017-06-03 10:42:27.680998 7f71869ea700 5 -- op tracker -- seq: 14015, time: 2017-06-03 10:42:27.680997, event: op_commit, op: osd_op(client.22857445.1:759980367 2.1884fbf0 rbd_data.61b4c6238e1f29.0000000000001c8b [set-alloc-hint object_size 4194304 write_size 4194304,write 3616768~65536] snapc 0=[] ondisk+write e71740) -58> 2017-06-03 10:42:27.681335 7f71861e9700 10 osd.84 pg_epoch: 71740 pg[2.3f0( v 71740'1378127 (71527'1375122,71740'1378127] local-les=71729 n=6271 ec=130 les/c/f 71729/71729/67943 71727/71728/71728) [84,11] r=0 lpr=71728 luod=71739'1378126 lua=71739'1378126 crt=71739'1378125 lcod 71739'1378125 mlcod 71739'1378125 active+clean NIBBLEWISE] op_applied: 1391 -57> 2017-06-03 10:42:27.681353 7f71861e9700 5 -- op tracker -- seq: 14015, time: 2017-06-03 10:42:27.681353, event: op_applied, op: osd_op(client.22857445.1:759980367 2.1884fbf0 rbd_data.61b4c6238e1f29.0000000000001c8b [set-alloc-hint object_size 4194304 write_size 4194304,write 3616768~65536] snapc 0=[] ondisk+write e71740) -56> 2017-06-03 10:42:27.681364 7f71861e9700 10 osd.84 pg_epoch: 71740 pg[2.3f0( v 71740'1378127 (71527'1375122,71740'1378127] local-les=71729 n=6271 ec=130 les/c/f 71729/71729/67943 71727/71728/71728) [84,11] r=0 lpr=71728 luod=71739'1378126 lua=71739'1378126 crt=71739'1378125 lcod 71739'1378125 mlcod 71739'1378125 active+clean NIBBLEWISE] op_applied version 71740'1378127 -55> 2017-06-03 10:42:27.683398 7f7160fd0700 1 -- 172.16.3.14:6806/11590 <== osd.11 172.16.3.3:6818/14977 147 ==== osd_repop_reply(client.22857445.1:759980367 2.3f0) v1 ==== 83+0+0 (3329426065 0 0) 0x56146914d180 con 0x561465ddda80 -54> 2017-06-03 10:42:27.683413 7f7160fd0700 10 osd.84 71740 handle_replica_op osd_repop_reply(client.22857445.1:759980367 2.3f0) v1 epoch 71740 -53> 2017-06-03 10:42:27.683417 7f7160fd0700 20 osd.84 71740 should_share_map osd.11 172.16.3.3:6818/14977 71740 -52> 2017-06-03 10:42:27.683421 7f7160fd0700 15 osd.84 71740 enqueue_op 0x56146738aa00 
prio 196 cost 0 latency 0.000046 osd_repop_reply(client.22857445.1:759980367 2.3f0) v1 -51> 2017-06-03 10:42:27.683424 7f7160fd0700 5 -- op tracker -- seq: 14016, time: 2017-06-03 10:42:27.683424, event: queued_for_pg, op: osd_repop_reply(client.22857445.1:759980367 2.3f0) -50> 2017-06-03 10:42:27.683443 7f716f864700 10 osd.84 71740 dequeue_op 0x56146738aa00 prio 196 cost 0 latency 0.000067 osd_repop_reply(client.22857445.1:759980367 2.3f0) v1 pg pg[2.3f0( v 71740'1378127 (71527'1375122,71740'1378127] local-les=71729 n=6271 ec=130 les/c/f 71729/71729/67943 71727/71728/71728) [84,11] r=0 lpr=71728 luod=71739'1378126 crt=71739'1378125 lcod 71739'1378125 mlcod 71739'1378125 active+clean NIBBLEWISE] -49> 2017-06-03 10:42:27.683457 7f716f864700 5 -- op tracker -- seq: 14016, time: 2017-06-03 10:42:27.683457, event: reached_pg, op: osd_repop_reply(client.22857445.1:759980367 2.3f0) -48> 2017-06-03 10:42:27.683462 7f716f864700 10 osd.84 pg_epoch: 71740 pg[2.3f0( v 71740'1378127 (71527'1375122,71740'1378127] local-les=71729 n=6271 ec=130 les/c/f 71729/71729/67943 71727/71728/71728) [84,11] r=0 lpr=71728 luod=71739'1378126 crt=71739'1378125 lcod 71739'1378125 mlcod 71739'1378125 active+clean NIBBLEWISE] handle_message: 0x56146738aa00 -47> 2017-06-03 10:42:27.683471 7f716f864700 5 -- op tracker -- seq: 14016, time: 2017-06-03 10:42:27.683470, event: started, op: osd_repop_reply(client.22857445.1:759980367 2.3f0 ondisk, result = 0) -46> 2017-06-03 10:42:27.683473 7f716f864700 7 osd.84 pg_epoch: 71740 pg[2.3f0( v 71740'1378127 (71527'1375122,71740'1378127] local-les=71729 n=6271 ec=130 les/c/f 71729/71729/67943 71727/71728/71728) [84,11] r=0 lpr=71728 luod=71739'1378126 crt=71739'1378125 lcod 71739'1378125 mlcod 71739'1378125 active+clean NIBBLEWISE] sub_op_modify_reply: tid 1391 op ack_type 4 from 11 -45> 2017-06-03 10:42:27.683481 7f716f864700 5 -- op tracker -- seq: 14015, time: 2017-06-03 10:42:27.683481, event: sub_op_commit_rec from 11, op: 
osd_op(client.22857445.1:759980367 2.1884fbf0 rbd_data.61b4c6238e1f29.0000000000001c8b [set-alloc-hint object_size 4194304 write_size 4194304,write 3616768~65536] snapc 0=[] ondisk+write e71740) -44> 2017-06-03 10:42:27.683490 7f716f864700 10 osd.84 pg_epoch: 71740 pg[2.3f0( v 71740'1378127 (71527'1375122,71740'1378127] local-les=71729 n=6271 ec=130 les/c/f 71729/71729/67943 71727/71728/71728) [84,11] r=0 lpr=71728 luod=71739'1378126 crt=71739'1378125 lcod 71739'1378125 mlcod 71739'1378125 active+clean NIBBLEWISE] repop_all_applied: repop tid 1391 all applied -43> 2017-06-03 10:42:27.683496 7f716f864700 10 osd.84 pg_epoch: 71740 pg[2.3f0( v 71740'1378127 (71527'1375122,71740'1378127] local-les=71729 n=6271 ec=130 les/c/f 71729/71729/67943 71727/71728/71728) [84,11] r=0 lpr=71728 luod=71739'1378126 crt=71739'1378125 lcod 71739'1378125 mlcod 71739'1378125 active+clean NIBBLEWISE] eval_repop repgather(0x56144dec5c20 71740'1378127 rep_tid=1391 committed?=0 applied?=1) wants=d -42> 2017-06-03 10:42:27.683502 7f716f864700 10 osd.84 pg_epoch: 71740 pg[2.3f0( v 71740'1378127 (71527'1375122,71740'1378127] local-les=71729 n=6271 ec=130 les/c/f 71729/71729/67943 71727/71728/71728) [84,11] r=0 lpr=71728 luod=71739'1378126 crt=71739'1378125 lcod 71739'1378125 mlcod 71739'1378125 active+clean NIBBLEWISE] applied: repgather(0x56144dec5c20 71740'1378127 rep_tid=1391 committed?=0 applied?=1) -41> 2017-06-03 10:42:27.683508 7f716f864700 10 osd.84 pg_epoch: 71740 pg[2.3f0( v 71740'1378127 (71527'1375122,71740'1378127] local-les=71729 n=6271 ec=130 les/c/f 71729/71729/67943 71727/71728/71728) [84,11] r=0 lpr=71728 luod=71739'1378126 crt=71739'1378125 lcod 71739'1378125 mlcod 71739'1378125 active+clean NIBBLEWISE] repop_all_committed: repop tid 1391 all committed -40> 2017-06-03 10:42:27.683513 7f716f864700 10 osd.84 pg_epoch: 71740 pg[2.3f0( v 71740'1378127 (71527'1375122,71740'1378127] local-les=71729 n=6271 ec=130 les/c/f 71729/71729/67943 71727/71728/71728) [84,11] r=0 lpr=71728 
crt=71739'1378125 lcod 71739'1378126 mlcod 71739'1378125 active+clean NIBBLEWISE] eval_repop repgather(0x56144dec5c20 71740'1378127 rep_tid=1391 committed?=1 applied?=1) wants=d -39> 2017-06-03 10:42:27.683519 7f716f864700 10 osd.84 pg_epoch: 71740 pg[2.3f0( v 71740'1378127 (71527'1375122,71740'1378127] local-les=71729 n=6271 ec=130 les/c/f 71729/71729/67943 71727/71728/71728) [84,11] r=0 lpr=71728 crt=71739'1378125 lcod 71739'1378126 mlcod 71739'1378125 active+clean NIBBLEWISE] commit: repgather(0x56144dec5c20 71740'1378127 rep_tid=1391 committed?=1 applied?=1) -38> 2017-06-03 10:42:27.683525 7f716f864700 15 osd.84 pg_epoch: 71740 pg[2.3f0( v 71740'1378127 (71527'1375122,71740'1378127] local-les=71729 n=6271 ec=130 les/c/f 71729/71729/67943 71727/71728/71728) [84,11] r=0 lpr=71728 crt=71739'1378125 lcod 71739'1378126 mlcod 71739'1378125 active+clean NIBBLEWISE] log_op_stats osd_op(client.22857445.1:759980367 2.1884fbf0 rbd_data.61b4c6238e1f29.0000000000001c8b [set-alloc-hint object_size 4194304 write_size 4194304,write 3616768~65536] snapc 0=[] ondisk+write e71740) v4 inb 65536 outb 0 rlat 0.003504 lat 0.003521 -37> 2017-06-03 10:42:27.683533 7f716f864700 10 osd.84 pg_epoch: 71740 pg[2.3f0( v 71740'1378127 (71527'1375122,71740'1378127] local-les=71729 n=6271 ec=130 les/c/f 71729/71729/67943 71727/71728/71728) [84,11] r=0 lpr=71728 crt=71739'1378125 lcod 71739'1378126 mlcod 71739'1378125 active+clean NIBBLEWISE] sending commit on osd_op(client.22857445.1:759980367 2.1884fbf0 rbd_data.61b4c6238e1f29.0000000000001c8b [set-alloc-hint object_size 4194304 write_size 4194304,write 3616768~65536] snapc 0=[] ondisk+write e71740) v4 0x561469257e40 -36> 2017-06-03 10:42:27.683540 7f716f864700 1 -- 172.16.2.114:6806/11590 --> 172.16.2.212:0/2238053329 -- osd_op_reply(759980367 rbd_data.61b4c6238e1f29.0000000000001c8b [set-alloc-hint object_size 4194304 write_size 4194304,write 3616768~65536] v71740'1378127 uv6406885 ondisk = 0) v7 -- ?+0 0x561469257e40 con 0x5614678fb480 -35> 
2017-06-03 10:42:27.683553 7f716f864700 5 -- op tracker -- seq: 14015, time: 2017-06-03 10:42:27.683553, event: commit_sent, op: osd_op(client.22857445.1:759980367 2.1884fbf0 rbd_data.61b4c6238e1f29.0000000000001c8b [set-alloc-hint object_size 4194304 write_size 4194304,write 3616768~65536] snapc 0=[] ondisk+write e71740) -34> 2017-06-03 10:42:27.683561 7f716f864700 10 osd.84 pg_epoch: 71740 pg[2.3f0( v 71740'1378127 (71527'1375122,71740'1378127] local-les=71729 n=6271 ec=130 les/c/f 71729/71729/67943 71727/71728/71728) [84,11] r=0 lpr=71728 crt=71739'1378125 lcod 71739'1378126 mlcod 71739'1378125 active+clean NIBBLEWISE] applied: repgather(0x56144dec5c20 71740'1378127 rep_tid=1391 committed?=1 applied?=1) -33> 2017-06-03 10:42:27.683573 7f716f864700 15 osd.84 pg_epoch: 71740 pg[2.3f0( v 71740'1378127 (71527'1375122,71740'1378127] local-les=71729 n=6271 ec=130 les/c/f 71729/71729/67943 71727/71728/71728) [84,11] r=0 lpr=71728 crt=71739'1378125 lcod 71739'1378126 mlcod 71739'1378125 active+clean NIBBLEWISE] publish_stats_to_osd 71740:1137643 -32> 2017-06-03 10:42:27.683582 7f716f864700 15 osd.84 pg_epoch: 71740 pg[2.3f0( v 71740'1378127 (71527'1375122,71740'1378127] local-les=71729 n=6271 ec=130 les/c/f 71729/71729/67943 71727/71728/71728) [84,11] r=0 lpr=71728 crt=71739'1378125 lcod 71739'1378126 mlcod 71739'1378126 active+clean NIBBLEWISE] do_osd_op_effects client.22857445 con 0x5614678fb480 -31> 2017-06-03 10:42:27.683592 7f716f864700 10 osd.84 pg_epoch: 71740 pg[2.3f0( v 71740'1378127 (71527'1375122,71740'1378127] local-les=71729 n=6271 ec=130 les/c/f 71729/71729/67943 71727/71728/71728) [84,11] r=0 lpr=71728 crt=71739'1378125 lcod 71739'1378126 mlcod 71739'1378126 active+clean NIBBLEWISE] removing repgather(0x56144dec5c20 71740'1378127 rep_tid=1391 committed?=1 applied?=1) -30> 2017-06-03 10:42:27.683601 7f716f864700 20 osd.84 pg_epoch: 71740 pg[2.3f0( v 71740'1378127 (71527'1375122,71740'1378127] local-les=71729 n=6271 ec=130 les/c/f 71729/71729/67943 
71727/71728/71728) [84,11] r=0 lpr=71728 crt=71739'1378125 lcod 71739'1378126 mlcod 71739'1378126 active+clean NIBBLEWISE] q front is repgather(0x56144dec5c20 71740'1378127 rep_tid=1391 committed?=1 applied?=1) -29> 2017-06-03 10:42:27.683610 7f716f864700 20 osd.84 pg_epoch: 71740 pg[2.3f0( v 71740'1378127 (71527'1375122,71740'1378127] local-les=71729 n=6271 ec=130 les/c/f 71729/71729/67943 71727/71728/71728) [84,11] r=0 lpr=71728 crt=71739'1378125 lcod 71739'1378126 mlcod 71739'1378126 active+clean NIBBLEWISE] remove_repop repgather(0x56144dec5c20 71740'1378127 rep_tid=1391 committed?=1 applied?=1) -28> 2017-06-03 10:42:27.683623 7f716f864700 15 osd.84 pg_epoch: 71740 pg[2.3f0( v 71740'1378127 (71527'1375122,71740'1378127] local-les=71729 n=6271 ec=130 les/c/f 71729/71729/67943 71727/71728/71728) [84,11] r=0 lpr=71728 crt=71739'1378125 lcod 71739'1378126 mlcod 71739'1378126 active+clean NIBBLEWISE] requeue_ops -27> 2017-06-03 10:42:27.683634 7f716f864700 5 -- op tracker -- seq: 14015, time: 2017-06-03 10:42:27.683633, event: done, op: osd_op(client.22857445.1:759980367 2.1884fbf0 rbd_data.61b4c6238e1f29.0000000000001c8b [set-alloc-hint object_size 4194304 write_size 4194304,write 3616768~65536] snapc 0=[] ondisk+write e71740) -26> 2017-06-03 10:42:27.683654 7f716f864700 10 osd.84 71740 dequeue_op 0x56146738aa00 finish -25> 2017-06-03 10:42:27.683658 7f716f864700 5 -- op tracker -- seq: 14016, time: 2017-06-03 10:42:27.683657, event: done, op: osd_repop_reply(client.22857445.1:759980367 2.3f0 ondisk, result = 0) -24> 2017-06-03 10:42:27.690237 7f7140a75700 1 -- 172.16.2.114:6807/11590 <== osd.35 172.16.2.105:0/4630 74 ==== osd_ping(ping e71740 stamp 2017-06-03 10:42:27.691894) v2 ==== 47+0+0 (1336756362 0 0) 0x56146804e800 con 0x561467283900 -23> 2017-06-03 10:42:27.690252 7f7140a75700 1 -- 172.16.2.114:6807/11590 --> 172.16.2.105:0/4630 -- osd_ping(ping_reply e71740 stamp 2017-06-03 10:42:27.691894) v2 -- ?+0 0x561468151e00 con 0x561467283900 -22> 2017-06-03 
10:42:27.690265 7f7140a75700 20 osd.84 71740 share_map_peer 0x561442ea1000 already has epoch 71740 -21> 2017-06-03 10:42:27.690258 7f7140b76700 1 -- 172.16.3.14:6807/11590 <== osd.35 172.16.2.105:0/4630 74 ==== osd_ping(ping e71740 stamp 2017-06-03 10:42:27.691894) v2 ==== 47+0+0 (1336756362 0 0) 0x56146804ee00 con 0x5614678f8300 -20> 2017-06-03 10:42:27.690275 7f7140b76700 1 -- 172.16.3.14:6807/11590 --> 172.16.2.105:0/4630 -- osd_ping(ping_reply e71740 stamp 2017-06-03 10:42:27.691894) v2 -- ?+0 0x56146d350000 con 0x5614678f8300 -19> 2017-06-03 10:42:27.690287 7f7140b76700 20 osd.84 71740 share_map_peer 0x561442ea1000 already has epoch 71740 -18> 2017-06-03 10:42:27.713217 7f713ee59700 1 -- 172.16.3.14:6807/11590 <== osd.38 172.16.2.105:0/4910 78 ==== osd_ping(ping e71740 stamp 2017-06-03 10:42:27.714244) v2 ==== 47+0+0 (140188268 0 0) 0x561468170600 con 0x5614678f8d80 -17> 2017-06-03 10:42:27.713238 7f713ee59700 1 -- 172.16.3.14:6807/11590 --> 172.16.2.105:0/4910 -- osd_ping(ping_reply e71740 stamp 2017-06-03 10:42:27.714244) v2 -- ?+0 0x561469cb1c00 con 0x5614678f8d80 -16> 2017-06-03 10:42:27.713254 7f713ee59700 20 osd.84 71740 share_map_peer 0x56146631cb80 already has epoch 71740 -15> 2017-06-03 10:42:27.713463 7f713ef5a700 1 -- 172.16.2.114:6807/11590 <== osd.38 172.16.2.105:0/4910 78 ==== osd_ping(ping e71740 stamp 2017-06-03 10:42:27.714244) v2 ==== 47+0+0 (140188268 0 0) 0x561468170800 con 0x5614678f9500 -14> 2017-06-03 10:42:27.713486 7f713ef5a700 1 -- 172.16.2.114:6807/11590 --> 172.16.2.105:0/4910 -- osd_ping(ping_reply e71740 stamp 2017-06-03 10:42:27.714244) v2 -- ?+0 0x561469cb1200 con 0x5614678f9500 -13> 2017-06-03 10:42:27.713498 7f713ef5a700 20 osd.84 71740 share_map_peer 0x56146631cb80 already has epoch 71740 -12> 2017-06-03 10:42:27.722765 7f7148bf6700 1 -- 172.16.3.14:6807/11590 <== osd.15 172.16.3.3:0/5388 79 ==== osd_ping(ping e71740 stamp 2017-06-03 10:42:27.713200) v2 ==== 47+0+0 (1176829029 0 0) 0x5614671ed000 con 0x56146824cd80 -11> 
2017-06-03 10:42:27.722794 7f7148bf6700 1 -- 172.16.3.14:6807/11590 --> 172.16.3.3:0/5388 -- osd_ping(ping_reply e71740 stamp 2017-06-03 10:42:27.713200) v2 -- ?+0 0x56146cc2bc00 con 0x56146824cd80 -10> 2017-06-03 10:42:27.722808 7f7148bf6700 20 osd.84 71740 share_map_peer 0x561466029a80 already has epoch 71740 -9> 2017-06-03 10:42:27.722960 7f7148af5700 1 -- 172.16.2.114:6807/11590 <== osd.15 172.16.3.3:0/5388 79 ==== osd_ping(ping e71740 stamp 2017-06-03 10:42:27.713200) v2 ==== 47+0+0 (1176829029 0 0) 0x5614671ed200 con 0x56146824d980 -8> 2017-06-03 10:42:27.722977 7f7148af5700 1 -- 172.16.2.114:6807/11590 --> 172.16.3.3:0/5388 -- osd_ping(ping_reply e71740 stamp 2017-06-03 10:42:27.713200) v2 -- ?+0 0x56146645ae00 con 0x56146824d980 -7> 2017-06-03 10:42:27.722991 7f7148af5700 20 osd.84 71740 share_map_peer 0x561466029a80 already has epoch 71740 -6> 2017-06-03 10:42:27.727100 7f713fa65700 1 -- 172.16.2.114:6807/11590 <== osd.57 172.16.2.107:0/3819 74 ==== osd_ping(ping e71740 stamp 2017-06-03 10:42:27.727036) v2 ==== 47+0+0 (2829453915 0 0) 0x56146751c600 con 0x5614678f9080 -5> 2017-06-03 10:42:27.727114 7f713fa65700 1 -- 172.16.2.114:6807/11590 --> 172.16.2.107:0/3819 -- osd_ping(ping_reply e71740 stamp 2017-06-03 10:42:27.727036) v2 -- ?+0 0x56146986f600 con 0x5614678f9080 -4> 2017-06-03 10:42:27.727126 7f713fa65700 20 osd.84 71740 share_map_peer 0x561466746300 already has epoch 71740 -3> 2017-06-03 10:42:27.727895 7f713fb66700 1 -- 172.16.3.14:6807/11590 <== osd.57 172.16.2.107:0/3819 74 ==== osd_ping(ping e71740 stamp 2017-06-03 10:42:27.727036) v2 ==== 47+0+0 (2829453915 0 0) 0x56146751c000 con 0x5614678f8900 -2> 2017-06-03 10:42:27.727914 7f713fb66700 1 -- 172.16.3.14:6807/11590 --> 172.16.2.107:0/3819 -- osd_ping(ping_reply e71740 stamp 2017-06-03 10:42:27.727036) v2 -- ?+0 0x56146c891000 con 0x5614678f8900 -1> 2017-06-03 10:42:27.727927 7f713fb66700 20 osd.84 71740 share_map_peer 0x561466746300 already has epoch 71740 0> 2017-06-03 10:42:27.743760 
7f716e862700 -1 *** Caught signal (Aborted) ** in thread 7f716e862700 thread_name:tp_osd_recov

ceph version 10.2.7 (50e863e0f4bc8f4b9e31156de690d765af245185)
1: (()+0x9564a7) [0x5614333db4a7]
2: (()+0xf890) [0x7f719633e890]
3: (gsignal()+0x37) [0x7f7194385067]
4: (abort()+0x148) [0x7f7194386448]
5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x256) [0x5614334e13d6]
6: (ReplicatedPG::recover_replicas(int, ThreadPool::TPHandle&)+0x62f) [0x561432f6e0ff]
7: (ReplicatedPG::start_recovery_ops(int, ThreadPool::TPHandle&, int*)+0xa8a) [0x561432fa178a]
8: (OSD::do_recovery(PG*, ThreadPool::TPHandle&)+0x36d) [0x561432e1abbd]
9: (ThreadPool::WorkQueue<PG>::_void_process(void*, ThreadPool::TPHandle&)+0x1d) [0x561432e6588d]
10: (ThreadPool::worker(ThreadPool::WorkThread*)+0xa9f) [0x5614334d1e3f]
11: (ThreadPool::WorkThread::entry()+0x10) [0x5614334d2d70]
12: (()+0x8064) [0x7f7196337064]
13: (clone()+0x6d) [0x7f719443862d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
_______________________________________________ ceph-users mailing list ceph-users@xxxxxxxxxxxxxx http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com