OSD crash (hammer): osd/ReplicatedPG.cc: 7477: FAILED assert(repop_queue.front() == repop)

Hi list,

A few days ago we had some problems with our Ceph cluster, and now some of our
OSDs crash on start, with messages like this right before the crash:

2017-06-09 15:35:02.226430 7fb46d9e4700 -1 log_channel(cluster) log [ERR] : trim_object Snap 4aae0 not in clones

I can start those OSDs if I set 'osd recovery max active = 0', but then the
PGs on those OSDs stay stuck in a degraded/unclean state.
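
For reference, that amounts to something like the following in ceph.conf before
starting the affected OSDs (the per-OSD section here is only an illustration;
the same line under [osd] would apply to every OSD that reads this ceph.conf):

[osd.199]
# temporarily stop recovery so the OSD can come up
# (this is what leaves the PGs degraded/unclean)
osd recovery max active = 0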

Now, one of the problems we faced was missing clones, which we solved with
ceph-objectstore-tool's remove-clone-metadata option, but it doesn't seem to
work in this case (PG and object taken from the sample log posted below):

# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-199 \
    --journal-path /var/lib/ceph/osd/ceph-199/journal \
    --pgid 3.1cd rbd_data.6e30d2d518fee6d.000000000000008f \
    remove-clone-metadata $((0x4aae0))
Clone 4aae0 not present
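
Would it also make sense to check what the OSD actually has on disk for that
object? I was thinking of something along these lines with --op list (hedging
here, since I'm not certain the optional object-name argument to --op list
behaves this way on 0.94.10):

# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-199 \
    --journal-path /var/lib/ceph/osd/ceph-199/journal \
    --pgid 3.1cd --op list rbd_data.6e30d2d518fee6d.000000000000008f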


Any hints on how we can debug/fix this problem?

Thanks in advance.


Our cluster:
Hammer (0.94.10)
5 mons
576 osds


Our clients:
OpenStack, some Hammer (0.94.10), some Giant (0.87.2)
(We're in the process of upgrading everything to Hammer and then to Jewel.)


Our ceph.conf:

[global]
fsid = 5ecc1509-4344-4087-9a1c-ac68fb085a75
mon initial members = ceph17, ceph19, ceph20, cephmon01, cephmon02
mon host = 172.17.22.17,172.17.22.19,172.17.22.20,172.17.23.251,172.17.23.252
auth cluster required = cephx
auth service required = cephx
auth client required = cephx
filestore xattr use omap = true
osd pool default size = 2
osd pool default min size = 1
public network = 172.17.0.0/16
cluster network = 172.18.0.0/16
osd pool default pg num = 2048
osd pool default pgp num = 2048
mon osd down out subtree limit = host
rbd default format = 2
osd op threads = 2
osd disk threads = 1
osd max backfills = 2
osd recovery threads = 1
osd recovery max active = 2
osd recovery op priority = 2
[mon]
mon compact on start = true
[osd]
osd crush update on start = false


Here's a sample log, with [...] replacing very long text/lines, but I can
provide full logs if needed:

2017-06-08 16:47:14.519092 7fa61bee4700 -1 log_channel(cluster) log [ERR] : trim_object Snap 4aae0 not in clones
2017-06-08 16:47:16.197600 7fa62ee75700  0 osd.199 pg_epoch: 1779479 pg[3.1cd( v 1779479'105504641 (1779407'105501638,1779479'105504641] local-les=1779478 n=1713 
ec=33 les/c 1779478/1779479 1779477/1779477/1779477) [199,134] r=0 lpr=1779477 luod=1779479'105504639 crt=1779479'105504637 lcod 1779479'105504638 mlcod 
1779479'105504638 active+clean snaptrimq=[4aae0~1,4aae2~1,4aae4~1,4aae6~1,[...],4bc5f~4,4bc64~2,4bc68~2]]  removing repgather(0xf9d09c0 1779479'105504639 
rep_tid=1753 committed?=1 applied?=1 lock=0 op=osd_op(client.215404673.0:6234090 rbd_data.6e30d2d518fee6d.000000000000008f [set-alloc-hint object_size 8388608 
write_size 8388608,write 2101248~4096] 3.56a5e1cd snapc 4a86e=[] ack+ondisk+write+known_if_redirected e1779479) v4)
2017-06-08 16:47:16.197834 7fa62ee75700  0 osd.199 pg_epoch: 1779479 pg[3.1cd( v 1779479'105504641 (1779407'105501638,1779479'105504641] local-les=1779478 n=1713 
ec=33 les/c 1779478/1779479 1779477/1779477/1779477) [199,134] r=0 lpr=1779477 luod=1779479'105504639 crt=1779479'105504637 lcod 1779479'105504638 mlcod 
1779479'105504638 active+clean snaptrimq=[4aae0~1,4aae2~1,4aae4~1,[...],4bc5f~4,4bc64~2,4bc68~2]]    q front is repgather(0x10908a00 0'0 rep_tid=1740 committed?=0 
applied?=0 lock=0)
2017-06-08 16:47:16.267386 7fa62ee75700 -1 osd/ReplicatedPG.cc: In function 'void ReplicatedPG::eval_repop(ReplicatedPG::RepGather*)' thread 7fa62ee75700 time 
2017-06-08 16:47:16.197953
osd/ReplicatedPG.cc: 7477: FAILED assert(repop_queue.front() == repop)

 ceph version 0.94.10 (b1e0532418e4631af01acbc0cedd426f1905f4af)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x85) [0xbdf735]
 2: (ReplicatedPG::eval_repop(ReplicatedPG::RepGather*)+0x12e1) [0x853111]
 3: (ReplicatedPG::repop_all_committed(ReplicatedPG::RepGather*)+0xcc) [0x85341c]
 4: (Context::complete(int)+0x9) [0x6c6329]
 5: (ReplicatedBackend::op_commit(ReplicatedBackend::InProgressOp*)+0x1dc) [0xa11b4c]
 6: (Context::complete(int)+0x9) [0x6c6329]
 7: (ReplicatedPG::BlessedContext::finish(int)+0x94) [0x8ad2d4]
 8: (Context::complete(int)+0x9) [0x6c6329]
 9: (Finisher::finisher_thread_entry()+0x168) [0xb02168]
 10: (()+0x7dc5) [0x7fa63ce39dc5]
 11: (clone()+0x6d) [0x7fa63b91b76d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- begin dump of recent events ---
-10000> 2017-06-09 15:35:27.341080 7fb4862fb700  5 osd.199 1779827 tick
 -9999> 2017-06-09 15:35:27.341102 7fb4862fb700 20 osd.199 1779827 scrub_time_permit should run between 0 - 24 now 15 = yes
 -9998> 2017-06-09 15:35:27.341121 7fb4862fb700 20 osd.199 1779827 scrub_load_below_threshold loadavg 1.35 >= max 0.5 = no, load too high
 -9997> 2017-06-09 15:35:27.341130 7fb4862fb700 20 osd.199 1779827 sched_scrub load_is_low=0
[...]
    -5> 2017-06-09 15:35:30.894979 7fb480edd700 15 osd.199 pg_epoch: 1779827 pg[3.92f( v 1779827'125690903 (1779818'125687817,1779827'125690903] local-les=1779826 
n=3186 ec=33 les/c 1779826/1779827 1779825/1779825/1779825) [199,564] r=0 lpr=1779825 lua=1779827'125690899 crt=1779827'125690898 lcod 1779827'125690902 mlcod 
1779827'125690898 active+clean] publish_stats_to_osd 1779827:186528651
    -4> 2017-06-09 15:35:30.894986 7fb480edd700 10 osd.199 pg_epoch: 1779827 pg[3.92f( v 1779827'125690903 (1779818'125687817,1779827'125690903] local-les=1779826 
n=3186 ec=33 les/c 1779826/1779827 1779825/1779825/1779825) [199,564] r=0 lpr=1779825 lua=1779827'125690899 crt=1779827'125690898 lcod 1779827'125690902 mlcod 
1779827'125690898 active+clean]  sending commit on repgather(0xf05b600 1779827'125690903 rep_tid=1162 committed?=1 applied?=0 lock=0 
op=osd_op(client.260852320.0:1973188 rbd_data.3d86d73dbe422e.00000000000000ef [set-alloc-hint object_size 8388608 write_size 8388608,write 3444736~4096] 3.ef04392f 
snapc 4abd1=[] ack+ondisk+write+known_if_redirected e1779827) v4) 0xf918c00
    -3> 2017-06-09 15:35:30.895055 7fb480edd700  1 -- 172.17.22.31:6800/209017 --> 172.17.21.25:0/1016405 -- osd_op_reply(1973188 
rbd_data.3d86d73dbe422e.00000000000000ef [set-alloc-hint object_size 8388608 write_size 8388608,write 3444736~4096] v1779827'125690903 uv125690903 ondisk = 0) 
v6 -- ?+0 0xf918c00 con 0xee5f080
    -2> 2017-06-09 15:35:30.895066 7fb480edd700  5 -- op tracker -- seq: 3953, time: 2017-06-09 15:35:30.895066, event: commit_sent, op: 
osd_op(client.260852320.0:1973188 rbd_data.3d86d73dbe422e.00000000000000ef [set-alloc-hint object_size 8388608 write_size 8388608,write 3444736~4096] 3.ef04392f 
snapc 4abd1=[] ack+ondisk+write+known_if_redirected e1779827)
    -1> 2017-06-09 15:35:30.906430 7fb46d9e4700 10 osd.199 pg_epoch: 1779827 pg[3.506( v 1779827'139421210 (1779816'139418118,1779827'139421210] local-les=1779826 
n=3131 ec=33 les/c 1779826/1779827 1779825/1779825/1779825) [199,171] r=0 lpr=1779825 crt=1779827'139421199 lcod 1779827'139421209 mlcod 1779827'139421209 
active+clean+scrubbing+deep] be_deep_scrub 3/6e748506/rbd_data.9f52fa7790c71b.0000000000000122/head seed 4294967295
     0> 2017-06-09 15:35:30.919706 7fb4816de700 -1 *** Caught signal (Aborted) **
 in thread 7fb4816de700

 ceph version 0.94.10 (b1e0532418e4631af01acbc0cedd426f1905f4af)
 1: /usr/bin/ceph-osd() [0xadc182]
 2: (()+0xf370) [0x7fb48ec9b370]
 3: (gsignal()+0x37) [0x7fb48d6b31d7]
 4: (abort()+0x148) [0x7fb48d6b48c8]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x165) [0x7fb48dfb79d5]
 6: (()+0x5e946) [0x7fb48dfb5946]
 7: (()+0x5e973) [0x7fb48dfb5973]
 8: (()+0x5eb93) [0x7fb48dfb5b93]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x27a) [0xbdf92a]
 10: (ReplicatedPG::eval_repop(ReplicatedPG::RepGather*)+0x12e1) [0x853111]
 11: (ReplicatedPG::repop_all_applied(ReplicatedPG::RepGather*)+0x195) [0x8532e5]
 12: (Context::complete(int)+0x9) [0x6c6329]
 13: (ReplicatedBackend::op_applied(ReplicatedBackend::InProgressOp*)+0x21c) [0xa118dc]
 14: (Context::complete(int)+0x9) [0x6c6329]
 15: (ReplicatedPG::BlessedContext::finish(int)+0x94) [0x8ad2d4]
 16: (Context::complete(int)+0x9) [0x6c6329]
 17: (void finish_contexts<Context>(CephContext*, std::list<Context*, std::allocator<Context*> >&, int)+0xac) [0x709e3c]
 18: (C_ContextsBase<Context, Context>::complete(int)+0x9) [0x6c6c19]
 19: (Finisher::finisher_thread_entry()+0x168) [0xb02168]
 20: (()+0x7dc5) [0x7fb48ec93dc5]
 21: (clone()+0x6d) [0x7fb48d77576d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- logging levels ---
   0/ 5 none
   0/ 1 lockdep
   0/ 1 context
   1/ 1 crush
   1/ 5 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 1 buffer
   0/ 1 timer
   0/ 1 filer
   0/ 1 striper
   0/ 1 objecter
   0/ 5 rados
   0/ 5 rbd
   0/ 5 rbd_replay
   0/ 5 journaler
   0/ 5 objectcacher
   0/ 5 client
   0/20 osd
   0/ 5 optracker
   0/ 5 objclass
   1/ 3 filestore
   1/ 3 keyvaluestore
   1/ 3 journal
   0/ 5 ms
   1/ 5 mon
   0/10 monc
   1/ 5 paxos
   0/ 5 tp
   1/ 5 auth
   1/ 5 crypto
   1/ 1 finisher
   1/ 5 heartbeatmap
   1/ 5 perfcounter
   1/ 5 rgw
   1/10 civetweb
   1/ 5 javaclient
   1/ 5 asok
   1/ 1 throttle
   0/ 0 refs
   1/ 5 xio
  -2/-2 (syslog threshold)
  -1/-1 (stderr threshold)
  max_recent     10000
  max_new         1000
  log_file /var/log/ceph/ceph-osd.199.log
--- end dump of recent events ---

Cheers,
-- 
Ricardo J. Barberis
Linux User No. 250625: http://counter.li.org/
LFS User No. 5121: http://www.linuxfromscratch.org/
Senior SysAdmin / IT Architect - www.DonWeb.com



