bluestore - OSD booting continuously

Hello,

Env: Ceph 11.2.0 (Kraken)
         bluestore, EC 4+1, RHEL 7.2

We have one OSD that keeps booting again and again, which has driven the cluster crazy :( . One PG went into an inconsistent state, and when we tried to repair that particular PG its primary OSD went down. Some time later we found traces in the mon logs showing that the OSD boots continuously.

# ceph pg stat

v1109889: 8192 pgs: 1 active+clean+inconsistent, 8191 active+clean; 31041 GB data, 42473 GB used, 1177 TB / 1218 TB avail; 411 kB/s rd, 28102 kB/s wr, 322 op/s

 

# ceph health detail | grep 'inconsistent'

HEALTH_ERR 1 pgs inconsistent; 5 scrub errors

pg 1.5c4 is active+clean+inconsistent, acting [258,268,156,14,67]

 

# ceph pg deep-scrub 1.5c4

instructing pg 1.5c4 on osd.258 to deep-scrub
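
As a side note, I assume the individual objects behind those 5 scrub errors could be listed with the inconsistent-object helper, though we have not dug into its output yet:

# rados list-inconsistent-obj 1.5c4 --format=json-pretty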

 

 

From ceph-osd.258.log, when the problem pops up:

===================================== 


2017-04-04 13:08:08.288728 7f6b34fe6700 -1 log_channel(cluster) log [ERR] : 1.5c4s0 scrub stat mismatch, got 1989/1988 objects, 0/0 clones, 1989/1988 dirty, 0/0 omap, 0/0 pinned, 0/0 hit_set_archive, 0/0 whiteouts, 4113960196/4110943924 bytes, 0/0 hit_set_archive bytes.

2017-04-04 13:08:08.288744 7f6b34fe6700 -1 log_channel(cluster) log [ERR] : 1.5c4s0 scrub 1 missing, 0 inconsistent objects

2017-04-04 13:08:08.288747 7f6b34fe6700 -1 log_channel(cluster) log [ERR] : 1.5c4 scrub 5 errors


From mon logs

===========



2017-04-05 07:18:17.665821 7f0fb55b5700  0 log_channel(cluster) log [INF] : pgmap v1134421: 8192 pgs: 1 activating+degraded+inconsistent, 4 activating, 27 active+recovery_wait+degraded, 3 active+recovering+degraded, 8157 active+clean; 31619 GB data, 43163 GB used, 1176 TB / 1218 TB avail; 16429 kB/s rd, 109 MB/s wr, 1208 op/s; 554/87717275 objects degraded (0.001%)

2017-04-05 07:18:18.643905 7f0fb55b5700  1 mon.cn1@0(leader).osd e4701 e4701: 335 osds: 334 up, 335 in

2017-04-05 07:18:18.656143 7f0fb55b5700  0 mon.cn1@0(leader).osd e4701 crush map has features 288531987042664448, adjusting msgr requires

2017-04-05 07:18:18.676477 7f0fb55b5700  0 log_channel(cluster) log [INF] : osdmap e4701: 335 osds: 334 up, 335 in

2017-04-05 07:18:18.731719 7f0fb55b5700  0 log_channel(cluster) log [INF] : pgmap v1134422: 8192 pgs: 1 activating+degraded+inconsistent, 4 activating, 27 active+recovery_wait+degraded, 3 active+recovering+degraded, 8157 active+clean; 31619 GB data, 43163 GB used, 1176 TB / 1218 TB avail; 21108 kB/s rd, 105 MB/s wr, 1165 op/s; 554/87717365 objects degraded (0.001%)

2017-04-05 07:18:19.963217 7f0fb55b5700  1 mon.cn1@0(leader).osd e4702 e4702: 335 osds: 335 up, 335 in

2017-04-05 07:18:19.975002 7f0fb55b5700  0 mon.cn1@0(leader).osd e4702 crush map has features 288531987042664448, adjusting msgr requires

2017-04-05 07:18:19.978696 7f0fb55b5700  0 log_channel(cluster) log [INF] : osd.258 10.139.4.84:6815/27074 boot   --->>>>>>>>>>>>>>>>>>>>>>>>>>

2017-04-05 07:18:19.983349 7f0fb55b5700  0 log_channel(cluster) log [INF] : osdmap e4702: 335 osds: 335 up, 335 in

2017-04-05 07:18:20.079996 7f0fb55b5700  0 log_channel(cluster) log [INF] : pgmap v1134423: 8192 pgs: 1 inconsistent+peering, 15 peering, 23 active+recovery_wait+degraded, 3 active+recovering+degraded, 8150 active+clean; 31619 GB data, 43163 GB used, 1176 TB / 1218 TB avail; 6888 kB/s rd, 72604 kB/s wr, 654 op/s; 489/87717555 objects degraded (0.001%)

2017-04-05 07:18:20.135081 7f0fb55b5700  0 log_channel(cluster) log [INF] : pgmap v1134424: 8192 pgs: 19 stale+active+clean, 1 inconsistent+peering, 15 peering, 23 active+recovery_wait+degraded, 3 active+recovering+degraded, 8131 active+clean; 31619 GB data, 43163 GB used, 1176 TB / 1218 TB avail; 3452 kB/s rd, 97092 kB/s wr, 834 op/s; 489/87717565 objects degraded (0.001%)

2017-04-05 07:18:20.303398 7f0fb55b5700  1 mon.cn1@0(leader).osd e4703 e4703: 335 osds: 335 up, 335 in


A few occurrences of the boot:

2017-04-05 07:12:40.085065 7f0fb55b5700  0 log_channel(cluster) log [INF] : osd.258 10.139.4.84:6815/23975 boot
2017-04-05 07:13:37.299990 7f0fb55b5700  0 log_channel(cluster) log [INF] : osd.258 10.139.4.84:6815/24334 boot
2017-04-05 07:14:35.413633 7f0fb55b5700  0 log_channel(cluster) log [INF] : osd.258 10.139.4.84:6815/25745 boot
2017-04-05 07:15:30.303761 7f0fb55b5700  0 log_channel(cluster) log [INF] : osd.258 10.139.4.84:6815/26194 boot
2017-04-05 07:16:25.155067 7f0fb55b5700  0 log_channel(cluster) log [INF] : osd.258 10.139.4.84:6815/26561 boot
2017-04-05 07:17:22.538683 7f0fb55b5700  0 log_channel(cluster) log [INF] : osd.258 10.139.4.84:6815/26768 boot
2017-04-05 07:18:19.978696 7f0fb55b5700  0 log_channel(cluster) log [INF] : osd.258 10.139.4.84:6815/27074 boot
2017-04-05 07:19:14.201683 7f0fb55b5700  0 log_channel(cluster) log [INF] : osd.258 10.139.4.84:6815/28517 boot
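
(To keep tracking these flaps, something along the lines of the grep below on the leader mon log should pick them up; the log path is an assumption for our setup.)

# grep 'osd.258 .* boot' /var/log/ceph/ceph-mon.cn1.log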


From OSD.258 log
================

  -5299> 2017-04-05 07:18:20.151610 7f7c9ed5e700  1 osd.258 4702 state: booting -> active

 -5298> 2017-04-05 07:18:20.151622 7f7c9ed5e700  0 osd.258 4702 crush map has features 288531987042664448, adjusting msgr requires for osds

 -5297> 2017-04-05 07:18:20.151759 7f7c92d46700  5 osd.258 pg_epoch: 4701 pg[1.894s2( v 4699'7995 lc 4260'7863 (2052'4800,4699'7995] local-les=4699 n=2058 ec=1713 les/c/f 4699/4295/0 4698/4698/4088) [288,191,258,27,125] r=2 lpr=4699 pi=1713-4697/359 crt=4687'7993 lcod 0'0 inactive NOTIFY NIBBLEWISE m=25] exit Started/Stray 2.766060 3 0.000228

 -5296> 2017-04-05 07:18:20.151764 7f7c94549700  5 osd.258 pg_epoch: 4701 pg[1.636s2( v 4683'8066 (2052'4800,4683'8066] local-les=4699 n=2039 ec=1713 les/c/f 4699/4699/0 4698/4698/4088) [292,186,258,28,106] r=2 lpr=4699 pi=1713-4697/354 crt=4683'8065 lcod 0'0 inactive NOTIFY NIBBLEWISE] exit Started/Stray 2.765980 3 0.000292


--< After some time > 


197 ec=1713 les/c/f 4703/4295/0 4702/4702/4702) [258,268,156,14,67] r=0 lpr=4702 pi=4098-4701/372 crt=4700'8152 mlcod 0'0 active+recovery_wait+degraded+inconsistent m=37] exit Started/Primary/Active/WaitRemoteRecoveryReserved 0.368048 4 0.000183
    -1> 2017-04-05 07:18:21.005621 7f7c97d50700  5 osd.258 pg_epoch: 4703 pg[1.5c4s0( v 4700'8152 lc 0'0 (2052'5000,4700'8152] local-les=4703 n=2197 ec=1713 les/c/f 4703/4295/0 4702/4702/4702) [258,268,156,14,67] r=0 lpr=4702 pi=4098-4701/372 crt=4700'8152 mlcod 0'0 active+recovery_wait+degraded+inconsistent m=37] enter Started/Primary/Active/Recovering
     0> 2017-04-05 07:18:21.008127 7f7c8dd3c700 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/11.2.0/rpm/el7/BUILD/ceph-11.2.0/src/osd/PrimaryLogPG.cc: In function 'uint64_t PrimaryLogPG::recover_primary(uint64_t, ThreadPool::TPHandle&)' thread 7f7c8dd3c700 time 2017-04-05 07:18:21.005761
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/11.2.0/rpm/el7/BUILD/ceph-11.2.0/src/osd/PrimaryLogPG.cc: 10659: FAILED assert(latest->is_update())

 ceph version 11.2.0 (f223e27eeb35991352ebc1f67423d4ebc252adb7)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x85) [0x7f7cba410b35]
 2: (PrimaryLogPG::recover_primary(unsigned long, ThreadPool::TPHandle&)+0x4b5) [0x7f7cb9f32eb5]
 3: (PrimaryLogPG::start_recovery_ops(unsigned long, ThreadPool::TPHandle&, unsigned long*)+0x50e) [0x7f7cb9f3c4be]
 4: (OSD::do_recovery(PG*, unsigned int, unsigned long, ThreadPool::TPHandle&)+0x40c) [0x7f7cb9db58cc]
 5: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x771) [0x7f7cb9dbc5c1]
 6: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x947) [0x7f7cba4167a7]
 7: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x7f7cba418900]
 8: (()+0x7dc5) [0x7f7cb73a3dc5]
====
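
To check whether every restart dies the same way, I assume a quick grep over the OSD log is enough (default log path assumed):

# grep -c 'FAILED assert(latest->is_update())' /var/log/ceph/ceph-osd.258.log
# grep 'enter Started/Primary/Active/Recovering' /var/log/ceph/ceph-osd.258.log

In the dump above it is pg 1.5c4s0 entering Recovering just before the assert fires.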


 

# ceph pg repair 1.5c4

 

While we were running the pg repair, the primary OSD of that PG crashed and then tried to boot again.

After the pg repair was executed, the PG went into the inconsistent+peering state:

# ceph pg stat

v1139236: 8192 pgs: 1 inconsistent+peering, 5 remapped+peering, 2 stale+active+clean, 16 peering, 8168 active+clean; 31701 GB data, 43302 GB used, 1176 TB / 1218 TB avail; 44501 kB/s rd, 108 MB/s wr, 1193 op/s


Deep scrubbing was initially disabled on this cluster; we have now enabled it.

We also ran 'ceph osd deep-scrub 258', but the OSD keeps on booting and we never see any scrubbing start.
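
As a stop-gap, we are considering simply keeping osd.258 down so the flapping stops while we investigate (assuming EC 4+1 can tolerate running with that one shard out for a while):

# systemctl stop ceph-osd@258
# ceph osd out 258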


Is removing osd.258 from the cluster, reformatting it, and re-adding it the only fix for this issue? Is there any other way to get rid of it?
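
If rebuilding really is the only way, I assume the usual remove-and-recreate sequence applies; the device path below is a placeholder and ceph-disk is just an assumption, so adjust to however the OSD was originally deployed:

# ceph osd out 258
# systemctl stop ceph-osd@258
# ceph osd crush remove osd.258
# ceph auth del osd.258
# ceph osd rm 258
# ceph-disk zap /dev/sdX
# ceph-disk prepare --bluestore /dev/sdX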


Awaiting your comments.


Thanks

Jayaram


_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
