Hi all,

Not sure if I should open a new thread, but this is the same cluster, so the previous thread provides a little background.

The cluster is now up and recovering, but we are hitting a bug that crashes the OSDs:

0> 2017-08-29 10:00:51.699557 7fae66139700 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.1.4/rpm/el7/BUILD/ceph-12.1.4/src/osd/ECUtil.cc: In function 'int ECUtil::decode(const ECUtil::stripe_info_t&, ceph::ErasureCodeInterfaceRef&, std::map<int, ceph::buffer::list>&, std::map<int, ceph::buffer::list*>&)' thread 7fae66139700 time 2017-08-29 10:00:51.688625
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.1.4/rpm/el7/BUILD/ceph-12.1.4/src/osd/ECUtil.cc: 59: FAILED assert(i->second.length() == total_data_size)

This is probably http://tracker.ceph.com/issues/14009.

Some shards are problematic: they are either smaller than expected (definitely a problem) or their last part is all zeros (not sure whether that is padding or corruption).

We have now set noup, marked the OSDs with corrupt chunks down, and let recovery proceed, but this is happening in a lot of PGs and it is very slow. Is there anything we can do to fix this faster? We tried removing the corrupted chunk (roughly as sketched below) and got this crash (I grepped for the thread in which the abort happened):

-77> 2017-08-28 15:11:40.030178 7f90cd519700 0 osd.377 pg_epoch: 1102631 pg[143.1b0s0( v 1098703'309813 (960110'306653,1098703'309813] local-lis/les=1102586/1102609 n=63499 ec=470378/470378 lis/c 1102586/960364 les/c/f 1102609/960364/1061015 1102545/1102586/1102586) [377,77,248,635,642,111,182,234,531,307,29,648]/[377,77,248,198,529,111,182,234,548,307,29,174] r=0 lpr=1102586 pi=[960339,1102586)/44 rops=1 bft=531(8),635(3),642(4),648(11) crt=1098703'309813 lcod 0'0 mlcod 0'0 active+remapped+backfilling] failed_push 143:0d9ce204:::default.63296332.1__shadow_2033460653.2~dpBlpEu3nMuFDe6ikBFMso5ivuBb7oj.1_93:head from shard 548(8), reps on unfound? 0
-2> 2017-08-28 15:11:40.130722 7f90cd519700 -1 osd.377 pg_epoch: 1102631 pg[143.1b0s0( v 1098703'309813 (960110'306653,1098703'309813] local-lis/les=1102586/1102609 n=63499 ec=470378/470378 lis/c 1102586/960364 les/c/f 1102609/960364/1061015 1102545/1102586/1102586) [377,77,248,635,642,111,182,234,531,307,29,648]/[377,77,248,198,529,111,182,234,548,307,29,174] r=0 lpr=1102586 pi=[960339,1102586)/44 bft=531(8),635(3),642(4),648(11) crt=1098703'309813 lcod 0'0 mlcod 0'0 active+remapped+backfilling] recover_replicas: object 143:0d9ce204:::default.63296332.1__shadow_2033460653.2~dpBlpEu3nMuFDe6ikBFMso5ivuBb7oj.1_93:head last_backfill 143:0d9ce1c5:::default.63296332.1__shadow_26882237.2~mGGm_A45xKldAdADFC13qizbUiC0Yrw.1_158:head
-1> 2017-08-28 15:11:40.130802 7f90cd519700 -1 osd.377 pg_epoch: 1102631 pg[143.1b0s0( v 1098703'309813 (960110'306653,1098703'309813] local-lis/les=1102586/1102609 n=63499 ec=470378/470378 lis/c 1102586/960364 les/c/f 1102609/960364/1061015 1102545/1102586/1102586) [377,77,248,635,642,111,182,234,531,307,29,648]/[377,77,248,198,529,111,182,234,548,307,29,174] r=0 lpr=1102586 pi=[960339,1102586)/44 bft=531(8),635(3),642(4),648(11) crt=1098703'309813 lcod 0'0 mlcod 0'0 active+remapped+backfilling] recover_replicas: object added to missing set for backfill, but is not in recovering, error!
0> 2017-08-28 15:11:40.134768 7f90cd519700 -1 *** Caught signal (Aborted) ** in thread 7f90cd519700 thread_name:tp_osd_tp

What can we do to fix this?
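For reference, the shard checks and the chunk removal were done with ceph-objectstore-tool, roughly as below. The OSD path, pgid+shard and object name are placeholders/examples and I am reproducing the syntax from memory, so please treat this as a sketch rather than the exact invocation we used:

# Run on the node holding the suspect shard, with that OSD stopped
# (ceph-objectstore-tool needs exclusive access to the store; add
# --journal-path for filestore OSDs if required).
OSD_PATH=/var/lib/ceph/osd/ceph-<id>
PGID=143.1b0s8        # pg plus the shard id stored on that OSD

# find the object's JSON spec in that PG shard
ceph-objectstore-tool --data-path $OSD_PATH --pgid $PGID --op list \
    | grep shadow_2033460653 > obj.json

# dump the shard to check its size and whether the tail is all zeros
ceph-objectstore-tool --data-path $OSD_PATH --pgid $PGID "$(cat obj.json)" get-bytes /tmp/shard.bin
ls -l /tmp/shard.bin
od -A d -x /tmp/shard.bin | tail

# remove the copy we believe is corrupt -- this is the step after which the
# failed_push / recover_replicas abort above appeared
ceph-objectstore-tool --data-path $OSD_PATH --pgid $PGID "$(cat obj.json)" remove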
Will enabling fast_read on the pool benefit us here, or does it only affect client reads?

Any ideas?

Regards,
Mustafa