Re: High memory usage kills OSD while peering

Hi, normally this would not be an issue.
But I think the whole episode of the OOM killer taking OSDs out and nodes dying caused the OSDs to make a lot of errors when writing files to disk, so we are seeing hundreds of such files so far, and we are not sure how much is still left to fix. We had to run "ceph osd set pause" to keep recovery moving; otherwise it is a mess. I am willing to write a patch for this if anyone has a good idea of how to deal with it, since I am not sure what the best approach is.

My idea (not sure how easy it is to implement) is: when we hit a size mismatch, grab all the chunks, take enough shards with matching sizes, and decode from those, then probably mark the PG inconsistent and let repair deal with it once the PG finishes recovering. A rough sketch of that is below.
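For illustration only, a minimal standalone sketch of that idea (the names and types are made up for the example, not the real ECUtil or bufferlist interfaces): bucket the available shards by length, keep the largest group that still has at least k members, decode from those, and flag the PG as inconsistent whenever shards had to be dropped, instead of asserting on the first length mismatch the way ECUtil::decode() does now.

// Hypothetical sketch only, not Ceph's real ECUtil::decode();
// bufferlists are stood in for by std::string and the actual EC
// decode call is left out.
#include <map>
#include <optional>
#include <string>
#include <vector>

struct DecodePlan {
  std::vector<int> usable_shards;    // shard ids whose sizes agree
  bool mark_pg_inconsistent = false; // true when shards were dropped
};

// Pick a size-consistent subset of shards instead of asserting on the
// first mismatch.  'k' is the number of shards needed to decode.
std::optional<DecodePlan> plan_decode(
    const std::map<int, std::string>& shard_buffers, unsigned k) {
  // Bucket shard ids by their on-disk length.
  std::map<size_t, std::vector<int>> by_len;
  for (const auto& [shard, buf] : shard_buffers)
    by_len[buf.size()].push_back(shard);

  // Take the length that the most shards agree on (majority vote).
  const std::vector<int>* best = nullptr;
  for (const auto& bucket : by_len)
    if (!best || bucket.second.size() > best->size())
      best = &bucket.second;

  if (!best || best->size() < k)
    return std::nullopt;  // not enough agreeing shards; give up cleanly

  DecodePlan plan;
  plan.usable_shards.assign(best->begin(), best->begin() + k);
  // If anything was dropped, mark the PG inconsistent so a later
  // scrub/repair can rebuild the bad shards after recovery finishes.
  plan.mark_pg_inconsistent = (best->size() != shard_buffers.size());
  return plan;
}

In the real code this check would have to sit where the failed assert is in ECUtil.cc, before the buffers are handed to the erasure code plugin, and the "mark inconsistent" part would need to be plumbed back up to the PG, so treat the above only as a starting point for discussion.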

On 08/29/2017 10:34 PM, Mustafa Muhammad wrote:
I reported this issue; please take a look if you can:

http://tracker.ceph.com/issues/21173

Regards
Mustafa

On Tue, Aug 29, 2017 at 10:44 AM, Mustafa Muhammad
<mustafa1024m@xxxxxxxxx> wrote:
Hi all,
Not sure if I should open a new thread, but this is the same cluster,
so this should provide a little background.
Now the cluster is up and recovering, but we are hitting a bug that is
crashing the OSD

      0> 2017-08-29 10:00:51.699557 7fae66139700 -1
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.1.4/rpm/el7/BUILD/ceph-12.1.4/src/osd/ECUtil.cc:
In function 'int ECUtil::decode(const ECUtil::stripe_info_t&,
ceph::ErasureCodeInterfaceRef&, std::map<int, ceph::buffer::list>&,
std::map<int, ceph::buffer::list*>&)' thread 7fae66139700 time
2017-08-29 10:00:51.688625
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.1.4/rpm/el7/BUILD/ceph-12.1.4/src/osd/ECUtil.cc:
59: FAILED assert(i->second.length() == total_data_size)

Probably http://tracker.ceph.com/issues/14009

Some shards are problematic: some are smaller than they should be
(definitely a problem), and in others the last part is all zeros (not
sure whether that is padding or a problem).
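One way to tell the two cases apart, assuming the usual EC layout where each shard holds ceil(object_size / stripe_width) * chunk_size bytes and the final, partially filled stripe is zero-padded (an assumption worth checking against the pool's EC profile), is sketched below; the helper and its parameters are purely illustrative.

// Illustrative only: classify a shard given the logical object size and
// the pool's EC parameters (k data shards, chunk_size = stripe_width / k).
// Assumes the last stripe of the object is zero-padded on each shard.
#include <cstdint>

enum class ShardState { OK, PADDED_TAIL, TRUNCATED };

ShardState classify_shard(uint64_t shard_len, uint64_t object_size,
                          uint64_t chunk_size, unsigned k) {
  // Each stripe carries chunk_size * k bytes of user data; the object is
  // rounded up to whole stripes, so every shard should be exactly
  // 'stripes * chunk_size' bytes long.
  const uint64_t stripe_width = chunk_size * k;
  const uint64_t stripes = (object_size + stripe_width - 1) / stripe_width;
  const uint64_t expected = stripes * chunk_size;

  if (shard_len < expected)
    return ShardState::TRUNCATED;  // genuinely short: real corruption
  // A zero run at the end that still fits inside 'expected' is just the
  // padding of the final, partially filled stripe.
  return (object_size % stripe_width) ? ShardState::PADDED_TAIL
                                      : ShardState::OK;
}

Under that assumption, a zero tail on a shard of the expected length is normal padding, while a shard shorter than the expected length is the case that trips the assert above.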

Now we have set noup, marked the OSDs with corrupt chunks down, and let
the recovery proceed, but this is happening in lots of PGs and is very
slow.
Is there anything we can do to fix this faster? We tried removing the
corrupted chunk and got this crash (I grepped for the thread in which
the abort happened):

    -77> 2017-08-28 15:11:40.030178 7f90cd519700  0 osd.377 pg_epoch:
1102631 pg[143.1b0s0( v 1098703'309813 (960110'306653,1098703'309813]
local-lis/les=1102586/1102609 n=63499 ec=470378/470378 lis/c
1102586/960364 les/c/f 1102609/960364/1061015 1102545/1102586/1102586)
[377,77,248,635,642,111,182,234,531,307,29,648]/[377,77,248,198,529,111,182,234,548,307,29,174]
r=0 lpr=1102586 pi=[960339,1102586)/44 rops=1
bft=531(8),635(3),642(4),648(11) crt=1098703'309813 lcod 0'0 mlcod 0'0
active+remapped+backfilling] failed_push
143:0d9ce204:::default.63296332.1__shadow_2033460653.2~dpBlpEu3nMuFDe6ikBFMso5ivuBb7oj.1_93:head
from shard 548(8), reps on  unfound? 0
     -2> 2017-08-28 15:11:40.130722 7f90cd519700 -1 osd.377 pg_epoch:
1102631 pg[143.1b0s0( v 1098703'309813 (960110'306653,1098703'309813]
local-lis/les=1102586/1102609 n=63499 ec=470378/470378 lis/c
1102586/960364 les/c/f 1102609/960364/1061015 1102545/1102586/1102586)
[377,77,248,635,642,111,182,234,531,307,29,648]/[377,77,248,198,529,111,182,234,548,307,29,174]
r=0 lpr=1102586 pi=[960339,1102586)/44
bft=531(8),635(3),642(4),648(11) crt=1098703'309813 lcod 0'0 mlcod 0'0
active+remapped+backfilling] recover_replicas: object
143:0d9ce204:::default.63296332.1__shadow_2033460653.2~dpBlpEu3nMuFDe6ikBFMso5ivuBb7oj.1_93:head
last_backfill 143:0d9ce1c5:::default.63296332.1__shadow_26882237.2~mGGm_A45xKldAdADFC13qizbUiC0Yrw.1_158:head
     -1> 2017-08-28 15:11:40.130802 7f90cd519700 -1 osd.377 pg_epoch:
1102631 pg[143.1b0s0( v 1098703'309813 (960110'306653,1098703'309813]
local-lis/les=1102586/1102609 n=63499 ec=470378/470378 lis/c
1102586/960364 les/c/f 1102609/960364/1061015 1102545/1102586/1102586)
[377,77,248,635,642,111,182,234,531,307,29,648]/[377,77,248,198,529,111,182,234,548,307,29,174]
r=0 lpr=1102586 pi=[960339,1102586)/44
bft=531(8),635(3),642(4),648(11) crt=1098703'309813 lcod 0'0 mlcod 0'0
active+remapped+backfilling] recover_replicas: object added to missing
set for backfill, but is not in recovering, error!
      0> 2017-08-28 15:11:40.134768 7f90cd519700 -1 *** Caught signal
(Aborted) **
in thread 7f90cd519700 thread_name:tp_osd_tp

What can we do to fix this?
Will enabling fast_read on the pool benefit us, or does it only affect client reads?
Any ideas?

Regards
Mustafa



