Re: Ceph BIG outage : 200+ OSD are down , OSD cannot create thread


 



Thanks Sage

I will create a “new feature” request on tracker.ceph.com so that this discussion does not get buried in the mailing list.

Developers can implement this at their convenience.


****************************************************************
Karan Singh 
Systems Specialist , Storage Platforms
CSC - IT Center for Science,
Keilaranta 14, P. O. Box 405, FIN-02101 Espoo, Finland
mobile: +358 503 812758
tel. +358 9 4572001
fax +358 9 4572302
http://www.csc.fi/
****************************************************************

On 10 Mar 2015, at 14:26, Sage Weil <sage@xxxxxxxxxxxx> wrote:

On Tue, 10 Mar 2015, Christian Eichelmann wrote:
Hi Sage,

we hit this problem a few months ago as well, and it took us quite a while to
figure out what was wrong.

As a system administrator, I don't like the idea of daemons or even init
scripts changing system-wide configuration parameters, so I wouldn't like
to see the OSDs do it themselves.

This is my general feeling as well.  As we move to systemd, I'd like to
have the ceph unit file get away from this entirely and have the admin set
these values in /etc/security/limits.conf or /etc/sysctl.d.  The main
thing making this problematic right now is that the daemons run as root
instead of a 'ceph' user.
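
As an illustration of the admin-managed approach described above, here is a minimal Python sketch that renders a sysctl.d drop-in for kernel.pid_max and reads the current value back. The function names and the suggested file name (90-ceph-pid-max.conf) are hypothetical and not something Ceph ships; the value 4194303 is the one used in this thread.

```python
from pathlib import Path

# Value used in this thread; described there as the maximum.
PID_MAX_TARGET = 4194303


def read_pid_max(path="/proc/sys/kernel/pid_max"):
    """Return the current kernel.pid_max as an integer."""
    return int(Path(path).read_text().strip())


def sysctl_dropin(pid_max=PID_MAX_TARGET):
    """Render the text of a sysctl.d drop-in that sets kernel.pid_max."""
    return (
        "# Raise the PID/thread ID ceiling for dense OSD hosts\n"
        f"kernel.pid_max = {pid_max}\n"
    )


if __name__ == "__main__":
    # Hypothetical target: /etc/sysctl.d/90-ceph-pid-max.conf
    print(sysctl_dropin(), end="")
```

The rendered text could be saved under /etc/sysctl.d/ and loaded with `sysctl -p <file>`; per-user limits such as nproc would go in /etc/security/limits.conf instead.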

The warning would be a useful hint on the one hand; on the other hand, it
may confuse people, since changing this setting is not required on
common hardware.

If we make it warn only if it reaches > 50% of the threshold that is
probably safe...
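
The 50% rule suggested above could look roughly like the following Python sketch. This is not how the OSDs implement anything; the function name, the message text, and the way the thread count is obtained are all assumptions made for illustration.

```python
def pid_usage_warning(thread_count, pid_max, threshold=0.5):
    """Return a warning string if thread_count exceeds threshold * pid_max.

    Returns None when usage is at or below the threshold, so hosts with
    plenty of headroom stay silent.
    """
    if pid_max <= 0:
        raise ValueError("pid_max must be positive")
    usage = thread_count / pid_max
    if usage > threshold:
        return (
            f"thread count {thread_count} is {usage:.0%} of "
            f"kernel.pid_max ({pid_max}); consider raising it"
        )
    return None


if __name__ == "__main__":
    # Figures from this thread: about 66,000 threads on a 60-OSD host.
    print(pid_usage_warning(66000, 65536))    # warns: already over the limit
    print(pid_usage_warning(66000, 4194303))  # None: far below 50%
```

Using a strict "greater than" comparison means a host sitting at exactly half of the limit does not warn, which matches the "> 50%" wording.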

sage



Regards,
Christian

On 03/09/2015 08:01 PM, Sage Weil wrote:
On Mon, 9 Mar 2015, Karan Singh wrote:
Thanks, guys. kernel.pid_max=4194303 did the trick.
Great to hear!  Sorry we missed that you only had it at 65536.

This is a really common problem that people hit when their clusters start
to grow.  Is there somewhere in the docs we can put this to catch more
users?  Or maybe a warning issued by the osds themselves or something if
they see limits that are low?

sage

- Karan -

      On 09 Mar 2015, at 14:48, Christian Eichelmann
      <christian.eichelmann@xxxxxxxx> wrote:

Hi Karan,

As you actually write in your own book, the problem is the sysctl
setting "kernel.pid_max". I saw in your bug report that you had
set it to 65536, which is still too low for high-density hardware.

In our cluster, one OSD server (60 OSDs per server) has about 66,000
threads when idle. The number of threads increases when you increase
the number of placement groups in the cluster, which I think is what
triggered your problem.
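
The figures in this thread can be checked with some quick arithmetic. On Linux every thread consumes an ID from the same space that kernel.pid_max bounds, so the host-wide thread count has to stay below it. The Python sketch below uses only numbers quoted in the thread; applying one cluster's per-OSD thread count to the other cluster is an assumption.

```python
# Figures quoted in this thread.
threads_per_server = 66_000   # idle thread count on one 60-OSD server
osds_per_server = 60
old_pid_max = 65_536          # value from the bug report
new_pid_max = 4_194_303       # value that fixed the problem

threads_per_osd = threads_per_server / osds_per_server
exceeds_old_limit = threads_per_server > old_pid_max
headroom_factor = new_pid_max // threads_per_server

# Affected cluster: 239 OSDs on 4 nodes, a similar density.
osds_per_node_affected = 239 / 4

print(threads_per_osd)         # 1100.0
print(exceeds_old_limit)       # True
print(headroom_factor)         # 63
print(osds_per_node_affected)  # 59.75
```

So an idle 60-OSD host already needs more IDs than a pid_max of 65536 allows, while 4194303 leaves roughly a 63-fold margin at that thread count.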

Set the "kernel.pid_max" setting to 4194303 (the maximum) like Azad
Aliyar suggested, and the problem should be gone.
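
To see how close a host is to the limit before and after the change, the total thread count can be read from procfs. This is a minimal Python sketch, assuming a Linux /proc layout in which each /proc/<pid>/status file carries a "Threads:" line; the function name is made up for illustration.

```python
import os


def count_threads(proc_root="/proc"):
    """Sum the 'Threads:' field over every process directory in proc_root."""
    total = 0
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as 'self' or 'sys'
        status_path = os.path.join(proc_root, entry, "status")
        try:
            with open(status_path) as status:
                for line in status:
                    if line.startswith("Threads:"):
                        total += int(line.split()[1])
                        break
        except OSError:
            continue  # process exited or is not readable
    return total


if __name__ == "__main__":
    print(count_threads())
```

Comparing the result with the value in /proc/sys/kernel/pid_max shows how much headroom is left.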

Regards,
Christian

On 09.03.2015 11:41, Karan Singh wrote:
      Hello community, I need help fixing a long-running Ceph
      problem.

      The cluster is unhealthy and multiple OSDs are down. When I try
      to restart the OSDs, I get this error:


      2015-03-09 12:22:16.312774 7f760dac9700 -1 common/Thread.cc: In function 'void Thread::create(size_t)' thread 7f760dac9700 time 2015-03-09 12:22:16.311970
      common/Thread.cc: 129: FAILED assert(ret == 0)


      Environment: 4 nodes, OSD+Monitor, latest Firefly, CentOS 6.5,
      kernel 3.17.2-1.el6.elrepo.x86_64

      Tried upgrading from 0.80.7 to 0.80.8, but no luck.

      Tried the CentOS stock kernel 2.6.32, but no luck.

      Memory is not a problem; more than 150 GB is free.


      Has anyone ever faced this problem?

      Cluster status:

          cluster 2bd3283d-67ef-4316-8b7e-d8f4747eae33
           health HEALTH_WARN 7334 pgs degraded; 1185 pgs down; 1 pgs incomplete; 1735 pgs peering; 8938 pgs stale; 1736 pgs stuck inactive; 8938 pgs stuck stale; 10320 pgs stuck unclean; recovery 6061/31080 objects degraded (19.501%); 111/196 in osds are down; clock skew detected on mon.pouta-s02, mon.pouta-s03
           monmap e3: 3 mons at {pouta-s01=10.XXX.50.1:6789/0,pouta-s02=10.XXX.50.2:6789/0,pouta-s03=10.XXX.50.3:6789/0}, election epoch 1312, quorum 0,1,2 pouta-s01,pouta-s02,pouta-s03
           osdmap e26633: 239 osds: 85 up, 196 in
            pgmap v60389: 17408 pgs, 13 pools, 42345 MB data, 10360 objects
                  4699 GB used, 707 TB / 711 TB avail
                  6061/31080 objects degraded (19.501%)
                        14 down+remapped+peering
                        39 active
                      3289 active+clean
                       547 peering
                       663 stale+down+peering
                       705 stale+active+remapped
                         1 active+degraded+remapped
                         1 stale+down+incomplete
                       484 down+peering
                       455 active+remapped
                      3696 stale+active+degraded
                         4 remapped+peering
                        23 stale+down+remapped+peering
                        51 stale+active
                      3637 active+degraded
                      3799 stale+active+clean

      OSD logs:

      2015-03-09 12:22:16.312774 7f760dac9700 -1 common/Thread.cc: In function 'void Thread::create(size_t)' thread 7f760dac9700 time 2015-03-09 12:22:16.311970
      common/Thread.cc: 129: FAILED assert(ret == 0)

       ceph version 0.80.8 (69eaad7f8308f21573c604f121956e64679a52a7)
       1: (Thread::create(unsigned long)+0x8a) [0xaf41da]
       2: (SimpleMessenger::add_accept_pipe(int)+0x6a) [0xae84fa]
       3: (Accepter::entry()+0x265) [0xb5c635]
       4: /lib64/libpthread.so.0() [0x3c8a6079d1]
       5: (clone()+0x6d) [0x3c8a2e89dd]
       NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.


      More information is in the Ceph tracker issue:
      http://tracker.ceph.com/issues/10988#change-49018





      _______________________________________________
      ceph-users mailing list
      ceph-users@xxxxxxxxxxxxxx
      http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



--
Christian Eichelmann
System Administrator

1&1 Internet AG - IT Operations Mail & Media Advertising & Targeting
Brauerstraße 48 · DE-76135 Karlsruhe
Phone: +49 721 91374-8026
christian.eichelmann@xxxxxxxx

District Court (Amtsgericht) Montabaur / HRB 6484
Executive Board: Henning Ahlert, Ralph Dommermuth, Matthias Ehrlich, Robert
Hoffmann, Markus Huhn, Hans-Henning Kettler, Dr. Oliver Mauss, Jan
Oetjen
Chairman of the Supervisory Board: Michael Scheeren






