Hi Karan,
We faced the same issue and resolved it by increasing the open file limit and the maximum number of threads.
Config reference
/etc/security/limits.conf
root hard nofile 65535
sysctl -w kernel.pid_max=4194303
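Before changing the two settings above, it can help to see what the node currently has. The following is only a rough sketch: the function names are made up for illustration, and the optional file arguments exist just so the helpers can be pointed at a copy of the files instead of the live system.

```shell
# Hypothetical helpers (names are illustrative, not part of Ceph or any distro tool).

# Print the current PID/thread ceiling.
# Optional argument: an alternative file to read instead of the live /proc entry.
current_pid_max() {
    cat "${1:-/proc/sys/kernel/pid_max}"
}

# Print the soft and hard open-file limits of the current shell session.
show_nofile_limits() {
    printf 'soft nofile: %s\n' "$(ulimit -Sn)"
    printf 'hard nofile: %s\n' "$(ulimit -Hn)"
}

# Succeed (exit status 0) if FILE has a nofile line for USER,
# e.g. "root hard nofile 65535". Comment lines do not match.
# usage: has_nofile_entry FILE USER
has_nofile_entry() {
    awk -v user="$2" '
        $1 == user && $3 == "nofile" { found = 1 }
        END { exit !found }' "$1"
}
```

For example, `has_nofile_entry /etc/security/limits.conf root && echo present` reports whether the entry is already there, and `current_pid_max` shows the value that `sysctl -w` would replace. Entries in limits.conf are generally picked up by new login sessions only, so `show_nofile_limits` is best run from a fresh session after editing the file.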
Cheers
Mohamed Pakkeer
On Mon, Mar 9, 2015 at 4:20 PM, Azad Aliyar <azad.aliyar@xxxxxxxxxxxxxxxx> wrote:
Check Max Threadcount: If you have a node with a lot of OSDs, you may be hitting the default maximum number of threads (usually 32k), especially during recovery. You can use sysctl to raise the limit to the maximum allowed value (4194303) and see whether that helps. For example:
sysctl -w kernel.pid_max=4194303
If increasing the maximum thread count resolves the issue, you can make it permanent by including a kernel.pid_max setting in the /etc/sysctl.conf file. For example:
kernel.pid_max = 4194303
On Mon, Mar 9, 2015 at 4:11 PM, Karan Singh <karan.singh@xxxxxx> wrote:
Hello Community,
I need help fixing a long-running Ceph problem. The cluster is unhealthy and multiple OSDs are down. When I try to restart the OSDs, I get this error:
2015-03-09 12:22:16.312774 7f760dac9700 -1 common/Thread.cc: In function 'void Thread::create(size_t)' thread 7f760dac9700 time 2015-03-09 12:22:16.311970
common/Thread.cc: 129: FAILED assert(ret == 0)
Environment: 4 nodes, OSD+Monitor, Firefly latest, CentOS 6.5, 3.17.2-1.el6.elrepo.x86_64
Tried upgrading from 0.80.7 to 0.80.8, but no luck.
Tried the CentOS stock kernel 2.6.32, but no luck.
Memory is not a problem; more than 150 GB is free.
Has anyone ever faced this problem?
Cluster status:
    cluster 2bd3283d-67ef-4316-8b7e-d8f4747eae33
     health HEALTH_WARN 7334 pgs degraded; 1185 pgs down; 1 pgs incomplete; 1735 pgs peering; 8938 pgs stale; 1736 pgs stuck inactive; 8938 pgs stuck stale; 10320 pgs stuck unclean; recovery 6061/31080 objects degraded (19.501%); 111/196 in osds are down; clock skew detected on mon.pouta-s02, mon.pouta-s03
     monmap e3: 3 mons at {pouta-s01=10.XXX.50.1:6789/0,pouta-s02=10.XXX.50.2:6789/0,pouta-s03=10.XXX.50.3:6789/0}, election epoch 1312, quorum 0,1,2 pouta-s01,pouta-s02,pouta-s03
     osdmap e26633: 239 osds: 85 up, 196 in
      pgmap v60389: 17408 pgs, 13 pools, 42345 MB data, 10360 objects
            4699 GB used, 707 TB / 711 TB avail
            6061/31080 objects degraded (19.501%)
                  14 down+remapped+peering
                  39 active
                3289 active+clean
                 547 peering
                 663 stale+down+peering
                 705 stale+active+remapped
                   1 active+degraded+remapped
                   1 stale+down+incomplete
                 484 down+peering
                 455 active+remapped
                3696 stale+active+degraded
                   4 remapped+peering
                  23 stale+down+remapped+peering
                  51 stale+active
                3637 active+degraded
                3799 stale+active+clean
OSD logs:
2015-03-09 12:22:16.312774 7f760dac9700 -1 common/Thread.cc: In function 'void Thread::create(size_t)' thread 7f760dac9700 time 2015-03-09 12:22:16.311970
common/Thread.cc: 129: FAILED assert(ret == 0)
ceph version 0.80.8 (69eaad7f8308f21573c604f121956e64679a52a7)
1: (Thread::create(unsigned long)+0x8a) [0xaf41da]
2: (SimpleMessenger::add_accept_pipe(int)+0x6a) [0xae84fa]
3: (Accepter::entry()+0x265) [0xb5c635]
4: /lib64/libpthread.so.0() [0x3c8a6079d1]
5: (clone()+0x6d) [0x3c8a2e89dd]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
More information is in the Ceph tracker issue: http://tracker.ceph.com/issues/10988#change-49018
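Since the assert fires in Thread::create and the replies in this thread point at the thread ceiling, one way to check that theory is to count the threads actually in use on an OSD node and compare the number with kernel.pid_max. This is a minimal sketch, assuming a Linux /proc layout; the function names are hypothetical, and the optional directory argument is only there so the helpers can be tried against a test directory.

```shell
# Hypothetical helpers (names are illustrative only).

# Sum the "Threads:" field of every /proc/<pid>/status file.
# usage: total_threads [PROC_DIR]
total_threads() {
    cat "${1:-/proc}"/[0-9]*/status 2>/dev/null |
        awk '/^Threads:/ { n += $2 } END { print n + 0 }'
}

# Same count, restricted to processes whose "Name:" field equals NAME.
# usage: threads_for_name NAME [PROC_DIR]
threads_for_name() {
    cat "${2:-/proc}"/[0-9]*/status 2>/dev/null |
        awk -v name="$1" '
            /^Name:/    { wanted = ($2 == name) }   # Name precedes Threads in each file
            /^Threads:/ { if (wanted) n += $2 }
            END         { print n + 0 }'
}
```

Running `total_threads` next to `cat /proc/sys/kernel/pid_max`, and `threads_for_name ceph-osd` for the OSD share, gives a quick view of how close the node is to the limit. Every thread takes an ID from the same pool that pid_max bounds, so a total approaching that value would be consistent with the advice above; a much lower total would suggest looking at other limits instead.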
****************************************************************
Karan Singh
Systems Specialist, Storage Platforms
CSC - IT Center for Science,
Keilaranta 14, P. O. Box 405, FIN-02101 Espoo, Finland
mobile: +358 503 812758
tel. +358 9 4572001
fax +358 9 4572302
http://www.csc.fi/
****************************************************************
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
--
Warm Regards, Azad Aliyar Linux Server Engineer Email : azad.aliyar@xxxxxxxxxxxxxxxx | Skype : spark.azad
3rd Floor, Leela Infopark, Phase -2, Kakanad, Kochi-30, Kerala, India Phone: +91 484 6561696, Mobile: 91-8129270421
Thanks & Regards
K.Mohamed Pakkeer
Mobile- 0091-8754410114