trouble starting ceph @ boot

System: Ubuntu Trusty 14.04

Release : Kraken


Issue:

When the ceph-osd daemon is started on boot via upstart, /var/log/upstart/ceph-osd-ceph_#.log shows 3 attempts to start the service, with the error messages below:



starting osd.12 at - osd_data /var/lib/ceph/osd/ceph-12 /var/lib/ceph/osd/ceph-12/journal

2017-05-09 13:38:34.507004 7f6d46a2e980 -1 journal FileJournal::_open: disabling aio for non-block journal. Use journal_force_aio to force use of aio anyway

2017-05-09 13:38:38.432333 7f6d46a2e980 -1 osd.12 2284024 PGs are upgrading

unable to look up group 'ceph': (34) Numerical result out of range

unable to look up group 'ceph': (34) Numerical result out of range

unable to look up group 'ceph': (34) Numerical result out of range



Workaround:



If I change the respawn limit in /etc/init/ceph-osd.conf like so:



-respawn limit 3 1800

+respawn limit unlimited



I get roughly 20 attempts to start each osd daemon, and then it starts successfully.



Starting the daemons by hand works just fine after boot.



Possible reasons:



NSCD is in use and may not have started yet. However, disabling this service does not improve startup without the workaround in place.





The message seems to be coming from global/global_init.cc:



        struct passwd *p = 0;
        getpwnam_r(g_conf->setuser.c_str(), &pa, buf, sizeof(buf), &p);
        if (!p) {
          cerr << "unable to look up user '" << g_conf->setuser << "'"
               << std::endl;
          exit(1);
        }
        uid = p->pw_uid;
        gid = p->pw_gid;
        uid_string = g_conf->setuser;
      }
    }
    if (g_conf->setgroup.length() > 0) {
      gid = atoi(g_conf->setgroup.c_str());
      if (!gid) {
        char buf[4096];
        struct group gr;
        struct group *g = 0;
        getgrnam_r(g_conf->setgroup.c_str(), &gr, buf, sizeof(buf), &g);
        if (!g) {
          cerr << "unable to look up group '" << g_conf->setgroup << "'"
               << ": " << cpp_strerror(errno) << std::endl;
          exit(1);
        }
        gid = g->gr_gid;
        gid_string = g_conf->setgroup;
      }
    }



Error code 34 corresponds to ERANGE, which the getgrnam_r() documentation describes as "Insufficient buffer space supplied". I assume the message is printed because getgrnam_r() leaves the result pointer NULL when the lookup fails.
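
One thing that may be worth checking: getgrnam_r() reports failures through its return value, and POSIX does not require it to set errno, whereas the excerpt from global_init.cc prints errno. ERANGE from this call means the caller's buffer (4096 bytes in the excerpt) was too small for the group entry. Below is a minimal sketch of a lookup that checks the return value and retries with a larger buffer on ERANGE. The name lookup_gid and its parameters are hypothetical and only for illustration; this is not Ceph's code or its fix.

```cpp
#include <grp.h>
#include <sys/types.h>

#include <cassert>
#include <cerrno>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not part of Ceph): resolve a group name to a gid.
// Returns 0 and stores the gid on success, ENOENT if the group does not
// exist, or the error number returned by getgrnam_r() otherwise.
int lookup_gid(const std::string& name, gid_t* gid,
               std::size_t initial_buf = 4096,
               std::size_t max_buf = 1 << 20) {
  assert(gid != nullptr);
  std::vector<char> buf(initial_buf > 0 ? initial_buf : 1);
  for (;;) {
    struct group gr;
    struct group* result = nullptr;
    // The error is the return value; errno is not guaranteed to be set.
    int r = getgrnam_r(name.c_str(), &gr, buf.data(), buf.size(), &result);
    if (r == 0) {
      if (result == nullptr) return ENOENT;  // call succeeded, no such group
      *gid = result->gr_gid;
      return 0;
    }
    // Any error other than ERANGE, or a buffer already at the cap, is final.
    if (r != ERANGE || buf.size() >= max_buf) return r;
    buf.resize(buf.size() * 2);  // ERANGE: buffer too small, grow and retry
  }
}
```

A caller would pass the configured group name, for example lookup_gid("ceph", &gid), and print strerror() of the returned value instead of errno if it is nonzero.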



But I have no idea why the group isn’t retrievable, as the lookup works from the shell:

getent group ceph

ceph:x:59623:ceph



GID changed for security reasons.



Additional Information:



I also see this in boot.log; I am not sure if it is related:

failed: 'ulimit -n 32768; /usr/bin/ceph-mds -i cephstorelx2 --pid-file /var/run/ceph/mds.cephstorelx2//mds.cephstorelx2.pid -c /etc/ceph/ceph.conf --cluster ceph --setuser ceph --setgroup ceph '


Any pointers would be helpful.


-Zach

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
