Brad --

The issue is in tracker now:
http://tracker.ceph.com/issues/14088

-- Tom

> -----Original Message-----
> From: Brad Hubbard [mailto:bhubbard@xxxxxxxxxx]
> Sent: Monday, December 14, 2015 3:47 PM
> To: Deneau, Tom
> Cc: ceph-devel@xxxxxxxxxxxxxxx
> Subject: Re: ceph-mon terminated with status 28
>
> ----- Original Message -----
> > From: "Tom Deneau" <tom.deneau@xxxxxxx>
> > To: "Brad Hubbard" <bhubbard@xxxxxxxxxx>
> > Cc: ceph-devel@xxxxxxxxxxxxxxx
> > Sent: Tuesday, 15 December, 2015 3:21:27 AM
> > Subject: RE: ceph-mon terminated with status 28
> >
> > Thanks, Brad. That was the problem.
>
> Np.
>
> >
> > Is there a reason why we don't log more descriptive info for this kind of
> > failure?
>
> I guess it may not have been anticipated that init would swallow these types of
> errors early in the process and just report the return code.
>
> If you wouldn't mind opening a tracker for "Fatal errors at start-up are not
> logged", or something similar, I can take a look at getting some meaningful log
> entries reported during these early failures.
>
> Let me know the tracker number.
>
> Cheers,
> Brad
>
> >
> > -- Tom
> >
> > > -----Original Message-----
> > > From: Brad Hubbard [mailto:bhubbard@xxxxxxxxxx]
> > > Sent: Sunday, December 13, 2015 4:19 PM
> > > To: Deneau, Tom
> > > Cc: ceph-devel@xxxxxxxxxxxxxxx
> > > Subject: Re: ceph-mon terminated with status 28
> > >
> > > ----- Original Message -----
> > > > From: "Tom Deneau" <tom.deneau@xxxxxxx>
> > > > To: ceph-devel@xxxxxxxxxxxxxxx
> > > > Sent: Sunday, 13 December, 2015 11:49:16 PM
> > > > Subject: ceph-mon terminated with status 28
> > > >
> > > > I am trying to understand the following failure:
> > > >
> > > > A small cluster was running fine, and then was left unused for a while.
> > > > When I went to try to use it again, the mon socket wasn't there and I
> > > > could see that ceph-mon was not running. I saw the lines below at the
> > > > end of dmesg output.
> > > >
> > > > When I tried to restart ceph-mon using sudo start ceph-mon id=monhost,
> > > > I got the same set of errors newly appended to dmesg output.
> > > >
> > > > I don't see anything more descriptive in /var/log/ceph/ceph-mon.log,
> > > > just the recording of new mon processes starting.
> > > >
> > > > In this particular small cluster, the mon process was running on the
> > > > same node with 7 osd processes. sudo initctl list shows that the osd
> > > > procs are still up, although logging the fact that they can't
> > > > communicate with the mon socket.
> > > >
> > > > Is there someplace else I should look for more details as to why mon
> > > > is down and can't be restarted?
> > > >
> > > > -- Tom Deneau
> > > >
> > > > dmesg output:
> > > > --------------
> > > > init: ceph-mon (ceph/monhost) main process (16538) terminated with status 28
> > > > init: ceph-mon (ceph/monhost) main process ended, respawning
> > > > init: ceph-create-keys main process (16227) killed by TERM signal
> > > > init: ceph-mon (ceph/monhost) main process (16546) terminated with status 28
> > > > init: ceph-mon (ceph/monhost) main process ended, respawning
> > > > init: ceph-create-keys main process (16548) killed by TERM signal
> > > > init: ceph-mon (ceph/monhost) main process (16556) terminated with status 28
> > > > init: ceph-mon (ceph/monhost) main process ended, respawning
> > > > init: ceph-create-keys main process (16558) killed by TERM signal
> > > > init: ceph-mon (ceph/monhost) main process (16566) terminated with status 28
> > > > init: ceph-mon (ceph/monhost) respawning too fast, stopped
> > > > init: ceph-create-keys main process (16568) killed by TERM signal
> > >
> > > It looks like it's complaining about lack of space?
> > >
> > > src/ceph_mon.cc:
> > >
> > > int main(int argc, const char **argv)
> > > {
> > > ----8<----
> > >   {
> > >     // check fs stats. don't start if it's critically close to full.
> > >     ceph_data_stats_t stats;
> > >     int err = get_fs_stats(stats, g_conf->mon_data.c_str());
> > >     if (err < 0) {
> > >       cerr << "error checking monitor data's fs stats: " << cpp_strerror(err)
> > >            << std::endl;
> > >       exit(-err);
> > >     }
> > >     if (stats.avail_percent <= g_conf->mon_data_avail_crit) {
> > >       cerr << "error: monitor data filesystem reached concerning levels of"
> > >            << " available storage space (available: "
> > >            << stats.avail_percent << "% " << prettybyte_t(stats.byte_avail)
> > >            << ")\nyou may adjust 'mon data avail crit' to a lower value"
> > >            << " to make this go away (default: " << g_conf->mon_data_avail_crit
> > >            << "%)\n" << std::endl;
> > >       exit(ENOSPC);
> > >     }
> > >
> > > #define ENOSPC 28 /* No space left on device */
> > >
> > > Try starting ceph-mon from the command line and see if you get the above
> > > message.
> > >
> > > >
> > > > --
> > > > To unsubscribe from this list: send the line "unsubscribe ceph-devel"
> > > > in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo
> > > > info at http://vger.kernel.org/majordomo-info.html
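
For anyone else who hits a bare "terminated with status N" from init, here is a minimal sketch of the two checks discussed in this thread. It assumes the process exited with an errno value, as the quoted ceph_mon.cc excerpt does via exit(ENOSPC) and exit(-err). The helper names describe_exit_status and avail_percent are hypothetical and not part of Ceph; avail_percent is only a rough stand-in for get_fs_stats, built on POSIX statvfs, and may not compute the percentage exactly the way Ceph does.

```cpp
#include <cerrno>
#include <cstring>
#include <string>
#include <sys/statvfs.h>

// Hypothetical helper: translate an exit status into the matching errno text.
// Only meaningful when the program exits with an errno value on purpose.
std::string describe_exit_status(int status) {
  return std::string(std::strerror(status));
}

// Hypothetical helper (NOT Ceph's get_fs_stats): percentage of the filesystem
// holding `path` that is available to unprivileged users.
// Returns a negative errno value if statvfs fails.
int avail_percent(const char *path) {
  struct statvfs vfs;
  if (statvfs(path, &vfs) != 0)
    return -errno;
  if (vfs.f_blocks == 0)
    return 0;  // avoid dividing by zero on pseudo filesystems
  return static_cast<int>((100.0 * vfs.f_bavail) / vfs.f_blocks);
}
```

Passing 28 to describe_exit_status gives the ENOSPC text on Linux, and running avail_percent against the directory that 'mon data' points to gives a number that can be compared, roughly, against 'mon data avail crit'.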