----- Original Message -----
> From: "Tom Deneau" <tom.deneau@xxxxxxx>
> To: "Brad Hubbard" <bhubbard@xxxxxxxxxx>
> Cc: ceph-devel@xxxxxxxxxxxxxxx
> Sent: Tuesday, 15 December, 2015 3:21:27 AM
> Subject: RE: ceph-mon terminated with status 28
>
> Thanks, Brad. That was the problem.

Np.

>
> Is there a reason why we don't log more descriptive info for this kind of
> failure?

I guess it may not have been anticipated that init would swallow these types of
errors early in the process and just report the return code. If you wouldn't
mind opening a tracker for "Fatal errors at start-up are not logged", or
something similar, I can take a look at getting some meaningful log entries
reported during these early failures. Let me know the tracker number.

Cheers,
Brad

>
> -- Tom
>
> > -----Original Message-----
> > From: Brad Hubbard [mailto:bhubbard@xxxxxxxxxx]
> > Sent: Sunday, December 13, 2015 4:19 PM
> > To: Deneau, Tom
> > Cc: ceph-devel@xxxxxxxxxxxxxxx
> > Subject: Re: ceph-mon terminated with status 28
> >
> > ----- Original Message -----
> > > From: "Tom Deneau" <tom.deneau@xxxxxxx>
> > > To: ceph-devel@xxxxxxxxxxxxxxx
> > > Sent: Sunday, 13 December, 2015 11:49:16 PM
> > > Subject: ceph-mon terminated with status 28
> > >
> > > I am trying to understand the following failure:
> > >
> > > A small cluster was running fine, and then was left unused for a while.
> > > When I went to try to use it again, the mon socket wasn't there and I
> > > could see that ceph-mon was not running. I saw the lines below at the
> > > end of dmesg output.
> > > When I tried to restart ceph-mon using sudo start ceph-mon id=monhost,
> > > I got the same set of errors newly appended to dmesg output.
> > >
> > > I don't see anything more descriptive in /var/log/ceph/ceph-mon.log,
> > > just the recording of new mon processes starting.
> > >
> > > In this particular small cluster, the mon process was running on the
> > > same node with 7 osd processes.
> > > sudo initctl list shows that the osd
> > > procs are still up, although logging the fact that they can't
> > > communicate with the mon socket.
> > >
> > > Is there someplace else I should look for more details as to why mon
> > > is down and can't be restarted?
> > >
> > > -- Tom Deneau
> > >
> > > dmesg output:
> > > --------------
> > > init: ceph-mon (ceph/monhost) main process (16538) terminated with status 28
> > > init: ceph-mon (ceph/monhost) main process ended, respawning
> > > init: ceph-create-keys main process (16227) killed by TERM signal
> > > init: ceph-mon (ceph/monhost) main process (16546) terminated with status 28
> > > init: ceph-mon (ceph/monhost) main process ended, respawning
> > > init: ceph-create-keys main process (16548) killed by TERM signal
> > > init: ceph-mon (ceph/monhost) main process (16556) terminated with status 28
> > > init: ceph-mon (ceph/monhost) main process ended, respawning
> > > init: ceph-create-keys main process (16558) killed by TERM signal
> > > init: ceph-mon (ceph/monhost) main process (16566) terminated with status 28
> > > init: ceph-mon (ceph/monhost) respawning too fast, stopped
> > > init: ceph-create-keys main process (16568) killed by TERM signal
> >
> > It looks like it's complaining about lack of space?
> >
> > src/ceph_mon.cc:
> >
> > int main(int argc, const char **argv)
> > {
> > ----8<----
> >   {
> >     // check fs stats. don't start if it's critically close to full.
> >     ceph_data_stats_t stats;
> >     int err = get_fs_stats(stats, g_conf->mon_data.c_str());
> >     if (err < 0) {
> >       cerr << "error checking monitor data's fs stats: " << cpp_strerror(err)
> >            << std::endl;
> >       exit(-err);
> >     }
> >     if (stats.avail_percent <= g_conf->mon_data_avail_crit) {
> >       cerr << "error: monitor data filesystem reached concerning levels of"
> >            << " available storage space (available: "
> >            << stats.avail_percent << "% " << prettybyte_t(stats.byte_avail)
> >            << ")\nyou may adjust 'mon data avail crit' to a lower value"
> >            << " to make this go away (default: " << g_conf->mon_data_avail_crit
> >            << "%)\n" << std::endl;
> >       exit(ENOSPC);
> >     }
> >
> > #define ENOSPC 28 /* No space left on device */
> >
> > Try starting ceph-mon from the command line and see if you get the above
> > message.
> >
> > >
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> > > the body of a message to majordomo@xxxxxxxxxxxxxxx
> > > More majordomo info at http://vger.kernel.org/majordomo-info.html
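
For reference, the free-space check quoted from src/ceph_mon.cc can be approximated outside ceph-mon to see whether a data directory would trip it. This is a minimal sketch, not Ceph's implementation: it assumes the available percentage is derived from statvfs() block counts, and avail_percent() and check_data_dir() are hypothetical helper names introduced only for illustration.

```cpp
#include <sys/statvfs.h>

#include <cassert>
#include <cerrno>
#include <cstring>
#include <iostream>
#include <string>

// Hypothetical helper (NOT Ceph's get_fs_stats()): percentage of filesystem
// blocks available to unprivileged users, or a negative errno on failure.
int avail_percent(const std::string& path) {
  struct statvfs vfs;
  if (statvfs(path.c_str(), &vfs) != 0)
    return -errno;
  if (vfs.f_blocks == 0)  // pseudo filesystems may report no blocks at all
    return 0;
  return static_cast<int>((100.0 * vfs.f_bavail) / vfs.f_blocks);
}

// Hypothetical helper mirroring the shape of the quoted check. Returns 0 when
// there is enough space, ENOSPC when the available percentage is at or below
// crit_percent, or another positive errno when the stats cannot be read.
int check_data_dir(const std::string& path, int crit_percent) {
  int pct = avail_percent(path);
  if (pct < 0) {
    std::cerr << "error checking fs stats for " << path << ": "
              << std::strerror(-pct) << std::endl;
    return -pct;
  }
  if (pct <= crit_percent) {
    std::cerr << "error: " << path << " has only " << pct
              << "% available (critical threshold: " << crit_percent << "%)"
              << std::endl;
    return ENOSPC;
  }
  return 0;
}
```

Calling check_data_dir() on the monitor's data directory with the configured 'mon data avail crit' value, and passing a non-zero result to exit(), would give init the same "terminated with status 28" seen in the dmesg output whenever the filesystem is at or below the threshold.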