Thanks, Brad. That was the problem. Is there a reason why we don't log
more descriptive info for this kind of failure?

-- Tom

> -----Original Message-----
> From: Brad Hubbard [mailto:bhubbard@xxxxxxxxxx]
> Sent: Sunday, December 13, 2015 4:19 PM
> To: Deneau, Tom
> Cc: ceph-devel@xxxxxxxxxxxxxxx
> Subject: Re: ceph-mon terminated with status 28
>
> ----- Original Message -----
> > From: "Tom Deneau" <tom.deneau@xxxxxxx>
> > To: ceph-devel@xxxxxxxxxxxxxxx
> > Sent: Sunday, 13 December, 2015 11:49:16 PM
> > Subject: ceph-mon terminated with status 28
> >
> > I am trying to understand the following failure:
> >
> > A small cluster was running fine, and then was left unused for a while.
> > When I went to try to use it again, the mon socket wasn't there and I
> > could see that ceph-mon was not running. I saw the lines below at the
> > end of dmesg output.
> > When I tried to restart ceph-mon using sudo start ceph-mon id=monhost,
> > I got the same set of errors newly appended to dmesg output.
> >
> > I don't see anything more descriptive in /var/log/ceph/ceph-mon.log,
> > just the recording of new mon processes starting.
> >
> > In this particular small cluster, the mon process was running on the
> > same node with 7 osd processes. sudo initctl list shows that the osd
> > procs are still up, although logging the fact that they can't
> > communicate with the mon socket.
> >
> > Is there someplace else I should look for more details as to why mon
> > is down and can't be restarted?
> >
> > -- Tom Deneau
> >
> > dmesg output:
> > --------------
> > init: ceph-mon (ceph/monhost) main process (16538) terminated with status 28
> > init: ceph-mon (ceph/monhost) main process ended, respawning
> > init: ceph-create-keys main process (16227) killed by TERM signal
> > init: ceph-mon (ceph/monhost) main process (16546) terminated with status 28
> > init: ceph-mon (ceph/monhost) main process ended, respawning
> > init: ceph-create-keys main process (16548) killed by TERM signal
> > init: ceph-mon (ceph/monhost) main process (16556) terminated with status 28
> > init: ceph-mon (ceph/monhost) main process ended, respawning
> > init: ceph-create-keys main process (16558) killed by TERM signal
> > init: ceph-mon (ceph/monhost) main process (16566) terminated with status 28
> > init: ceph-mon (ceph/monhost) respawning too fast, stopped
> > init: ceph-create-keys main process (16568) killed by TERM signal
>
> It looks like it's complaining about lack of space?
>
> src/ceph_mon.cc:
>
> int main(int argc, const char **argv)
> {
> ----8<----
>   {
>     // check fs stats. don't start if it's critically close to full.
>     ceph_data_stats_t stats;
>     int err = get_fs_stats(stats, g_conf->mon_data.c_str());
>     if (err < 0) {
>       cerr << "error checking monitor data's fs stats: " << cpp_strerror(err)
>            << std::endl;
>       exit(-err);
>     }
>     if (stats.avail_percent <= g_conf->mon_data_avail_crit) {
>       cerr << "error: monitor data filesystem reached concerning levels of"
>            << " available storage space (available: "
>            << stats.avail_percent << "% " << prettybyte_t(stats.byte_avail)
>            << ")\nyou may adjust 'mon data avail crit' to a lower value"
>            << " to make this go away (default: " << g_conf->mon_data_avail_crit
>            << "%)\n" << std::endl;
>       exit(ENOSPC);
>     }
>
> #define ENOSPC 28 /* No space left on device */
>
> Try starting ceph-mon from the command line and see if you get the above
> message.
>
> > --
> > To unsubscribe from this list: send the line "unsubscribe ceph-devel"
> > in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo
> > info at http://vger.kernel.org/majordomo-info.html
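
For anyone decoding a similar "terminated with status N" line, here is a minimal sketch of the two ideas in the quoted check: measuring how much of a filesystem is still available, and reading an exit status as an errno value. It is not Ceph code. It calls POSIX statvfs() directly because the body of get_fs_stats() is not shown in this thread, and the helper names avail_percent and describe_exit_status are made up for illustration.

```cpp
#include <cassert>
#include <cerrno>
#include <cstring>
#include <string>
#include <sys/statvfs.h>

// Hypothetical helper (not part of Ceph): percentage of the filesystem's
// blocks that are available to unprivileged users, in the range 0..100.
// Returns a negative errno value if statvfs() fails.
int avail_percent(const char *path) {
  assert(path != nullptr);
  struct statvfs vfs;
  if (statvfs(path, &vfs) < 0)
    return -errno;
  if (vfs.f_blocks == 0)
    return 0;  // avoid dividing by zero on pseudo-filesystems
  unsigned long long avail = vfs.f_bavail;
  unsigned long long total = vfs.f_blocks;
  return static_cast<int>(avail * 100 / total);
}

// Hypothetical helper: interpret a process exit status as an errno value,
// matching the exit(ENOSPC) and exit(-err) calls in the quoted code.
std::string describe_exit_status(int status) {
  return std::string(std::strerror(status));
}
```

On Linux, describe_exit_status(28) returns the ENOSPC message, which is the hint the init log lines never spell out. Comparing avail_percent() for the mon data directory against the configured 'mon data avail crit' value would show whether the monitor is going to refuse to start, assuming the real check behaves like the excerpt above.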