----- Original Message -----
> From: "Tom Deneau" <tom.deneau@xxxxxxx>
> To: ceph-devel@xxxxxxxxxxxxxxx
> Sent: Sunday, 13 December, 2015 11:49:16 PM
> Subject: ceph-mon terminated with status 28
>
> I am trying to understand the following failure:
>
> A small cluster was running fine, and then was left unused for a while.
> When I went to try to use it again, the mon socket wasn't there and I could
> see that
> ceph-mon was not running. I saw the lines below at the end of dmesg output.
> When I tried to restart ceph-mon using sudo start ceph-mon id=monhost,
> I got the same set of errors newly appended to dmesg output.
>
> I don't see anything more descriptive in /var/log/ceph/ceph-mon.log, just
> the recording of new mon processes starting.
>
> In this particular small cluster, the mon process was running on the same
> node with 7 osd processes. sudo initctl list shows that the osd procs are
> still
> up, although logging the fact that they can't communicate with the mon
> socket.
>
> Is there someplace else I should look for more details as to why mon is down
> and can't be restarted?
>
> -- Tom Deneau
>
> dmesg output:
> --------------
> init: ceph-mon (ceph/monhost) main process (16538) terminated with status 28
> init: ceph-mon (ceph/monhost) main process ended, respawning
> init: ceph-create-keys main process (16227) killed by TERM signal
> init: ceph-mon (ceph/monhost) main process (16546) terminated with status 28
> init: ceph-mon (ceph/monhost) main process ended, respawning
> init: ceph-create-keys main process (16548) killed by TERM signal
> init: ceph-mon (ceph/monhost) main process (16556) terminated with status 28
> init: ceph-mon (ceph/monhost) main process ended, respawning
> init: ceph-create-keys main process (16558) killed by TERM signal
> init: ceph-mon (ceph/monhost) main process (16566) terminated with status 28
> init: ceph-mon (ceph/monhost) respawning too fast, stopped
> init: ceph-create-keys main process (16568) killed by TERM signal

Exit status 28 is ENOSPC, so it looks like ceph-mon is complaining about a
lack of space on the filesystem that holds the monitor data.

src/ceph_mon.cc:

int main(int argc, const char **argv)
{
----8<----
  {
    // check fs stats. don't start if it's critically close to full.
    ceph_data_stats_t stats;
    int err = get_fs_stats(stats, g_conf->mon_data.c_str());
    if (err < 0) {
      cerr << "error checking monitor data's fs stats: " << cpp_strerror(err)
           << std::endl;
      exit(-err);
    }
    if (stats.avail_percent <= g_conf->mon_data_avail_crit) {
      cerr << "error: monitor data filesystem reached concerning levels of"
           << " available storage space (available: "
           << stats.avail_percent << "% " << prettybyte_t(stats.byte_avail)
           << ")\nyou may adjust 'mon data avail crit' to a lower value"
           << " to make this go away (default: " << g_conf->mon_data_avail_crit
           << "%)\n" << std::endl;
      exit(ENOSPC);
    }

#define ENOSPC          28      /* No space left on device */

The message goes to stderr, which would explain why nothing shows up in
ceph-mon.log. Try starting ceph-mon from the command line and see if you get
the above message.
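
If you want to look at the free-space figure without going through ceph-mon,
here is a rough standalone sketch of the same kind of check. This is not Ceph
code: avail_percent() and check_mon_data() are hypothetical names, it uses
POSIX statvfs() rather than whatever get_fs_stats() does internally, and the
rounding may differ slightly from what the monitor computes.

```cpp
#include <sys/statvfs.h>

#include <cerrno>
#include <cstring>
#include <iostream>
#include <string>

// Hypothetical helper (not part of Ceph): percentage of the filesystem
// holding `path` that is available to unprivileged users.
// Returns -errno if statvfs() fails.
int avail_percent(const std::string &path)
{
  struct statvfs st;
  if (statvfs(path.c_str(), &st) != 0)
    return -errno;
  if (st.f_blocks == 0)
    return 0;  // pseudo-filesystem reporting no blocks at all
  double ratio = static_cast<double>(st.f_bavail) /
                 static_cast<double>(st.f_blocks);
  return static_cast<int>(ratio * 100.0);
}

// Hypothetical helper: mimics the shape of the startup check quoted above.
// `crit_percent` stands in for the 'mon data avail crit' setting.
// Returns 0 if there is enough space, ENOSPC if at or below the threshold,
// or the errno from statvfs() on failure.
int check_mon_data(const std::string &path, int crit_percent)
{
  int pct = avail_percent(path);
  if (pct < 0) {
    std::cerr << "statvfs(" << path << "): " << std::strerror(-pct)
              << std::endl;
    return -pct;
  }
  std::cout << path << ": " << pct << "% available" << std::endl;
  return pct <= crit_percent ? ENOSPC : 0;
}
```

Calling check_mon_data() with your mon data directory (normally somewhere
under /var/lib/ceph/mon/, but check 'mon data' in your ceph.conf) and your
'mon data avail crit' value should return ENOSPC in the same situation where
ceph-mon exits with status 28, assuming the monitor's calculation is close to
the one sketched here.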
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html