Ideally I would like everything in /var/log/calamari. Be sure to set calamari.conf like so:

[shadow_man@vpm107 ~]$ grep DEBUG /etc/calamari/calamari.conf
log_level = DEBUG
db_log_level = DEBUG
log_level = DEBUG

then restart cthulhu and apache, visit http://essperf3/api/v2/cluster and http://essperf3, and then share the logs here. Hopefully something obvious will be off in either the calamari or cthulhu log.
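On Ubuntu that whole cycle would look roughly like the sketch below, using the apache2/supervisorctl names you mention below. The cthulhu program name and the log file glob are assumptions on my part, so check supervisorctl status and ls /var/log/calamari if they don't match:

# restart cthulhu and apache so the DEBUG settings take effect
sudo supervisorctl restart cthulhu
sudo service apache2 restart
# exercise the API and the UI, then collect whatever got logged
curl -s http://essperf3/api/v2/cluster
tail -n 200 /var/log/calamari/*.log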
regards,
Gregory

> On May 12, 2015, at 6:11 PM, Bruce McFarland <Bruce.McFarland@xxxxxxxxxxxxxxxx> wrote:
>
> Which logs? I'm assuming /var/log/salt/minion since the rest on the minions are relatively empty. Possibly Cthulhu from the master?
>
> I'm running on Ubuntu 14.04 and don't have an httpd service. I had been start/stopping apache2. Likewise there is no supervisord service and I've been using supervisorctl to start/stop Cthulhu.
>
> I've performed the calamari-ctl clear/init sequence more than twice, also stopping/starting apache2 and Cthulhu.
>
>> -----Original Message-----
>> From: Gregory Meno [mailto:gmeno@xxxxxxxxxx]
>> Sent: Tuesday, May 12, 2015 5:58 PM
>> To: Bruce McFarland
>> Cc: ceph-calamari@xxxxxxxxxxxxxx; ceph-users@xxxxxxxx; ceph-devel (ceph-devel@xxxxxxxxxxxxxxx)
>> Subject: Re: [ceph-calamari] Does anyone understand Calamari??
>>
>> All that looks fine.
>>
>> There must be some state where the cluster is known to calamari and it is failing to actually show it.
>>
>> If you have time to debug I would love to see the logs at debug level.
>>
>> If you don't, we could try cleaning out calamari's state:
>>
>> sudo supervisorctl shutdown
>> sudo service httpd stop
>> sudo calamari-ctl clear --yes-i-am-sure
>> sudo calamari-ctl initialize
>>
>> then
>>
>> sudo service supervisord start
>> sudo service httpd start
>>
>> and see what the API and UI say then.
>>
>> regards,
>> Gregory
>>
>>> On May 12, 2015, at 5:18 PM, Bruce McFarland <Bruce.McFarland@xxxxxxxxxxxxxxxx> wrote:
>>>
>>> Master was ess68 and now it's essperf3.
>>>
>>> On all cluster nodes the following files now have 'master: essperf3':
>>> /etc/salt/minion
>>> /etc/salt/minion/calamari.conf
>>> /etc/diamond/diamond.conf
>>>
>>> The 'salt \* ceph.get_heartbeats' is being run on essperf3. Here's a 'salt \* test.ping' from the essperf3 Calamari master to the cluster. I've also included a quick cluster sanity test with the output of ceph -s and ceph osd tree, and, for your reading pleasure, the output of 'salt octeon109 ceph.get_heartbeats', since I suspect there might be a missing field in the monitor response.
>>>
>>> root@essperf3:/etc/ceph# salt \* test.ping
>>> octeon108:
>>>     True
>>> octeon114:
>>>     True
>>> octeon111:
>>>     True
>>> octeon101:
>>>     True
>>> octeon106:
>>>     True
>>> octeon109:
>>>     True
>>> octeon118:
>>>     True
>>> root@essperf3:/etc/ceph# ceph osd tree
>>> # id  weight  type name           up/down  reweight
>>> -1    7       root default
>>> -4    1         host octeon108
>>> 0     1           osd.0           up       1
>>> -2    1         host octeon111
>>> 1     1           osd.1           up       1
>>> -5    1         host octeon115
>>> 2     1           osd.2           DNE
>>> -6    1         host octeon118
>>> 3     1           osd.3           up       1
>>> -7    1         host octeon114
>>> 4     1           osd.4           up       1
>>> -8    1         host octeon106
>>> 5     1           osd.5           up       1
>>> -9    1         host octeon101
>>> 6     1           osd.6           up       1
>>> root@essperf3:/etc/ceph# ceph -s
>>>     cluster 868bfacc-e492-11e4-89fa-000fb711110c
>>>      health HEALTH_OK
>>>      monmap e1: 1 mons at {octeon109=209.243.160.70:6789/0}, election epoch 1, quorum 0 octeon109
>>>      osdmap e80: 6 osds: 6 up, 6 in
>>>       pgmap v26765: 728 pgs, 2 pools, 20070 MB data, 15003 objects
>>>             60604 MB used, 2734 GB / 2793 GB avail
>>>                  728 active+clean
>>> root@essperf3:/etc/ceph#
>>>
>>> root@essperf3:/etc/ceph# salt octeon109 ceph.get_heartbeats
>>> octeon109:
>>>     ----------
>>>     - boot_time:
>>>         1430784431
>>>     - ceph_version:
>>>         0.80.8-0.el6
>>>     - services:
>>>         ----------
>>>         ceph-mon.octeon109:
>>>             ----------
>>>             cluster:
>>>                 ceph
>>>             fsid:
>>>                 868bfacc-e492-11e4-89fa-000fb711110c
>>>             id:
>>>                 octeon109
>>>             status:
>>>                 ----------
>>>                 election_epoch:
>>>                     1
>>>                 extra_probe_peers:
>>>                 monmap:
>>>                     ----------
>>>                     created:
>>>                         2015-04-16 23:50:52.412686
>>>                     epoch:
>>>                         1
>>>                     fsid:
>>>                         868bfacc-e492-11e4-89fa-000fb711110c
>>>                     modified:
>>>                         2015-04-16 23:50:52.412686
>>>                     mons:
>>>                         ----------
>>>                         - addr:
>>>                             209.243.160.70:6789/0
>>>                         - name:
>>>                             octeon109
>>>                         - rank:
>>>                             0
>>>                 name:
>>>                     octeon109
>>>                 outside_quorum:
>>>                 quorum:
>>>                     - 0
>>>                 rank:
>>>                     0
>>>                 state:
>>>                     leader
>>>                 sync_provider:
>>>             type:
>>>                 mon
>>>             version:
>>>                 0.86
>>>     - 868bfacc-e492-11e4-89fa-000fb711110c:
>>>         ----------
>>>         fsid:
>>>             868bfacc-e492-11e4-89fa-000fb711110c
>>>         name:
>>>             ceph
>>>         versions:
>>>             ----------
>>>             config:
>>>                 87f175c60e5c7ec06c263c556056fbcb
>>>             health:
>>>                 a907d0ec395713369b4843381ec31bc2
>>>             mds_map:
>>>                 1
>>>             mon_map:
>>>                 1
>>>             mon_status:
>>>                 1
>>>             osd_map:
>>>                 80
>>>             pg_summary:
>>>                 7e29d7cc93cfced8f3f146cc78f5682f
>>> root@essperf3:/etc/ceph#
>>>
>>>
>>>> -----Original Message-----
>>>> From: Gregory Meno [mailto:gmeno@xxxxxxxxxx]
>>>> Sent: Tuesday, May 12, 2015 5:03 PM
>>>> To: Bruce McFarland
>>>> Cc: ceph-calamari@xxxxxxxxxxxxxx; ceph-users@xxxxxxxx; ceph-devel (ceph-devel@xxxxxxxxxxxxxxx)
>>>> Subject: Re: [ceph-calamari] Does anyone understand Calamari??
>>>>
>>>> Bruce,
>>>>
>>>> It is great to hear that salt is reporting status from all the nodes in the cluster.
>>>>
>>>> Let me see if I understand your question: you want to know what conditions cause us to recognize a working cluster?
>>>>
>>>> See
>>>> https://github.com/ceph/calamari/blob/master/cthulhu/cthulhu/manager/manager.py#L135
>>>> https://github.com/ceph/calamari/blob/master/cthulhu/cthulhu/manager/manager.py#L349
>>>> and
>>>> https://github.com/ceph/calamari/blob/master/cthulhu/cthulhu/manager/cluster_monitor.py
>>>>
>>>> Let's check that you need to be digging into that level of detail.
>>>>
>>>> You switched to a new instance of calamari and it is not recognizing the cluster, and you want to know what you are overlooking? Would you please clarify with some hostnames?
>>>>
>>>> i.e. let's say that your old calamari node was called calamariA and that your new node is calamariB.
>>>>
>>>> From which are you running the get_heartbeats?
>>>>
>>>> What is the master setting in the minion config files out on the nodes of the cluster? If things are set up correctly they would look like this:
>>>>
>>>> [root@node1 shadow_man]# cat /etc/salt/minion.d/calamari.conf
>>>> master: calamariB
>>>>
>>>> If this is the case, the thing I would check is the http://calamariB/api/v2/cluster endpoint: is it reporting anything?
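>>>>
>>>> A quick way to check both of those from calamariB in one pass, assuming salt can already reach the minions, would be a sketch like:
>>>>
>>>> # confirm every minion points at the new master
>>>> salt \* cmd.run 'grep -H "^master:" /etc/salt/minion.d/calamari.conf'
>>>> # then ask the REST API whether it knows about any cluster yet
>>>> curl -s http://calamariB/api/v2/cluster
>>>>
>>>> If any node still shows the old master, fix that file and restart salt-minion on the node before retesting.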
>>>>
>>>> hope this helps,
>>>> Gregory
>>>>
>>>>> On May 12, 2015, at 4:34 PM, Bruce McFarland <Bruce.McFarland@xxxxxxxxxxxxxxxx> wrote:
>>>>>
>>>>> Increasing the audience since ceph-calamari is not responsive. What salt event/info does the Calamari master expect to see from the ceph-mon to determine there is a working cluster? I had to change the server hosting the calamari master and can't get the new machine to recognize the cluster. The 'salt \* ceph.get_heartbeats' returns monmap, fsid, ver, epoch, etc. for the monitor and all of the OSDs. Can anyone point me to docs or code that might enlighten me as to what I'm overlooking? Thanks.
>>>>> _______________________________________________
>>>>> ceph-calamari mailing list
>>>>> ceph-calamari@xxxxxxxxxxxxxx
>>>>> http://lists.ceph.com/listinfo.cgi/ceph-calamari-ceph.com

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com