All that looks fine. There must be some state where the cluster is known to calamari and it is failing to actually show it. If you have time to debug I would love to see the logs at debug level. If you don't, we could try cleaning out calamari's state:

sudo supervisorctl shutdown
sudo service httpd stop
sudo calamari-ctl clear --yes-i-am-sure
sudo calamari-ctl initialize

then

sudo service supervisord start
sudo service httpd start

and see what the API and UI say then.

regards,
Gregory

> On May 12, 2015, at 5:18 PM, Bruce McFarland <Bruce.McFarland@xxxxxxxxxxxxxxxx> wrote:
>
> Master was ess68 and now it's essperf3.
>
> On all cluster nodes the following files now have 'master: essperf3':
> /etc/salt/minion
> /etc/salt/minion/calamari.conf
> /etc/diamond/diamond.conf
>
> The 'salt \* ceph.get_heartbeats' is being run on essperf3 - here's a 'salt \* test.ping' from the essperf3 Calamari Master to the cluster. I've also included a quick cluster sanity test with the output of ceph -s and ceph osd tree. And for your reading pleasure, the output of 'salt octeon109 ceph.get_heartbeats', since I suspect there might be a missing field in the monitor response.
>
> root@essperf3:/etc/ceph# salt \* test.ping
> octeon108:
>     True
> octeon114:
>     True
> octeon111:
>     True
> octeon101:
>     True
> octeon106:
>     True
> octeon109:
>     True
> octeon118:
>     True
> root@essperf3:/etc/ceph# ceph osd tree
> # id    weight  type name          up/down  reweight
> -1      7       root default
> -4      1           host octeon108
> 0       1               osd.0      up       1
> -2      1           host octeon111
> 1       1               osd.1      up       1
> -5      1           host octeon115
> 2       1               osd.2      DNE
> -6      1           host octeon118
> 3       1               osd.3      up       1
> -7      1           host octeon114
> 4       1               osd.4      up       1
> -8      1           host octeon106
> 5       1               osd.5      up       1
> -9      1           host octeon101
> 6       1               osd.6      up       1
> root@essperf3:/etc/ceph# ceph -s
>     cluster 868bfacc-e492-11e4-89fa-000fb711110c
>      health HEALTH_OK
>      monmap e1: 1 mons at {octeon109=209.243.160.70:6789/0}, election epoch 1, quorum 0 octeon109
>      osdmap e80: 6 osds: 6 up, 6 in
>       pgmap v26765: 728 pgs, 2 pools, 20070 MB data, 15003 objects
>             60604 MB used, 2734 GB / 2793 GB avail
>                  728 active+clean
> root@essperf3:/etc/ceph#
>
> root@essperf3:/etc/ceph# salt octeon109 ceph.get_heartbeats
> octeon109:
>     ----------
>     - boot_time:
>         1430784431
>     - ceph_version:
>         0.80.8-0.el6
>     - services:
>         ----------
>         ceph-mon.octeon109:
>             ----------
>             cluster:
>                 ceph
>             fsid:
>                 868bfacc-e492-11e4-89fa-000fb711110c
>             id:
>                 octeon109
>             status:
>                 ----------
>                 election_epoch:
>                     1
>                 extra_probe_peers:
>                 monmap:
>                     ----------
>                     created:
>                         2015-04-16 23:50:52.412686
>                     epoch:
>                         1
>                     fsid:
>                         868bfacc-e492-11e4-89fa-000fb711110c
>                     modified:
>                         2015-04-16 23:50:52.412686
>                     mons:
>                         ----------
>                         - addr:
>                             209.243.160.70:6789/0
>                         - name:
>                             octeon109
>                         - rank:
>                             0
>                 name:
>                     octeon109
>                 outside_quorum:
>                 quorum:
>                     - 0
>                 rank:
>                     0
>                 state:
>                     leader
>                 sync_provider:
>             type:
>                 mon
>             version:
>                 0.86
>     ----------
>     - 868bfacc-e492-11e4-89fa-000fb711110c:
>         ----------
>         fsid:
>             868bfacc-e492-11e4-89fa-000fb711110c
>         name:
>             ceph
>         versions:
>             ----------
>             config:
>                 87f175c60e5c7ec06c263c556056fbcb
>             health:
>                 a907d0ec395713369b4843381ec31bc2
>             mds_map:
>                 1
>             mon_map:
>                 1
>             mon_status:
>                 1
>             osd_map:
>                 80
>             pg_summary:
>                 7e29d7cc93cfced8f3f146cc78f5682f
> root@essperf3:/etc/ceph#
>
>
>
>> -----Original Message-----
>> From: Gregory Meno [mailto:gmeno@xxxxxxxxxx]
>> Sent: Tuesday, May 12, 2015 5:03 PM
>> To: Bruce McFarland
>> Cc: ceph-calamari@xxxxxxxxxxxxxx; ceph-users@xxxxxxxx; ceph-devel (ceph-devel@xxxxxxxxxxxxxxx)
>> Subject: Re: [ceph-calamari] Does anyone understand Calamari??
>>
>> Bruce,
>>
>> It is great to hear that salt is reporting status from all the nodes in the cluster.
>>
>> Let me see if I understand your question:
>>
>> You want to know what conditions cause us to recognize a working cluster?
>>
>> See
>> https://github.com/ceph/calamari/blob/master/cthulhu/cthulhu/manager/manager.py#L135
>> https://github.com/ceph/calamari/blob/master/cthulhu/cthulhu/manager/manager.py#L349
>>
>> and
>>
>> https://github.com/ceph/calamari/blob/master/cthulhu/cthulhu/manager/cluster_monitor.py
>>
>> Let's check that you need to be digging into that level of detail:
>>
>> You switched to a new instance of calamari and it is not recognizing the cluster.
>>
>> You want to know what you are overlooking? Would you please clarify with some hostnames?
>>
>> i.e. Let's say that your old calamari node was called calamariA and that your new node is calamariB.
>>
>> From which one are you running the get_heartbeats?
>>
>> What is the master setting in the minion config files out on the nodes of the cluster? If things are set up correctly they would look like this:
>>
>> [root@node1 shadow_man]# cat /etc/salt/minion.d/calamari.conf
>> master: calamariB
>>
>> If that is the case, the thing I would check is whether the http://calamariB/api/v2/cluster endpoint is reporting anything.
>>
>> hope this helps,
>> Gregory
>>
>>> On May 12, 2015, at 4:34 PM, Bruce McFarland <Bruce.McFarland@xxxxxxxxxxxxxxxx> wrote:
>>>
>>> Increasing the audience since ceph-calamari is not responsive. What salt event/info does the Calamari Master expect to see from the ceph-mon to determine there is a working cluster? I had to change servers hosting the calamari master and can't get the new machine to recognize the cluster. The 'salt \* ceph.get_heartbeats' returns monmap, fsid, ver, epoch, etc. for the monitor and all of the OSDs. Can anyone point me to docs or code that might enlighten me to what I'm overlooking? Thanks.
>>> _______________________________________________
>>> ceph-calamari mailing list
>>> ceph-calamari@xxxxxxxxxxxxxx
>>> http://lists.ceph.com/listinfo.cgi/ceph-calamari-ceph.com
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
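
A quick way to run the API check Gregory suggests ("see what the API and UI say then", and the /api/v2/cluster endpoint mentioned in his earlier message) is a curl from any host that can reach the master. This is only a sketch: the hostname essperf3 is taken from the thread, it assumes the API is served by httpd on port 80, and whether credentials are needed depends on how the API is protected, so the -u form and its placeholders are an assumption rather than a documented requirement.

# After calamari-ctl initialize and the services are back up:
curl -s http://essperf3/api/v2/cluster
# If the API insists on authentication, try the account created during
# calamari-ctl initialize (substitute whatever username/password was chosen):
curl -s -u <user>:<password> http://essperf3/api/v2/cluster
# An empty list means Calamari still has not registered the cluster; once it
# has, the response should include the fsid reported by get_heartbeats above
# (868bfacc-e492-11e4-89fa-000fb711110c).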
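
On the master change itself: when the master: value in the minion configs is repointed at a new Calamari host, the minions normally need a restart and their keys need to be accepted on the new master before ceph.get_heartbeats data can flow. In Bruce's case test.ping already succeeds, so this is just the generic re-pointing checklist, sketched here with the essperf3 hostname from the thread; the service names assume the sysvinit/upstart layout shown elsewhere in the discussion.

# On each cluster node, after editing /etc/salt/minion.d/calamari.conf
# to read "master: essperf3":
sudo service salt-minion restart

# On the new Calamari master: list the minion keys, accept any still pending,
# then confirm connectivity and the heartbeat data Calamari consumes:
sudo salt-key -L
sudo salt-key -A        # accepts all pending keys; confirm when prompted
sudo salt \* test.ping
sudo salt \* ceph.get_heartbeats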