/var/log/salt/minion doesn't really look very interesting after that sequence. I issued 'salt octeon109 ceph.get_heartbeats' from the master. The logs are much more interesting when I clear calamari and stop salt-minion. Looking at the endpoints from http://essperf2/api/v2/cluster doesn't show anything. It reports HTTP 200 OK and Vary: Accept, but there is nothing in the body of the output, i.e. no update_time, id, or name is being reported.

root@octeon109:/var/log/salt# tail -f /var/log/salt/minion
2015-05-13 01:31:19,066 [salt.crypt ][DEBUG ][4699] Failed to authenticate message
2015-05-13 01:31:19,068 [salt.minion ][DEBUG ][4699] Attempting to authenticate with the Salt Master at 209.243.160.35
2015-05-13 01:31:19,069 [salt.crypt ][DEBUG ][4699] Re-using SAuth for ('/etc/salt/pki/minion', 'octeon109', 'tcp://209.243.160.35:4506')
2015-05-13 01:31:19,294 [salt.crypt ][DEBUG ][4699] Decrypting the current master AES key
2015-05-13 01:31:19,296 [salt.crypt ][DEBUG ][4699] Loaded minion key: /etc/salt/pki/minion/minion.pem
2015-05-13 01:31:20,026 [salt.crypt ][DEBUG ][4699] Loaded minion key: /etc/salt/pki/minion/minion.pem
2015-05-13 01:33:04,027 [salt.minion ][INFO ][4699] User root Executing command ceph.get_heartbeats with jid 20150512183304482562
2015-05-13 01:33:04,028 [salt.minion ][DEBUG ][4699] Command details {'tgt_type': 'glob', 'jid': '20150512183304482562', 'tgt': 'octeon109', 'ret': '', 'user': 'root', 'arg': [], 'fun': 'ceph.get_heartbeats'}
2015-05-13 01:33:04,043 [salt.minion ][INFO ][5912] Starting a new job with PID 5912
2015-05-13 01:33:04,053 [salt.utils.lazy ][DEBUG ][5912] LazyLoaded ceph.get_heartbeats
2015-05-13 01:33:04,209 [salt.utils.lazy ][DEBUG ][5912] LazyLoaded pkg.version
2015-05-13 01:33:04,212 [salt.utils.lazy ][DEBUG ][5912] LazyLoaded pkg_resource.version
2015-05-13 01:33:04,217 [salt.utils.lazy ][DEBUG ][5912] LazyLoaded cmd.run_stdout
2015-05-13 01:33:04,219 [salt.loaded.int.module.cmdmod ][INFO ][5912] Executing command ['dpkg-query', '--showformat', '${Status} ${Package} ${Version} ${Architecture}\n', '-W'] in directory '/root'
2015-05-13 01:33:05,432 [salt.minion ][INFO ][5912] Returning information for job: 20150512183304482562
2015-05-13 01:33:05,434 [salt.crypt ][DEBUG ][5912] Re-using SAuth for ('/etc/salt/pki/minion', 'octeon109', 'tcp://209.243.160.35:4506')
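
For completeness, the sanity checks above boil down to roughly the following; this is only a minimal sketch, the hostnames are the ones used elsewhere in this thread, and it assumes the Calamari REST API answers a plain GET from the master:

    # on the calamari master
    sudo salt-key -L                           # all cluster minions should be listed as accepted
    sudo salt '*' test.ping                    # every node should answer True
    sudo salt octeon109 ceph.get_heartbeats    # the mon should return fsid/monmap/versions

    # the cluster endpoint; a 200 OK with an empty body suggests cthulhu has
    # not registered the cluster even though the heartbeats look fine
    curl -s -H 'Accept: application/json' http://essperf2/api/v2/cluster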
> -----Original Message-----
> From: Bruce McFarland
> Sent: Tuesday, May 12, 2015 6:11 PM
> To: 'Gregory Meno'
> Cc: ceph-calamari@xxxxxxxxxxxxxx; ceph-users@xxxxxxxx; ceph-devel (ceph-devel@xxxxxxxxxxxxxxx)
> Subject: RE: [ceph-calamari] Does anyone understand Calamari??
>
> Which logs? I'm assuming /var/log/salt/minion since the rest on the minions
> are relatively empty. Possibly Cthulhu from the master?
>
> I'm running on Ubuntu 14.04 and don't have an httpd service. I had been
> starting/stopping apache2. Likewise there is no supervisord service and I've
> been using supervisorctl to start/stop Cthulhu.
>
> I've performed the calamari-ctl clear/init sequence more than twice, also
> stopping/starting apache2 and Cthulhu.
>
> > -----Original Message-----
> > From: Gregory Meno [mailto:gmeno@xxxxxxxxxx]
> > Sent: Tuesday, May 12, 2015 5:58 PM
> > To: Bruce McFarland
> > Cc: ceph-calamari@xxxxxxxxxxxxxx; ceph-users@xxxxxxxx; ceph-devel (ceph-devel@xxxxxxxxxxxxxxx)
> > Subject: Re: [ceph-calamari] Does anyone understand Calamari??
> >
> > All that looks fine.
> >
> > There must be some state where the cluster is known to calamari and it
> > is failing to actually show it.
> >
> > If you have time to debug I would love to see the logs at debug level.
> >
> > If you don’t, we could try cleaning out calamari’s state:
> >
> > sudo supervisorctl shutdown
> > sudo service httpd stop
> > sudo calamari-ctl clear --yes-i-am-sure
> > sudo calamari-ctl initialize
> >
> > then
> >
> > sudo service supervisord start
> > sudo service httpd start
> >
> > see what the API and UI say then.
> >
> > regards,
> > Gregory
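
On Ubuntu 14.04 (where, as noted above, there is no httpd or supervisord service), the equivalent of that sequence would presumably be something like the following; the 'supervisor' service name is an assumption based on the stock Ubuntu package:

    sudo supervisorctl shutdown
    sudo service apache2 stop
    sudo calamari-ctl clear --yes-i-am-sure
    sudo calamari-ctl initialize
    sudo service supervisor start   # assumed stock service name; adjust if calamari ships its own supervisord init script
    sudo service apache2 start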
> > > On May 12, 2015, at 5:18 PM, Bruce McFarland <Bruce.McFarland@xxxxxxxxxxxxxxxx> wrote:
> > >
> > > Master was ess68 and now it's essperf3.
> > >
> > > On all cluster nodes the following files now have 'master: essperf3'
> > > /etc/salt/minion
> > > /etc/salt/minion/calamari.conf
> > > /etc/diamond/diamond.conf
> > >
> > > The 'salt \* ceph.get_heartbeats' is being run on essperf3 - here's a
> > > 'salt \* test.ping' from essperf3 Calamari Master to the cluster. I've also
> > > included a quick cluster sanity test with the output of ceph -s and
> > > ceph osd tree. And for your reading pleasure the output of 'salt octeon109 ceph.get_heartbeats'
> > > since I suspect there might be a missing field in the monitor response.
> > >
> > > root@essperf3:/etc/ceph# salt \* test.ping
> > > octeon108:
> > >     True
> > > octeon114:
> > >     True
> > > octeon111:
> > >     True
> > > octeon101:
> > >     True
> > > octeon106:
> > >     True
> > > octeon109:
> > >     True
> > > octeon118:
> > >     True
> > > root@essperf3:/etc/ceph# ceph osd tree
> > > # id  weight  type name           up/down  reweight
> > > -1    7       root default
> > > -4    1         host octeon108
> > > 0     1           osd.0           up       1
> > > -2    1         host octeon111
> > > 1     1           osd.1           up       1
> > > -5    1         host octeon115
> > > 2     1           osd.2           DNE
> > > -6    1         host octeon118
> > > 3     1           osd.3           up       1
> > > -7    1         host octeon114
> > > 4     1           osd.4           up       1
> > > -8    1         host octeon106
> > > 5     1           osd.5           up       1
> > > -9    1         host octeon101
> > > 6     1           osd.6           up       1
> > > root@essperf3:/etc/ceph# ceph -s
> > >     cluster 868bfacc-e492-11e4-89fa-000fb711110c
> > >      health HEALTH_OK
> > >      monmap e1: 1 mons at {octeon109=209.243.160.70:6789/0}, election epoch 1, quorum 0 octeon109
> > >      osdmap e80: 6 osds: 6 up, 6 in
> > >       pgmap v26765: 728 pgs, 2 pools, 20070 MB data, 15003 objects
> > >             60604 MB used, 2734 GB / 2793 GB avail
> > >                  728 active+clean
> > > root@essperf3:/etc/ceph#
> > >
> > > root@essperf3:/etc/ceph# salt octeon109 ceph.get_heartbeats
> > > octeon109:
> > >     ----------
> > >     - boot_time:
> > >         1430784431
> > >     - ceph_version:
> > >         0.80.8-0.el6
> > >     - services:
> > >         ----------
> > >         ceph-mon.octeon109:
> > >             ----------
> > >             cluster:
> > >                 ceph
> > >             fsid:
> > >                 868bfacc-e492-11e4-89fa-000fb711110c
> > >             id:
> > >                 octeon109
> > >             status:
> > >                 ----------
> > >                 election_epoch:
> > >                     1
> > >                 extra_probe_peers:
> > >                 monmap:
> > >                     ----------
> > >                     created:
> > >                         2015-04-16 23:50:52.412686
> > >                     epoch:
> > >                         1
> > >                     fsid:
> > >                         868bfacc-e492-11e4-89fa-000fb711110c
> > >                     modified:
> > >                         2015-04-16 23:50:52.412686
> > >                     mons:
> > >                         ----------
> > >                         - addr:
> > >                             209.243.160.70:6789/0
> > >                         - name:
> > >                             octeon109
> > >                         - rank:
> > >                             0
> > >                 name:
> > >                     octeon109
> > >                 outside_quorum:
> > >                 quorum:
> > >                     - 0
> > >                 rank:
> > >                     0
> > >                 state:
> > >                     leader
> > >                 sync_provider:
> > >             type:
> > >                 mon
> > >             version:
> > >                 0.86
> > >     ----------
> > >     - 868bfacc-e492-11e4-89fa-000fb711110c:
> > >         ----------
> > >         fsid:
> > >             868bfacc-e492-11e4-89fa-000fb711110c
> > >         name:
> > >             ceph
> > >         versions:
> > >             ----------
> > >             config:
> > >                 87f175c60e5c7ec06c263c556056fbcb
> > >             health:
> > >                 a907d0ec395713369b4843381ec31bc2
> > >             mds_map:
> > >                 1
> > >             mon_map:
> > >                 1
> > >             mon_status:
> > >                 1
> > >             osd_map:
> > >                 80
> > >             pg_summary:
> > >                 7e29d7cc93cfced8f3f146cc78f5682f
> > > root@essperf3:/etc/ceph#
> > >
> > >
> > >
> > >> -----Original Message-----
> > >> From: Gregory Meno [mailto:gmeno@xxxxxxxxxx]
> > >> Sent: Tuesday, May 12, 2015 5:03 PM
> > >> To: Bruce McFarland
> > >> Cc: ceph-calamari@xxxxxxxxxxxxxx; ceph-users@xxxxxxxx; ceph-devel (ceph-devel@xxxxxxxxxxxxxxx)
> > >> Subject: Re: [ceph-calamari] Does anyone understand Calamari??
> > >>
> > >> Bruce,
> > >>
> > >> It is great to hear that salt is reporting status from all the
> > >> nodes in the cluster.
> > >>
> > >> Let me see if I understand your question:
> > >>
> > >> You want to know what conditions cause us to recognize a working cluster?
> > >>
> > >> see
> > >>
> > >> https://github.com/ceph/calamari/blob/master/cthulhu/cthulhu/manager/manager.py#L135
> > >>
> > >> https://github.com/ceph/calamari/blob/master/cthulhu/cthulhu/manager/manager.py#L349
> > >>
> > >> and
> > >>
> > >> https://github.com/ceph/calamari/blob/master/cthulhu/cthulhu/manager/cluster_monitor.py
> > >>
> > >>
> > >> Let’s check whether you need to be digging into that level of detail:
> > >>
> > >> You switched to a new instance of calamari and it is not
> > >> recognizing the cluster.
> > >>
> > >> You want to know what you are overlooking? Would you please clarify
> > >> with some hostnames?
> > >>
> > >> i.e. let’s say that your old calamari node was called calamariA and
> > >> that your new node is calamariB
> > >>
> > >> from which are you running the get_heartbeats?
> > >>
> > >> what is the master setting in the minion config files out on the
> > >> nodes of the cluster? If things are set up correctly they would look like this:
> > >>
> > >> [root@node1 shadow_man]# cat /etc/salt/minion.d/calamari.conf
> > >> master: calamariB
> > >>
> > >>
> > >> If this is the case, the thing I would check is whether the
> > >> http://calamariB/api/v2/cluster endpoint is reporting anything.
> > >>
> > >> hope this helps,
> > >> Gregory
> > >>
> > >>> On May 12, 2015, at 4:34 PM, Bruce McFarland <Bruce.McFarland@xxxxxxxxxxxxxxxx> wrote:
> > >>>
> > >>> Increasing the audience since ceph-calamari is not responsive.
> > >>> What salt event/info does the Calamari Master expect to see from the ceph-mon
> > >>> to determine there is a working cluster? I had to change servers hosting the
> > >>> calamari master and can’t get the new machine to recognize the cluster.
> > >>> The ‘salt \* ceph.get_heartbeats’ returns monmap, fsid, ver, epoch, etc. for
> > >>> the monitor and all of the osd’s. Can anyone point me to docs or code that
> > >>> might enlighten me to what I’m overlooking? Thanks.
> > >>> _______________________________________________
> > >>> ceph-calamari mailing list
> > >>> ceph-calamari@xxxxxxxxxxxxxx
> > >>> http://lists.ceph.com/listinfo.cgi/ceph-calamari-ceph.com
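
On the minion-config question above, a quick way to confirm from the new master that every node really points at it is sketched below; the file paths are the ones listed earlier in the thread, and the grep pattern (the new master's hostname) is only illustrative:

    # run on the new calamari master (essperf3 in this thread)
    sudo salt '*' cmd.run 'grep -RH "essperf3" /etc/salt /etc/diamond'   # each minion should show its salt and diamond configs pointing at the new master
    sudo salt-key -L                                                     # each minion's key should be listed as accepted on the new master

Any node that still shows the old master (ess68) needs its config fixed and its salt-minion restarted before the master will see its heartbeats.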