Re: [ceph-calamari] Does anyone understand Calamari??

/var/log/salt/minion doesn't look very interesting after that sequence. I issued 'salt octeon109 ceph.get_heartbeats' from the master. The logs are much more interesting when I clear calamari and stop salt-minion. Looking at the http://essperf2/api/v2/cluster endpoint doesn't show anything: it returns HTTP 200 OK and Vary: Accept, but the body is empty, i.e. no update_time, id, or name is reported.
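
For reference, this is roughly how I'm poking at that endpoint (nothing fancy, just curl against the master; the pretty-printing assumes python is available there):

# Show the status line, headers, and body of the cluster collection
curl -i http://essperf2/api/v2/cluster

# Pretty-print just the body; on a working setup this should carry the
# update_time, id, and name mentioned above
curl -s http://essperf2/api/v2/cluster | python -m json.tool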

root@octeon109:/var/log/salt# tail -f /var/log/salt/minion
2015-05-13 01:31:19,066 [salt.crypt                               ][DEBUG   ][4699] Failed to authenticate message
2015-05-13 01:31:19,068 [salt.minion                              ][DEBUG   ][4699] Attempting to authenticate with the Salt Master at 209.243.160.35
2015-05-13 01:31:19,069 [salt.crypt                               ][DEBUG   ][4699] Re-using SAuth for ('/etc/salt/pki/minion', 'octeon109', 'tcp://209.243.160.35:4506')
2015-05-13 01:31:19,294 [salt.crypt                               ][DEBUG   ][4699] Decrypting the current master AES key
2015-05-13 01:31:19,296 [salt.crypt                               ][DEBUG   ][4699] Loaded minion key: /etc/salt/pki/minion/minion.pem
2015-05-13 01:31:20,026 [salt.crypt                               ][DEBUG   ][4699] Loaded minion key: /etc/salt/pki/minion/minion.pem
2015-05-13 01:33:04,027 [salt.minion                              ][INFO    ][4699] User root Executing command ceph.get_heartbeats with jid 20150512183304482562
2015-05-13 01:33:04,028 [salt.minion                              ][DEBUG   ][4699] Command details {'tgt_type': 'glob', 'jid': '20150512183304482562', 'tgt': 'octeon109', 'ret': '', 'user': 'root', 'arg': [], 'fun': 'ceph.get_heartbeats'}
2015-05-13 01:33:04,043 [salt.minion                              ][INFO    ][5912] Starting a new job with PID 5912
2015-05-13 01:33:04,053 [salt.utils.lazy                          ][DEBUG   ][5912] LazyLoaded ceph.get_heartbeats
2015-05-13 01:33:04,209 [salt.utils.lazy                          ][DEBUG   ][5912] LazyLoaded pkg.version
2015-05-13 01:33:04,212 [salt.utils.lazy                          ][DEBUG   ][5912] LazyLoaded pkg_resource.version
2015-05-13 01:33:04,217 [salt.utils.lazy                          ][DEBUG   ][5912] LazyLoaded cmd.run_stdout
2015-05-13 01:33:04,219 [salt.loaded.int.module.cmdmod            ][INFO    ][5912] Executing command ['dpkg-query', '--showformat', '${Status} ${Package} ${Version} ${Architecture}\n', '-W'] in directory '/root'
2015-05-13 01:33:05,432 [salt.minion                              ][INFO    ][5912] Returning information for job: 20150512183304482562
2015-05-13 01:33:05,434 [salt.crypt                               ][DEBUG   ][5912] Re-using SAuth for ('/etc/salt/pki/minion', 'octeon109', 'tcp://209.243.160.35:4506')
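
For completeness, the same heartbeat function can be exercised directly on the minion with salt-call, which keeps the master out of the picture (stock salt commands; the ceph module is the one Calamari syncs out to the minions):

# On octeon109: run the heartbeat module locally with debug logging
salt-call ceph.get_heartbeats -l debug

# From the master: capture the full return as JSON for easier comparison
salt octeon109 ceph.get_heartbeats --out=json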


> -----Original Message-----
> From: Bruce McFarland
> Sent: Tuesday, May 12, 2015 6:11 PM
> To: 'Gregory Meno'
> Cc: ceph-calamari@xxxxxxxxxxxxxx; ceph-users@xxxxxxxx; ceph-devel
> (ceph-devel@xxxxxxxxxxxxxxx)
> Subject: RE: [ceph-calamari] Does anyone understand Calamari??
> 
> Which logs? I'm assuming /var/log/salt/minion, since the rest of the logs on the
> minions are relatively empty. Possibly the Cthulhu log from the master?
> 
> I'm running on Ubuntu 14.04 and don't have an httpd service; I had been
> starting/stopping apache2 instead. Likewise there is no supervisord service, and
> I've been using supervisorctl to start/stop Cthulhu.
> 
> I've performed the calamari-ctl clear/init sequence more than twice,
> stopping and starting apache2 and Cthulhu each time.
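> 
> Concretely, the sequence I've been running here on Ubuntu looks roughly like
> this (apache2 standing in for httpd, and supervisorctl driving the cthulhu
> program, under whatever name supervisor has it registered here, since there is
> no supervisord service):
> 
> # Stop cthulhu (via supervisor) and the web front end
> sudo supervisorctl stop cthulhu
> sudo service apache2 stop
> # Wipe and re-initialize Calamari's state
> sudo calamari-ctl clear --yes-i-am-sure
> sudo calamari-ctl initialize
> # Bring everything back up
> sudo supervisorctl start cthulhu
> sudo service apache2 start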
> 
> > -----Original Message-----
> > From: Gregory Meno [mailto:gmeno@xxxxxxxxxx]
> > Sent: Tuesday, May 12, 2015 5:58 PM
> > To: Bruce McFarland
> > Cc: ceph-calamari@xxxxxxxxxxxxxx; ceph-users@xxxxxxxx; ceph-devel
> > (ceph-devel@xxxxxxxxxxxxxxx)
> > Subject: Re: [ceph-calamari] Does anyone understand Calamari??
> >
> > All that looks fine.
> >
> > There must be some state where the cluster is known to calamari and it
> > is failing to actually show it.
> >
> > If you have time to debug I would love to see the logs at debug level.
> >
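> > Something like this on one of the cluster nodes would get the minion
> > logging at debug (log_level is the stock salt option - edit rather than
> > append if you already set it - and I believe cthulhu's own log level lives
> > in /etc/calamari/calamari.conf on the master if you want that side too):
> >
> > # Raise the salt-minion log level and restart it
> > echo 'log_level: debug' | sudo tee -a /etc/salt/minion
> > sudo service salt-minion restart
> >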
> > If you don’t, we could try cleaning out calamari’s state:
> > sudo supervisorctl shutdown
> > sudo service httpd stop
> > sudo calamari-ctl clear --yes-i-am-sure
> > sudo calamari-ctl initialize
> > then
> > sudo service supervisord start
> > sudo service httpd start
> >
> > see what the API and UI say then.
> >
> > regards,
> > Gregory
> > > On May 12, 2015, at 5:18 PM, Bruce McFarland
> > > <Bruce.McFarland@xxxxxxxxxxxxxxxx> wrote:
> > >
> > > Master was ess68 and now it's essperf3.
> > >
> > > On all cluster nodes the following files now have 'master: essperf3'
> > > /etc/salt/minion
> > > /etc/salt/minion.d/calamari.conf
> > > /etc/diamond/diamond.conf
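> > >
> > > A quick way to double-check that from the master is a plain grep across
> > > those files via salt's cmd.run, something like:
> > >
> > > # every minion should report lines referencing essperf3
> > > salt \* cmd.run 'grep -H essperf3 /etc/salt/minion /etc/salt/minion.d/calamari.conf /etc/diamond/diamond.conf'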
> > >
> > > The 'salt \* ceph.get_heartbeats' is being run on essperf3 - here's a
> > > 'salt \* test.ping' from the essperf3 Calamari Master to the cluster. I've also
> > > included a quick cluster sanity test with the output of ceph -s and
> > > ceph osd tree. And for your reading pleasure, the output of 'salt octeon109 ceph.get_heartbeats',
> > > since I suspect there might be a missing field in the monitor response.
> > >
> > > root@essperf3:/etc/ceph# salt \* test.ping
> > > octeon108:
> > >    True
> > > octeon114:
> > >    True
> > > octeon111:
> > >    True
> > > octeon101:
> > >    True
> > > octeon106:
> > >    True
> > > octeon109:
> > >    True
> > > octeon118:
> > >    True
> > > root@essperf3:/etc/ceph# ceph osd tree
> > > # id	weight	type name	up/down	reweight
> > > -1	7	root default
> > > -4	1		host octeon108
> > > 0	1			osd.0	up	1
> > > -2	1		host octeon111
> > > 1	1			osd.1	up	1
> > > -5	1		host octeon115
> > > 2	1			osd.2	DNE
> > > -6	1		host octeon118
> > > 3	1			osd.3	up	1
> > > -7	1		host octeon114
> > > 4	1			osd.4	up	1
> > > -8	1		host octeon106
> > > 5	1			osd.5	up	1
> > > -9	1		host octeon101
> > > 6	1			osd.6	up	1
> > > root@essperf3:/etc/ceph# ceph -s
> > >    cluster 868bfacc-e492-11e4-89fa-000fb711110c
> > >     health HEALTH_OK
> > >     monmap e1: 1 mons at {octeon109=209.243.160.70:6789/0}, election epoch 1, quorum 0 octeon109
> > >     osdmap e80: 6 osds: 6 up, 6 in
> > >      pgmap v26765: 728 pgs, 2 pools, 20070 MB data, 15003 objects
> > >            60604 MB used, 2734 GB / 2793 GB avail
> > >                 728 active+clean
> > > root@essperf3:/etc/ceph#
> > >
> > > root@essperf3:/etc/ceph# salt octeon109 ceph.get_heartbeats
> > > octeon109:
> > >    ----------
> > >    - boot_time:
> > >        1430784431
> > >    - ceph_version:
> > >        0.80.8-0.el6
> > >    - services:
> > >        ----------
> > >        ceph-mon.octeon109:
> > >            ----------
> > >            cluster:
> > >                ceph
> > >            fsid:
> > >                868bfacc-e492-11e4-89fa-000fb711110c
> > >            id:
> > >                octeon109
> > >            status:
> > >                ----------
> > >                election_epoch:
> > >                    1
> > >                extra_probe_peers:
> > >                monmap:
> > >                    ----------
> > >                    created:
> > >                        2015-04-16 23:50:52.412686
> > >                    epoch:
> > >                        1
> > >                    fsid:
> > >                        868bfacc-e492-11e4-89fa-000fb711110c
> > >                    modified:
> > >                        2015-04-16 23:50:52.412686
> > >                    mons:
> > >                        ----------
> > >                        - addr:
> > >                            209.243.160.70:6789/0
> > >                        - name:
> > >                            octeon109
> > >                        - rank:
> > >                            0
> > >                name:
> > >                    octeon109
> > >                outside_quorum:
> > >                quorum:
> > >                    - 0
> > >                rank:
> > >                    0
> > >                state:
> > >                    leader
> > >                sync_provider:
> > >            type:
> > >                mon
> > >            version:
> > >                0.86
> > >    ----------
> > >    - 868bfacc-e492-11e4-89fa-000fb711110c:
> > >        ----------
> > >        fsid:
> > >            868bfacc-e492-11e4-89fa-000fb711110c
> > >        name:
> > >            ceph
> > >        versions:
> > >            ----------
> > >            config:
> > >                87f175c60e5c7ec06c263c556056fbcb
> > >            health:
> > >                a907d0ec395713369b4843381ec31bc2
> > >            mds_map:
> > >                1
> > >            mon_map:
> > >                1
> > >            mon_status:
> > >                1
> > >            osd_map:
> > >                80
> > >            pg_summary:
> > >                7e29d7cc93cfced8f3f146cc78f5682f
> > > root@essperf3:/etc/ceph#
> > >
> > >
> > >
> > >> -----Original Message-----
> > >> From: Gregory Meno [mailto:gmeno@xxxxxxxxxx]
> > >> Sent: Tuesday, May 12, 2015 5:03 PM
> > >> To: Bruce McFarland
> > >> Cc: ceph-calamari@xxxxxxxxxxxxxx; ceph-users@xxxxxxxx; ceph-devel
> > >> (ceph-devel@xxxxxxxxxxxxxxx)
> > >> Subject: Re: [ceph-calamari] Does anyone understand Calamari??
> > >>
> > >> Bruce,
> > >>
> > >> It is great to hear that salt is reporting status from all the
> > >> nodes in the cluster.
> > >>
> > >> Let me see if I understand your question:
> > >>
> > >> You want to know what conditions cause us to recognize a working
> > cluster?
> > >>
> > >> see
> > >>
> > >> https://github.com/ceph/calamari/blob/master/cthulhu/cthulhu/manager/manager.py#L135
> > >>
> > >> https://github.com/ceph/calamari/blob/master/cthulhu/cthulhu/manager/manager.py#L349
> > >>
> > >> and
> > >>
> > >> https://github.com/ceph/calamari/blob/master/cthulhu/cthulhu/manager/cluster_monitor.py
> > >>
> > >>
> > >> Let’s check that you need to be digging into that level of detail:
> > >>
> > >> You switched to a new instance of calamari and it is not
> > >> recognizing the cluster.
> > >>
> > >> You want to know what you are overlooking? Would you please clarify
> > >> with some hostnames?
> > >>
> > >> i.e. let's say that your old calamari node was called calamariA and
> > >> that your new node is calamariB.
> > >>
> > >> from which are you running the get_heartbeats?
> > >>
> > >> what is the master setting in the minion config files out on the
> > >> nodes of the cluster? If things are set up correctly they would look
> > >> like this:
> > >>
> > >> [root@node1 shadow_man]# cat /etc/salt/minion.d/calamari.conf
> > >> master: calamariB
> > >>
> > >>
> > >> If this is the case, the thing I would check is whether the
> > >> http://calamariB/api/v2/cluster endpoint is reporting anything.
> > >>
> > >> hope this helps,
> > >> Gregory
> > >>
> > >>> On May 12, 2015, at 4:34 PM, Bruce McFarland
> > >>> <Bruce.McFarland@xxxxxxxxxxxxxxxx> wrote:
> > >>>
> > >>> Increasing the audience since ceph-calamari is not responsive.
> > >>> What salt event/info does the Calamari Master expect to see from the ceph-mon
> > >>> to determine there is a working cluster? I had to change the server
> > >>> hosting the calamari master and can’t get the new machine to
> > >>> recognize the cluster.
> > >>> The ‘salt \* ceph.get_heartbeats’ returns monmap, fsid, ver, epoch,
> > >>> etc. for the monitor and all of the OSDs. Can anyone point me to
> > >>> docs or code that might enlighten me as to what I’m overlooking? Thanks.
> > >>> _______________________________________________
> > >>> ceph-calamari mailing list
> > >>> ceph-calamari@xxxxxxxxxxxxxx
> > >>> http://lists.ceph.com/listinfo.cgi/ceph-calamari-ceph.com
> > >

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




