Which logs? I'm assuming /var/log/salt/minion, since the rest of the logs on the minions are relatively empty. Possibly the Cthulhu log from the master? I'm running on Ubuntu 14.04 and don't have an httpd service; I had been starting/stopping apache2. Likewise there is no supervisord service, and I've been using supervisorctl to start/stop Cthulhu. I've performed the calamari-ctl clear/init sequence more than twice, also stopping/starting apache2 and Cthulhu along the way.
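For reference, the sequence below is roughly what that translates to on Ubuntu 14.04. This is only my best guess at the equivalents, so treat it as a sketch: apache2 in place of httpd, and the supervisor init script (which the stock 14.04 package names "supervisor") in place of supervisord.

sudo supervisorctl shutdown               # stops cthulhu and the other processes supervisord manages for calamari
sudo service apache2 stop                 # 14.04 ships apache2 rather than httpd
sudo calamari-ctl clear --yes-i-am-sure
sudo calamari-ctl initialize
sudo service supervisor start             # assumes the init script is named "supervisor"; or, if supervisord itself was left running, sudo supervisorctl start cthulhu
sudo service apache2 start

(And for the debug-level logs, I'm guessing you mean whatever cthulhu writes under /var/log/calamari/ on the master, plus /var/log/salt/minion on the nodes?)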

> -----Original Message-----
> From: Gregory Meno [mailto:gmeno@xxxxxxxxxx]
> Sent: Tuesday, May 12, 2015 5:58 PM
> To: Bruce McFarland
> Cc: ceph-calamari@xxxxxxxxxxxxxx; ceph-users@xxxxxxxx; ceph-devel (ceph-devel@xxxxxxxxxxxxxxx)
> Subject: Re: [ceph-calamari] Does anyone understand Calamari??
>
> All that looks fine.
>
> There must be some state where the cluster is known to calamari and it is failing to actually show it.
>
> If you have time to debug I would love to see the logs at debug level.
>
> If you don't, we could try cleaning out calamari's state:
> sudo supervisorctl shutdown
> sudo service httpd stop
> sudo calamari-ctl clear --yes-i-am-sure
> sudo calamari-ctl initialize
> then
> sudo service supervisord start
> sudo service httpd start
>
> see what the API and UI say then.
>
> regards,
> Gregory
>
> On May 12, 2015, at 5:18 PM, Bruce McFarland <Bruce.McFarland@xxxxxxxxxxxxxxxx> wrote:
> >
> > Master was ess68 and now it's essperf3.
> >
> > On all cluster nodes the following files now have 'master: essperf3':
> > /etc/salt/minion
> > /etc/salt/minion.d/calamari.conf
> > /etc/diamond/diamond.conf
> >
> > The 'salt \* ceph.get_heartbeats' is being run on essperf3 - here's a 'salt \* test.ping' from the essperf3 Calamari master to the cluster. I've also included a quick cluster sanity test with the output of ceph -s and ceph osd tree. And, for your reading pleasure, the output of 'salt octeon109 ceph.get_heartbeats', since I suspect there might be a missing field in the monitor response.
> >
> > root@essperf3:/etc/ceph# salt \* test.ping
> > octeon108:
> >     True
> > octeon114:
> >     True
> > octeon111:
> >     True
> > octeon101:
> >     True
> > octeon106:
> >     True
> > octeon109:
> >     True
> > octeon118:
> >     True
> > root@essperf3:/etc/ceph# ceph osd tree
> > # id  weight  type name          up/down  reweight
> > -1    7       root default
> > -4    1         host octeon108
> > 0     1           osd.0          up       1
> > -2    1         host octeon111
> > 1     1           osd.1          up       1
> > -5    1         host octeon115
> > 2     1           osd.2          DNE
> > -6    1         host octeon118
> > 3     1           osd.3          up       1
> > -7    1         host octeon114
> > 4     1           osd.4          up       1
> > -8    1         host octeon106
> > 5     1           osd.5          up       1
> > -9    1         host octeon101
> > 6     1           osd.6          up       1
> > root@essperf3:/etc/ceph# ceph -s
> >     cluster 868bfacc-e492-11e4-89fa-000fb711110c
> >      health HEALTH_OK
> >      monmap e1: 1 mons at {octeon109=209.243.160.70:6789/0}, election epoch 1, quorum 0 octeon109
> >      osdmap e80: 6 osds: 6 up, 6 in
> >       pgmap v26765: 728 pgs, 2 pools, 20070 MB data, 15003 objects
> >             60604 MB used, 2734 GB / 2793 GB avail
> >                  728 active+clean
> > root@essperf3:/etc/ceph#
> >
> > root@essperf3:/etc/ceph# salt octeon109 ceph.get_heartbeats
> > octeon109:
> >     ----------
> >     - boot_time:
> >         1430784431
> >     - ceph_version:
> >         0.80.8-0.el6
> >     - services:
> >         ----------
> >         ceph-mon.octeon109:
> >             ----------
> >             cluster:
> >                 ceph
> >             fsid:
> >                 868bfacc-e492-11e4-89fa-000fb711110c
> >             id:
> >                 octeon109
> >             status:
> >                 ----------
> >                 election_epoch:
> >                     1
> >                 extra_probe_peers:
> >                 monmap:
> >                     ----------
> >                     created:
> >                         2015-04-16 23:50:52.412686
> >                     epoch:
> >                         1
> >                     fsid:
> >                         868bfacc-e492-11e4-89fa-000fb711110c
> >                     modified:
> >                         2015-04-16 23:50:52.412686
> >                     mons:
> >                         ----------
> >                         - addr:
> >                             209.243.160.70:6789/0
> >                         - name:
> >                             octeon109
> >                         - rank:
> >                             0
> >                 name:
> >                     octeon109
> >                 outside_quorum:
> >                 quorum:
> >                     - 0
> >                 rank:
> >                     0
> >                 state:
> >                     leader
> >                 sync_provider:
> >             type:
> >                 mon
> >             version:
> >                 0.86
> >     ----------
> >     - 868bfacc-e492-11e4-89fa-000fb711110c:
> >         ----------
> >         fsid:
> >             868bfacc-e492-11e4-89fa-000fb711110c
> >         name:
> >             ceph
> >         versions:
> >             ----------
> >             config:
> >                 87f175c60e5c7ec06c263c556056fbcb
> >             health:
> >                 a907d0ec395713369b4843381ec31bc2
> >             mds_map:
> >                 1
> >             mon_map:
> >                 1
> >             mon_status:
> >                 1
> >             osd_map:
> >                 80
> >             pg_summary:
> >                 7e29d7cc93cfced8f3f146cc78f5682f
> > root@essperf3:/etc/ceph#
> >
> >
> >> -----Original Message-----
> >> From: Gregory Meno [mailto:gmeno@xxxxxxxxxx]
> >> Sent: Tuesday, May 12, 2015 5:03 PM
> >> To: Bruce McFarland
> >> Cc: ceph-calamari@xxxxxxxxxxxxxx; ceph-users@xxxxxxxx; ceph-devel (ceph-devel@xxxxxxxxxxxxxxx)
> >> Subject: Re: [ceph-calamari] Does anyone understand Calamari??
> >>
> >> Bruce,
> >>
> >> It is great to hear that salt is reporting status from all the nodes in the cluster.
> >>
> >> Let me see if I understand your question:
> >>
> >> You want to know what conditions cause us to recognize a working cluster?
> >>
> >> see
> >> https://github.com/ceph/calamari/blob/master/cthulhu/cthulhu/manager/manager.py#L135
> >> https://github.com/ceph/calamari/blob/master/cthulhu/cthulhu/manager/manager.py#L349
> >> and
> >> https://github.com/ceph/calamari/blob/master/cthulhu/cthulhu/manager/cluster_monitor.py
> >>
> >> Let's check whether you need to be digging into that level of detail:
> >>
> >> You switched to a new instance of calamari and it is not recognizing the cluster.
> >>
> >> You want to know what you are overlooking? Would you please clarify with some hostnames?
> >>
> >> i.e. Let's say that your old calamari node was called calamariA and that your new node is calamariB.
> >>
> >> From which are you running the get_heartbeats?
> >>
> >> What is the master setting in the minion config files out on the nodes of the cluster? If things are set up correctly they would look like this:
> >>
> >> [root@node1 shadow_man]# cat /etc/salt/minion.d/calamari.conf
> >> master: calamariB
> >>
> >> If this is the case, the thing I would check is whether the http://calamariB/api/v2/cluster endpoint is reporting anything.
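For what it's worth, checking that endpoint from the CLI on the new master would be something like the sketch below (essperf3 being the new master here; the v2 API may insist on an authenticated session, in which case a browser that is already logged in to the Calamari UI is the easier way to look at it):

curl -s http://essperf3/api/v2/cluster

If cthulhu has registered the cluster, that should report it (fsid 868bfacc-e492-11e4-89fa-000fb711110c); an empty result would mean calamari still hasn't recognized it.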
> >>
> >> hope this helps,
> >> Gregory
> >>
> >>> On May 12, 2015, at 4:34 PM, Bruce McFarland <Bruce.McFarland@xxxxxxxxxxxxxxxx> wrote:
> >>>
> >>> Increasing the audience since ceph-calamari is not responsive. What salt event/info does the Calamari Master expect to see from the ceph-mon to determine there is a working cluster? I had to change servers hosting the calamari master and can't get the new machine to recognize the cluster. The 'salt \* ceph.get_heartbeats' returns monmap, fsid, ver, epoch, etc. for the monitor and all of the OSDs. Can anyone point me to docs or code that might enlighten me as to what I'm overlooking? Thanks.
> >>> _______________________________________________
> >>> ceph-calamari mailing list
> >>> ceph-calamari@xxxxxxxxxxxxxx
> >>> http://lists.ceph.com/listinfo.cgi/ceph-calamari-ceph.com

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com