All that looks fine. There must be some state where the cluster is known to calamari and it is failing to actually show it. If you have time to debug I would love to see the logs at debug level. If you don't, we could try cleaning out calamari's state:

sudo supervisorctl shutdown
sudo service httpd stop
sudo calamari-ctl clear --yes-i-am-sure
sudo calamari-ctl initialize

then

sudo service supervisord start
sudo service httpd start

and see what the API and UI say then.

regards,
Gregory

> On May 12, 2015, at 5:18 PM, Bruce McFarland <Bruce.McFarland@xxxxxxxxxxxxxxxx> wrote:
>
> Master was ess68 and now it's essperf3.
>
> On all cluster nodes the following files now have 'master: essperf3':
> /etc/salt/minion
> /etc/salt/minion/calamari.conf
> /etc/diamond/diamond.conf
>
> The 'salt \* ceph.get_heartbeats' is being run on essperf3 - here's a 'salt \* test.ping' from the essperf3 Calamari Master to the cluster. I've also included a quick cluster sanity test with the output of ceph -s and ceph osd tree. And for your reading pleasure, the output of 'salt octeon109 ceph.get_heartbeats', since I suspect there might be a missing field in the monitor response.
>
> root@essperf3:/etc/ceph# salt \* test.ping
> octeon108:
>     True
> octeon114:
>     True
> octeon111:
>     True
> octeon101:
>     True
> octeon106:
>     True
> octeon109:
>     True
> octeon118:
>     True
> root@essperf3:/etc/ceph# ceph osd tree
> # id    weight  type name          up/down  reweight
> -1      7       root default
> -4      1           host octeon108
> 0       1               osd.0      up       1
> -2      1           host octeon111
> 1       1               osd.1      up       1
> -5      1           host octeon115
> 2       1               osd.2      DNE
> -6      1           host octeon118
> 3       1               osd.3      up       1
> -7      1           host octeon114
> 4       1               osd.4      up       1
> -8      1           host octeon106
> 5       1               osd.5      up       1
> -9      1           host octeon101
> 6       1               osd.6      up       1
> root@essperf3:/etc/ceph# ceph -s
>     cluster 868bfacc-e492-11e4-89fa-000fb711110c
>      health HEALTH_OK
>      monmap e1: 1 mons at {octeon109=209.243.160.70:6789/0}, election epoch 1, quorum 0 octeon109
>      osdmap e80: 6 osds: 6 up, 6 in
>       pgmap v26765: 728 pgs, 2 pools, 20070 MB data, 15003 objects
>             60604 MB used, 2734 GB / 2793 GB avail
>                  728 active+clean
> root@essperf3:/etc/ceph#
>
> root@essperf3:/etc/ceph# salt octeon109 ceph.get_heartbeats
> octeon109:
>     ----------
>     - boot_time:
>         1430784431
>     - ceph_version:
>         0.80.8-0.el6
>     - services:
>         ----------
>         ceph-mon.octeon109:
>             ----------
>             cluster:
>                 ceph
>             fsid:
>                 868bfacc-e492-11e4-89fa-000fb711110c
>             id:
>                 octeon109
>             status:
>                 ----------
>                 election_epoch:
>                     1
>                 extra_probe_peers:
>                 monmap:
>                     ----------
>                     created:
>                         2015-04-16 23:50:52.412686
>                     epoch:
>                         1
>                     fsid:
>                         868bfacc-e492-11e4-89fa-000fb711110c
>                     modified:
>                         2015-04-16 23:50:52.412686
>                     mons:
>                         ----------
>                         - addr:
>                             209.243.160.70:6789/0
>                         - name:
>                             octeon109
>                         - rank:
>                             0
>                 name:
>                     octeon109
>                 outside_quorum:
>                 quorum:
>                     - 0
>                 rank:
>                     0
>                 state:
>                     leader
>                 sync_provider:
>             type:
>                 mon
>             version:
>                 0.86
>     ----------
>     - 868bfacc-e492-11e4-89fa-000fb711110c:
>         ----------
>         fsid:
>             868bfacc-e492-11e4-89fa-000fb711110c
>         name:
>             ceph
>         versions:
>             ----------
>             config:
>                 87f175c60e5c7ec06c263c556056fbcb
>             health:
>                 a907d0ec395713369b4843381ec31bc2
>             mds_map:
>                 1
>             mon_map:
>                 1
>             mon_status:
>                 1
>             osd_map:
>                 80
>             pg_summary:
>                 7e29d7cc93cfced8f3f146cc78f5682f
> root@essperf3:/etc/ceph#
>
>
>
>> -----Original Message-----
>> From: Gregory Meno [mailto:gmeno@xxxxxxxxxx]
>> Sent: Tuesday, May 12, 2015 5:03 PM
>> To: Bruce McFarland
>> Cc: ceph-calamari@xxxxxxxxxxxxxx; ceph-users@xxxxxxxx; ceph-devel (ceph-devel@xxxxxxxxxxxxxxx)
>> Subject: Re: [ceph-calamari] Does anyone understand Calamari??
>>
>> Bruce,
>>
>> It is great to hear that salt is reporting status from all the nodes in the cluster.
>>
>> Let me see if I understand your question:
>>
>> You want to know what conditions cause us to recognize a working cluster?
>>
>> See
>> https://github.com/ceph/calamari/blob/master/cthulhu/cthulhu/manager/manager.py#L135
>> https://github.com/ceph/calamari/blob/master/cthulhu/cthulhu/manager/manager.py#L349
>>
>> and
>>
>> https://github.com/ceph/calamari/blob/master/cthulhu/cthulhu/manager/cluster_monitor.py
>>
>> Let's check that you need to be digging into that level of detail:
>>
>> You switched to a new instance of calamari and it is not recognizing the cluster.
>>
>> You want to know what you are overlooking? Would you please clarify with some hostnames?
>>
>> i.e. Let's say that your old calamari node was called calamariA and that your new node is calamariB.
>>
>> From which one are you running the get_heartbeats?
>>
>> What is the master setting in the minion config files out on the nodes of the cluster? If things are set up correctly they would look like this:
>>
>> [root@node1 shadow_man]# cat /etc/salt/minion.d/calamari.conf
>> master: calamariB
>>
>> If that is the case, the thing I would check is whether the http://calamariB/api/v2/cluster endpoint is reporting anything.
>>
>> hope this helps,
>> Gregory
>>
>>> On May 12, 2015, at 4:34 PM, Bruce McFarland <Bruce.McFarland@xxxxxxxxxxxxxxxx> wrote:
>>>
>>> Increasing the audience since ceph-calamari is not responsive. What salt event/info does the Calamari Master expect to see from the ceph-mon to determine there is a working cluster? I had to change servers hosting the calamari master and can't get the new machine to recognize the cluster. The 'salt \* ceph.get_heartbeats' returns monmap, fsid, ver, epoch, etc. for the monitor and all of the OSDs. Can anyone point me to docs or code that might enlighten me to what I'm overlooking? Thanks.
>>> _______________________________________________
>>> ceph-calamari mailing list
>>> ceph-calamari@xxxxxxxxxxxxxx
>>> http://lists.ceph.com/listinfo.cgi/ceph-calamari-ceph.com
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
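
A quick way to run the API check Gregory suggests ("see what the API and UI say then", and the /api/v2/cluster endpoint mentioned in his earlier message) is a curl from any host that can reach the master. This is only a sketch: the hostname essperf3 is taken from the thread, it assumes the API is served by httpd on port 80, and whether credentials are needed depends on how the API is protected, so the -u form and its placeholders are an assumption rather than a documented requirement.

# After calamari-ctl initialize and the services are back up:
curl -s http://essperf3/api/v2/cluster
# If the API insists on authentication, try the account created during
# calamari-ctl initialize (substitute whatever username/password was chosen):
curl -s -u <user>:<password> http://essperf3/api/v2/cluster
# An empty list means Calamari still has not registered the cluster; once it
# has, the response should include the fsid reported by get_heartbeats above
# (868bfacc-e492-11e4-89fa-000fb711110c).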
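
On the master change itself: when the master: value in the minion configs is repointed at a new Calamari host, the minions normally need a restart and their keys need to be accepted on the new master before ceph.get_heartbeats data can flow. In Bruce's case test.ping already succeeds, so this is just the generic re-pointing checklist, sketched here with the essperf3 hostname from the thread; the service names assume the sysvinit/upstart layout shown elsewhere in the discussion.

# On each cluster node, after editing /etc/salt/minion.d/calamari.conf
# to read "master: essperf3":
sudo service salt-minion restart

# On the new Calamari master: list the minion keys, accept any still pending,
# then confirm connectivity and the heartbeat data Calamari consumes:
sudo salt-key -L
sudo salt-key -A        # accepts all pending keys; confirm when prompted
sudo salt \* test.ping
sudo salt \* ceph.get_heartbeats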