Re: [ceph-calamari] Does anyone understand Calamari??

All that looks fine.

There must be some state where the cluster is known to calamari and it is failing to actually show it.
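
For background, the way a cluster becomes "known" is that cthulhu watches the salt heartbeats and spins up a ClusterMonitor the first time it sees a new fsid. Very roughly, and this is only a sketch of the shape of that logic rather than the real code (the manager.py and cluster_monitor.py links I sent earlier, quoted below, are the real thing), it looks like:

monitors = {}  # one monitor per cluster fsid that cthulhu has seen so far

class ClusterMonitor(object):
    # stand-in for cthulhu's per-cluster monitor, just for illustration
    def __init__(self, fsid, name):
        self.fsid = fsid
        self.name = name

    def on_heartbeat(self, minion_id, cluster_data):
        # the real monitor looks at the 'versions' block and fetches
        # whichever maps have changed; here we only note the heartbeat
        print("heartbeat from %s for cluster %s/%s" % (minion_id, self.name, self.fsid))

def on_cluster_heartbeat(minion_id, clusters):
    # 'clusters' is the second half of what ceph.get_heartbeats returns:
    # a dict keyed by fsid, like the 868bfacc-... block in your output below
    for fsid, cluster_data in clusters.items():
        if fsid not in monitors:
            # first sighting of an fsid is what makes the cluster "known";
            # only after this should it show up in the API and UI
            monitors[fsid] = ClusterMonitor(fsid, cluster_data['name'])
        monitors[fsid].on_heartbeat(minion_id, cluster_data)

So if get_heartbeats is returning that block and the cluster still never shows up, cthulhu is either not seeing those heartbeats or it is stuck on old state, which is why clearing it out (below) is worth a try.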

If you have time to debug I would love to see the logs at debug level.

If you don’t we could try cleaning out calamari’s state.
sudo supervisorctl shutdown
sudo service httpd stop
sudo calamari-ctl clear --yes-i-am-sure
sudo calamari-ctl initialize

then 
sudo service supervisord start
sudo service httpd start

See what the API and UI say then.
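
To poke the API without the UI, something like this run on the calamari master should show whether /api/v2/cluster has anything in it. This is just a quick sketch assuming the default setup where httpd serves the API on port 80; curl or a browser against the same URL works just as well, and if the API insists on a login you will get a 401/403 here and the browser route is easier:

import json
import urllib2  # python 2, which is what calamari runs on

# the cluster list endpoint; adjust the hostname if you run this elsewhere
url = "http://localhost/api/v2/cluster"

try:
    response = urllib2.urlopen(url)
except urllib2.URLError as e:
    print("API not reachable: %s" % e)
else:
    clusters = json.load(response)
    if clusters:
        print(json.dumps(clusters, indent=2))
    else:
        print("API is up but reports no clusters")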

regards,
Gregory 
> On May 12, 2015, at 5:18 PM, Bruce McFarland <Bruce.McFarland@xxxxxxxxxxxxxxxx> wrote:
> 
> Master was ess68 and now it's essperf3. 
> 
> On all cluster nodes the following files now have 'master: essperf3'
> /etc/salt/minion 
> /etc/salt/minion.d/calamari.conf 
> /etc/diamond/diamond.conf
> 
> The 'salt \* ceph.get_heartbeats' is being run on essperf3 - here's a 'salt \* test.ping' from the essperf3 Calamari master to the cluster. I've also included a quick cluster sanity test with the output of ceph -s and ceph osd tree. And, for your reading pleasure, the output of 'salt octeon109 ceph.get_heartbeats', since I suspect there might be a missing field in the monitor response. 
> 
> root@essperf3:/etc/ceph# salt \* test.ping
> octeon108:
>    True
> octeon114:
>    True
> octeon111:
>    True
> octeon101:
>    True
> octeon106:
>    True
> octeon109:
>    True
> octeon118:
>    True
> root@essperf3:/etc/ceph# ceph osd tree
> # id	weight	type name	up/down	reweight
> -1	7	root default
> -4	1		host octeon108
> 0	1			osd.0	up	1	
> -2	1		host octeon111
> 1	1			osd.1	up	1	
> -5	1		host octeon115
> 2	1			osd.2	DNE		
> -6	1		host octeon118
> 3	1			osd.3	up	1	
> -7	1		host octeon114
> 4	1			osd.4	up	1	
> -8	1		host octeon106
> 5	1			osd.5	up	1	
> -9	1		host octeon101
> 6	1			osd.6	up	1	
> root@essperf3:/etc/ceph# ceph -s 
>    cluster 868bfacc-e492-11e4-89fa-000fb711110c
>     health HEALTH_OK
>     monmap e1: 1 mons at {octeon109=209.243.160.70:6789/0}, election epoch 1, quorum 0 octeon109
>     osdmap e80: 6 osds: 6 up, 6 in
>      pgmap v26765: 728 pgs, 2 pools, 20070 MB data, 15003 objects
>            60604 MB used, 2734 GB / 2793 GB avail
>                 728 active+clean
> root@essperf3:/etc/ceph#
> 
> root@essperf3:/etc/ceph# salt octeon109 ceph.get_heartbeats
> octeon109:
>    ----------
>    - boot_time:
>        1430784431
>    - ceph_version:
>        0.80.8-0.el6
>    - services:
>        ----------
>        ceph-mon.octeon109:
>            ----------
>            cluster:
>                ceph
>            fsid:
>                868bfacc-e492-11e4-89fa-000fb711110c
>            id:
>                octeon109
>            status:
>                ----------
>                election_epoch:
>                    1
>                extra_probe_peers:
>                monmap:
>                    ----------
>                    created:
>                        2015-04-16 23:50:52.412686
>                    epoch:
>                        1
>                    fsid:
>                        868bfacc-e492-11e4-89fa-000fb711110c
>                    modified:
>                        2015-04-16 23:50:52.412686
>                    mons:
>                        ----------
>                        - addr:
>                            209.243.160.70:6789/0
>                        - name:
>                            octeon109
>                        - rank:
>                            0
>                name:
>                    octeon109
>                outside_quorum:
>                quorum:
>                    - 0
>                rank:
>                    0
>                state:
>                    leader
>                sync_provider:
>            type:
>                mon
>            version:
>                0.86
>    ----------
>    - 868bfacc-e492-11e4-89fa-000fb711110c:
>        ----------
>        fsid:
>            868bfacc-e492-11e4-89fa-000fb711110c
>        name:
>            ceph
>        versions:
>            ----------
>            config:
>                87f175c60e5c7ec06c263c556056fbcb
>            health:
>                a907d0ec395713369b4843381ec31bc2
>            mds_map:
>                1
>            mon_map:
>                1
>            mon_status:
>                1
>            osd_map:
>                80
>            pg_summary:
>                7e29d7cc93cfced8f3f146cc78f5682f
> root@essperf3:/etc/ceph#
> 
> 
> 
>> -----Original Message-----
>> From: Gregory Meno [mailto:gmeno@xxxxxxxxxx]
>> Sent: Tuesday, May 12, 2015 5:03 PM
>> To: Bruce McFarland
>> Cc: ceph-calamari@xxxxxxxxxxxxxx; ceph-users@xxxxxxxx; ceph-devel
>> (ceph-devel@xxxxxxxxxxxxxxx)
>> Subject: Re: [ceph-calamari] Does anyone understand Calamari??
>> 
>> Bruce,
>> 
>> It is great to hear that salt is reporting status from all the nodes in the
>> cluster.
>> 
>> Let me see if I understand your question:
>> 
>> You want to know what conditions cause us to recognize a working cluster?
>> 
>> see
>> https://github.com/ceph/calamari/blob/master/cthulhu/cthulhu/manager/manager.py#L135
>> 
>> https://github.com/ceph/calamari/blob/master/cthulhu/cthulhu/manager/manager.py#L349
>> 
>> and
>> 
>> https://github.com/ceph/calamari/blob/master/cthulhu/cthulhu/manager/cluster_monitor.py
>> 
>> 
>> Let’s check whether you really need to be digging into that level of detail:
>> 
>> You switched to a new instance of calamari and it is not recognizing the
>> cluster.
>> 
>> You want to know what you are overlooking? Would you please clarify with
>> some hostnames?
>> 
>> i.e. let’s say that your old calamari node was called calamariA and that your
>> new node is calamariB
>> 
>> From which node are you running the get_heartbeats?
>> 
>> What is the master setting in the minion config files out on the nodes of the
>> cluster? If things are set up correctly, they would look like this:
>> 
>> [root@node1 shadow_man]# cat /etc/salt/minion.d/calamari.conf
>> master: calamariB
>> 
>> 
>> If this is the case, the thing I would check is whether the
>> http://calamariB/api/v2/cluster endpoint is reporting anything.
>> 
>> hope this helps,
>> Gregory
>> 
>>> On May 12, 2015, at 4:34 PM, Bruce McFarland <Bruce.McFarland@xxxxxxxxxxxxxxxx> wrote:
>>> 
>>> Increasing the audience since ceph-calamari is not responsive. What salt
>>> event/info does the Calamari Master expect to see from the ceph-mon to
>>> determine there is a working cluster? I had to change servers hosting the
>>> calamari master and can’t get the new machine to recognize the cluster.
>>> The ‘salt \* ceph.get_heartbeats’ returns monmap, fsid, ver, epoch, etc. for
>>> the monitor and all of the OSDs. Can anyone point me to docs or code that
>>> might enlighten me to what I’m overlooking? Thanks.
>>> _______________________________________________
>>> ceph-calamari mailing list
>>> ceph-calamari@xxxxxxxxxxxxxx
>>> http://lists.ceph.com/listinfo.cgi/ceph-calamari-ceph.com
> 




