Ideally I would like everything in /var/log/calamari. Be sure to set calamari.conf like so:

[shadow_man@vpm107 ~]$ grep DEBUG /etc/calamari/calamari.conf
log_level = DEBUG
db_log_level = DEBUG
log_level = DEBUG

then restart cthulhu and apache, visit http://essperf3/api/v2/cluster and http://essperf3, and then share the logs here. Hopefully something obvious will be off in either the calamari or cthulhu log.
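On Ubuntu that whole cycle would look roughly like the sketch below, using the apache2/supervisorctl names you mention below. The cthulhu program name and the log file glob are assumptions on my part, so check supervisorctl status and ls /var/log/calamari if they don't match:

# restart cthulhu and apache so the DEBUG settings take effect
sudo supervisorctl restart cthulhu
sudo service apache2 restart
# exercise the API and the UI, then collect whatever got logged
curl -s http://essperf3/api/v2/cluster
tail -n 200 /var/log/calamari/*.log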
regards,
Gregory

> On May 12, 2015, at 6:11 PM, Bruce McFarland <Bruce.McFarland@xxxxxxxxxxxxxxxx> wrote:
>
> Which logs? I'm assuming /var/log/salt/minion since the rest on the minions are relatively empty. Possibly Cthulhu from the master?
>
> I'm running on Ubuntu 14.04 and don't have an httpd service. I had been start/stopping apache2. Likewise there is no supervisord service and I've been using supervisorctl to start/stop Cthulhu.
>
> I've performed the calamari-ctl clear/init sequence more than twice, also stopping/starting apache2 and Cthulhu.
>
>> -----Original Message-----
>> From: Gregory Meno [mailto:gmeno@xxxxxxxxxx]
>> Sent: Tuesday, May 12, 2015 5:58 PM
>> To: Bruce McFarland
>> Cc: ceph-calamari@xxxxxxxxxxxxxx; ceph-users@xxxxxxxx; ceph-devel (ceph-devel@xxxxxxxxxxxxxxx)
>> Subject: Re: [ceph-calamari] Does anyone understand Calamari??
>>
>> All that looks fine.
>>
>> There must be some state where the cluster is known to calamari and it is failing to actually show it.
>>
>> If you have time to debug I would love to see the logs at debug level.
>>
>> If you don't, we could try cleaning out calamari's state:
>>
>> sudo supervisorctl shutdown
>> sudo service httpd stop
>> sudo calamari-ctl clear --yes-i-am-sure
>> sudo calamari-ctl initialize
>>
>> then
>>
>> sudo service supervisord start
>> sudo service httpd start
>>
>> and see what the API and UI say then.
>>
>> regards,
>> Gregory
>>
>>> On May 12, 2015, at 5:18 PM, Bruce McFarland <Bruce.McFarland@xxxxxxxxxxxxxxxx> wrote:
>>>
>>> Master was ess68 and now it's essperf3.
>>>
>>> On all cluster nodes the following files now have 'master: essperf3':
>>> /etc/salt/minion
>>> /etc/salt/minion/calamari.conf
>>> /etc/diamond/diamond.conf
>>>
>>> The 'salt \* ceph.get_heartbeats' is being run on essperf3. Here's a 'salt \* test.ping' from the essperf3 Calamari master to the cluster. I've also included a quick cluster sanity test with the output of ceph -s and ceph osd tree, and, for your reading pleasure, the output of 'salt octeon109 ceph.get_heartbeats', since I suspect there might be a missing field in the monitor response.
>>>
>>> root@essperf3:/etc/ceph# salt \* test.ping
>>> octeon108:
>>>     True
>>> octeon114:
>>>     True
>>> octeon111:
>>>     True
>>> octeon101:
>>>     True
>>> octeon106:
>>>     True
>>> octeon109:
>>>     True
>>> octeon118:
>>>     True
>>> root@essperf3:/etc/ceph# ceph osd tree
>>> # id  weight  type name           up/down  reweight
>>> -1    7       root default
>>> -4    1         host octeon108
>>> 0     1           osd.0           up       1
>>> -2    1         host octeon111
>>> 1     1           osd.1           up       1
>>> -5    1         host octeon115
>>> 2     1           osd.2           DNE
>>> -6    1         host octeon118
>>> 3     1           osd.3           up       1
>>> -7    1         host octeon114
>>> 4     1           osd.4           up       1
>>> -8    1         host octeon106
>>> 5     1           osd.5           up       1
>>> -9    1         host octeon101
>>> 6     1           osd.6           up       1
>>> root@essperf3:/etc/ceph# ceph -s
>>>     cluster 868bfacc-e492-11e4-89fa-000fb711110c
>>>      health HEALTH_OK
>>>      monmap e1: 1 mons at {octeon109=209.243.160.70:6789/0}, election epoch 1, quorum 0 octeon109
>>>      osdmap e80: 6 osds: 6 up, 6 in
>>>       pgmap v26765: 728 pgs, 2 pools, 20070 MB data, 15003 objects
>>>             60604 MB used, 2734 GB / 2793 GB avail
>>>                  728 active+clean
>>> root@essperf3:/etc/ceph#
>>>
>>> root@essperf3:/etc/ceph# salt octeon109 ceph.get_heartbeats
>>> octeon109:
>>>     ----------
>>>     - boot_time:
>>>         1430784431
>>>     - ceph_version:
>>>         0.80.8-0.el6
>>>     - services:
>>>         ----------
>>>         ceph-mon.octeon109:
>>>             ----------
>>>             cluster:
>>>                 ceph
>>>             fsid:
>>>                 868bfacc-e492-11e4-89fa-000fb711110c
>>>             id:
>>>                 octeon109
>>>             status:
>>>                 ----------
>>>                 election_epoch:
>>>                     1
>>>                 extra_probe_peers:
>>>                 monmap:
>>>                     ----------
>>>                     created:
>>>                         2015-04-16 23:50:52.412686
>>>                     epoch:
>>>                         1
>>>                     fsid:
>>>                         868bfacc-e492-11e4-89fa-000fb711110c
>>>                     modified:
>>>                         2015-04-16 23:50:52.412686
>>>                     mons:
>>>                         ----------
>>>                         - addr:
>>>                             209.243.160.70:6789/0
>>>                         - name:
>>>                             octeon109
>>>                         - rank:
>>>                             0
>>>                 name:
>>>                     octeon109
>>>                 outside_quorum:
>>>                 quorum:
>>>                     - 0
>>>                 rank:
>>>                     0
>>>                 state:
>>>                     leader
>>>                 sync_provider:
>>>             type:
>>>                 mon
>>>             version:
>>>                 0.86
>>>     - 868bfacc-e492-11e4-89fa-000fb711110c:
>>>         ----------
>>>         fsid:
>>>             868bfacc-e492-11e4-89fa-000fb711110c
>>>         name:
>>>             ceph
>>>         versions:
>>>             ----------
>>>             config:
>>>                 87f175c60e5c7ec06c263c556056fbcb
>>>             health:
>>>                 a907d0ec395713369b4843381ec31bc2
>>>             mds_map:
>>>                 1
>>>             mon_map:
>>>                 1
>>>             mon_status:
>>>                 1
>>>             osd_map:
>>>                 80
>>>             pg_summary:
>>>                 7e29d7cc93cfced8f3f146cc78f5682f
>>> root@essperf3:/etc/ceph#
>>>
>>>
>>>> -----Original Message-----
>>>> From: Gregory Meno [mailto:gmeno@xxxxxxxxxx]
>>>> Sent: Tuesday, May 12, 2015 5:03 PM
>>>> To: Bruce McFarland
>>>> Cc: ceph-calamari@xxxxxxxxxxxxxx; ceph-users@xxxxxxxx; ceph-devel (ceph-devel@xxxxxxxxxxxxxxx)
>>>> Subject: Re: [ceph-calamari] Does anyone understand Calamari??
>>>>
>>>> Bruce,
>>>>
>>>> It is great to hear that salt is reporting status from all the nodes in the cluster.
>>>>
>>>> Let me see if I understand your question: you want to know what conditions cause us to recognize a working cluster?
>>>>
>>>> See
>>>> https://github.com/ceph/calamari/blob/master/cthulhu/cthulhu/manager/manager.py#L135
>>>> https://github.com/ceph/calamari/blob/master/cthulhu/cthulhu/manager/manager.py#L349
>>>> and
>>>> https://github.com/ceph/calamari/blob/master/cthulhu/cthulhu/manager/cluster_monitor.py
>>>>
>>>> Let's check that you need to be digging into that level of detail.
>>>>
>>>> You switched to a new instance of calamari and it is not recognizing the cluster, and you want to know what you are overlooking? Would you please clarify with some hostnames?
>>>>
>>>> i.e. let's say that your old calamari node was called calamariA and that your new node is calamariB.
>>>>
>>>> From which are you running the get_heartbeats?
>>>>
>>>> What is the master setting in the minion config files out on the nodes of the cluster? If things are set up correctly they would look like this:
>>>>
>>>> [root@node1 shadow_man]# cat /etc/salt/minion.d/calamari.conf
>>>> master: calamariB
>>>>
>>>> If this is the case, the thing I would check is the http://calamariB/api/v2/cluster endpoint: is it reporting anything?
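>>>>
>>>> A quick way to check both of those from calamariB in one pass, assuming salt can already reach the minions, would be a sketch like:
>>>>
>>>> # confirm every minion points at the new master
>>>> salt \* cmd.run 'grep -H "^master:" /etc/salt/minion.d/calamari.conf'
>>>> # then ask the REST API whether it knows about any cluster yet
>>>> curl -s http://calamariB/api/v2/cluster
>>>>
>>>> If any node still shows the old master, fix that file and restart salt-minion on the node before retesting.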
>>>>
>>>> hope this helps,
>>>> Gregory
>>>>
>>>>> On May 12, 2015, at 4:34 PM, Bruce McFarland <Bruce.McFarland@xxxxxxxxxxxxxxxx> wrote:
>>>>>
>>>>> Increasing the audience since ceph-calamari is not responsive. What salt event/info does the Calamari master expect to see from the ceph-mon to determine there is a working cluster? I had to change the server hosting the calamari master and can't get the new machine to recognize the cluster. The 'salt \* ceph.get_heartbeats' returns monmap, fsid, ver, epoch, etc. for the monitor and all of the OSDs. Can anyone point me to docs or code that might enlighten me as to what I'm overlooking? Thanks.
>>>>> _______________________________________________
>>>>> ceph-calamari mailing list
>>>>> ceph-calamari@xxxxxxxxxxxxxx
>>>>> http://lists.ceph.com/listinfo.cgi/ceph-calamari-ceph.com

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com