Hello,

I know there are lots of users with the same problem, and if this is the wrong mailing list, please tell me. I've tried to fix this, but Calamari is driving me nuts.

Problem: I'm running a Ceph cluster in a healthy state. The Calamari GUI once tried to connect to the ceph nodes, but after the 120-second timeout it reports that no cluster has been created yet. After changing software versions and configs for long hours, I'm close to giving up.

Current state: Ceph seems fine (rudimentary installation with 3 nodes). The nodes are running OSDs and MONs. There is one admin node with Calamari and the salt-master.

    cluster 10c29f99-caf8-4057-8cc7-1f94359418f2
     health HEALTH_OK
     monmap e1: 3 mons at {wmaiz-feink06=172.23.65.26:6789/0,wmaiz-feink07=172.23.65.27:6789/0,wmaiz-feink08=172.23.65.28:6789/0}
            election epoch 34, quorum 0,1,2 wmaiz-feink06,wmaiz-feink07,wmaiz-feink08
     osdmap e66: 3 osds: 3 up, 3 in
            flags sortbitwise
      pgmap v265: 64 pgs, 1 pools, 0 bytes data, 0 objects
            102 MB used, 698 GB / 698 GB avail
                  64 active+clean

Software:

    Ubuntu 14.04 with standard kernel 3.13
    ceph: 10.2.2-1trusty
    salt: 2014.1.13+ds-1trusty1 (minions and master with the same version)
    diamond: 3.4.67
    libgraphite: 1.3.6-1ubuntu0.14.04.1
    calamari: 1.3.1.1-1trusty

I also tried the current Ubuntu versions of salt, but the Ceph docs mention salt 2014.7. I tried 2014.7 as well, but that doesn't work either, so now I'm running a version below that.

The salt keys are accepted:

    Accepted Keys:
    wmaiz-feink05.dbc.zdf.de
    wmaiz-feink06.dbc.zdf.de
    wmaiz-feink07.dbc.zdf.de
    wmaiz-feink08.dbc.zdf.de
    Unaccepted Keys:
    Rejected Keys:

There is also a key from the minion running on the Calamari host itself, but I don't think that is the problem: I can uninstall or deactivate that minion and the errors stay the same.

    root@wmaiz-feink05:/home/deploy# salt \* test.ping
    wmaiz-feink07.dbc.zdf.de:
        True
    wmaiz-feink08.dbc.zdf.de:
        True
    wmaiz-feink06.dbc.zdf.de:
        True
    wmaiz-feink05.dbc.zdf.de:
        True

salt \* ceph.get_heartbeats shows something like the following, which results from an exception raised by the salt-minions on the ceph nodes:

        cluster_heartbeat[fsid] = cluster_status(cluster_handle, fsid_names[fsid])
      File "/var/cache/salt/minion/extmods/modules/ceph.py", line 566, in cluster_status
        mds_epoch = status['mdsmap']['epoch']
    KeyError: 'mdsmap'
    wmaiz-feink08.dbc.zdf.de:
        The minion function caused an exception: Traceback (most recent call last):
          File "/usr/lib/python2.7/dist-packages/salt/minion.py", line 809, in _thread_return
            return_data = func(*args, **kwargs)
          File "/var/cache/salt/minion/extmods/modules/ceph.py", line 498, in get_heartbeats
            cluster_heartbeat[fsid] = cluster_status(cluster_handle, fsid_names[fsid])
          File "/var/cache/salt/minion/extmods/modules/ceph.py", line 566, in cluster_status
            mds_epoch = status['mdsmap']['epoch']
        KeyError: 'mdsmap'
    wmaiz-feink07.dbc.zdf.de:
        The minion function caused an exception: Traceback (most recent call last):
          File "/usr/lib/python2.7/dist-packages/salt/minion.py", line 809, in _thread_return
            return_data = func(*args, **kwargs)
          File "/var/cache/salt/minion/extmods/modules/ceph.py", line 498, in get_heartbeats
            cluster_heartbeat[fsid] = cluster_status(cluster_handle, fsid_names[fsid])
          File "/var/cache/salt/minion/extmods/modules/ceph.py", line 566, in cluster_status
            mds_epoch = status['mdsmap']['epoch']
        KeyError: 'mdsmap'
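For reference, the exception comes from the lookup mds_epoch = status['mdsmap']['epoch'] in /var/cache/salt/minion/extmods/modules/ceph.py. Below is a small standalone sketch of a guarded lookup that would avoid the KeyError if the status dict simply no longer contains an 'mdsmap' key; this is my own snippet, not a patch against the actual Calamari module, and the helper name get_mds_epoch is made up:

    # Standalone sketch, assuming the status dict from newer Ceph releases
    # has no 'mdsmap' key (possibly carrying the MDS info under 'fsmap').
    def get_mds_epoch(status):
        # Original line in cluster_status() raises KeyError on my nodes:
        #   mds_epoch = status['mdsmap']['epoch']
        # Guarded version with a fallback and a default of 0:
        mdsmap = status.get('mdsmap') or status.get('fsmap') or {}
        return mdsmap.get('epoch', 0)

    # Example: a status dict without 'mdsmap'
    print(get_mds_epoch({'fsmap': {'epoch': 1}}))  # -> 1
    print(get_mds_epoch({}))                       # -> 0

Whether something like this is the right fix for Calamari, I can't say; it just shows where the exception comes from.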
Depending on the salt version I'm running, it's nearly always the same problem: the minions throw exceptions or other Python errors pop up (unfortunately I don't have the error dumps, because I reinstalled the scenario so often).

I've set "master: <calamarihost>" in /etc/salt/minion.d/calamari.conf, and I've also set the ceph-deploy calamari master to the Calamari host. The REST API shows only the Calamari host itself when I query the wmaiz-feink05.dbc.zdf.de/v2/api/server section. When I try to get information for the cluster fsid, I get a 404 error with the hint that no cluster with this FSID is found.

Config (ceph.conf):

    [global]
    fsid = 10c29f99-caf8-4057-8cc7-1f94359418f2
    mon_initial_members = wmaiz-feink06, wmaiz-feink07, wmaiz-feink08
    mon_host = 172.23.65.26,172.23.65.27,172.23.65.28
    auth_cluster_required = cephx
    auth_service_required = cephx
    auth_client_required = cephx
    osd_pool_defautl_size = 2
    public_network = 172.23.65.0/24

Could it be something with the authentication? Should I perhaps deactivate cephx? I don't know where to look right now. Help is appreciated...

Kind regards,
pir

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com