ceph-jewel on docker+Kubernetes - crashing

Hello all,
I am trying Ceph Jewel on Ubuntu 16.04 with Kubernetes 1.6.2 and Docker 1.11.2,
but for some unknown reason the cluster is not coming up: pods crash often and all ceph commands fail.
From ceph-mon-check:

kubectl logs -n ceph ceph-mon-check-3190136794-21xg4 -f

subprocess.CalledProcessError: Command 'ceph --cluster=${CLUSTER} mon getmap > /tmp/monmap && monmaptool -f /tmp/monmap --print' returned non-zero exit status 1
2017-05-01 15:45:52  /entrypoint.sh: sleep 30 sec
2017-05-01 15:46:22  /entrypoint.sh: checking for zombie mons
2017-05-01 15:51:22.613476 7f0d3ea8c700  0 monclient(hunting): authenticate timed out after 300
2017-05-01 15:51:22.613561 7f0d3ea8c700  0 librados: client.admin authentication error (110) Connection timed out
Error connecting to cluster: TimedOut
Traceback (most recent call last):
  File "/check_zombie_mons.py", line 30, in <module>
    current_mons = extract_mons_from_monmap()
  File "/check_zombie_mons.py", line 18, in extract_mons_from_monmap
    monmap = subprocess.check_output(monmap_command, shell=True)
  File "/usr/lib/python2.7/subprocess.py", line 574, in check_output
    raise CalledProcessError(retcode, cmd, output=output)


All pods and nodes are able to resolve the service name "ceph-mon".
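
Name resolution alone does not show that the monitors accept connections, so the check above could be extended with a TCP probe. The following is only a minimal sketch, assuming Python 3 is available wherever it is run; the function name probe_mon is made up for illustration, and the defaults (ceph-mon, port 6789) are taken from the ceph.conf and monitor logs in this post.

```python
import socket


def probe_mon(host="ceph-mon", port=6789, timeout=3.0):
    """Resolve host, then try a TCP connect to every resolved address.

    Returns a list of (address, ok, detail) tuples. probe_mon is a
    hypothetical helper, not part of Ceph or the ceph-docker images.
    """
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        return [(host, False, "DNS lookup failed: %s" % exc)]

    results = []
    seen = set()
    for family, socktype, proto, _canon, sockaddr in infos:
        addr = sockaddr[0]
        if addr in seen:  # skip duplicate answers for the same address
            continue
        seen.add(addr)
        sock = socket.socket(family, socktype, proto)
        sock.settimeout(timeout)
        try:
            sock.connect(sockaddr)
            results.append((addr, True, "connected"))
        except OSError as exc:  # refused, timed out, unreachable, ...
            results.append((addr, False, str(exc)))
        finally:
            sock.close()
    return results
```

Calling probe_mon() from inside a pod and printing the returned tuples would show, per resolved address, whether port 6789 is reachable. A successful connect only proves TCP reachability; it says nothing about cephx authentication.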

Ceph keys are present in all pods.

kubectl exec -n ceph ceph-mon-0 -- ls /etc/ceph/
ceph.client.admin.keyring
ceph.conf
ceph.mon.keyring


kubectl logs -n ceph ceph-mon-0 --tail=20

2017-05-01 16:08:44.081462 7fcdf369a700  1 mon.ceph-mon-0@-1(probing) e0  adding peer 192.168.14.239:6789/0 to list of hints
2017-05-01 16:08:45.158398 7fcdf1595700  0 -- 192.168.110.236:6789/0 >> :/0 pipe(0x562d60fb0000 sd=21 :6789 s=0 pgs=0 cs=0 l=0 c=0x562d603f1980).accept failed to getpeername (107) Transport endpoint is not connected
2017-05-01 16:08:45.158328 7fcdf0f8f700  0 -- 192.168.110.236:6789/0 >> :/0 pipe(0x562d6026b400 sd=19 :6789 s=0 pgs=0 cs=0 l=0 c=0x562d602eac00).accept failed to getpeername (107) Transport endpoint is not connected
2017-05-01 16:08:45.745314 7fcdf369a700  1 mon.ceph-mon-0@-1(probing) e0  adding peer 192.168.198.94:6789/0 to list of hints
2017-05-01 16:08:46.081824 7fcdf369a700  1 mon.ceph-mon-0@-1(probing) e0  adding peer 192.168.14.239:6789/0 to list of hints
2017-05-01 16:08:47.745473 7fcdf369a700  1 mon.ceph-mon-0@-1(probing) e0  adding peer 192.168.198.94:6789/0 to list of hints
2017-05-01 16:08:48.081962 7fcdf369a700  1 mon.ceph-mon-0@-1(probing) e0  adding peer 192.168.14.239:6789/0 to list of hints
2017-05-01 16:08:49.745526 7fcdf369a700  1 mon.ceph-mon-0@-1(probing) e0  adding peer 192.168.198.94:6789/0 to list of hints
2017-05-01 16:08:50.081979 7fcdf369a700  1 mon.ceph-mon-0@-1(probing) e0  adding peer 192.168.14.239:6789/0 to list of hints
2017-05-01 16:08:51.746027 7fcdf369a700  1 mon.ceph-mon-0@-1(probing) e0  adding peer 192.168.198.94:6789/0 to list of hints
2017-05-01 16:08:52.082151 7fcdf369a700  1 mon.ceph-mon-0@-1(probing) e0  adding peer 192.168.14.239:6789/0 to list of hints
2017-05-01 16:08:53.745586 7fcdf369a700  1 mon.ceph-mon-0@-1(probing) e0  adding peer 192.168.198.94:6789/0 to list of hints
2017-05-01 16:08:54.082630 7fcdf369a700  1 mon.ceph-mon-0@-1(probing) e0  adding peer 192.168.14.239:6789/0 to list of hints
2017-05-01 16:08:55.158549 7fcdf0b8b700  0 -- 192.168.110.236:6789/0 >> :/0 pipe(0x562d6026b400 sd=19 :6789 s=0 pgs=0 cs=0 l=0 c=0x562d608ff900).accept failed to getpeername (107) Transport endpoint is not connected
2017-05-01 16:08:55.158621 7fcdf1191700  0 -- 192.168.110.236:6789/0 >> :/0 pipe(0x562d60fb0000 sd=21 :6789 s=0 pgs=0 cs=0 l=0 c=0x562d608fd500).accept failed to getpeername (107) Transport endpoint is not connected
2017-05-01 16:08:55.745867 7fcdf369a700  1 mon.ceph-mon-0@-1(probing) e0  adding peer 192.168.198.94:6789/0 to list of hints
2017-05-01 16:08:56.082868 7fcdf369a700  1 mon.ceph-mon-0@-1(probing) e0  adding peer 192.168.14.239:6789/0 to list of hints
2017-05-01 16:08:57.686779 7fcdf3e9b700  0 mon.ceph-mon-0@-1(probing).data_health(0) update_stats avail 93% total 237 GB, used 4398 MB, avail 221 GB
2017-05-01 16:08:57.746175 7fcdf369a700  1 mon.ceph-mon-0@-1(probing) e0  adding peer 192.168.198.94:6789/0 to list of hints
2017-05-01 16:08:58.083616 7fcdf369a700  1 mon.ceph-mon-0@-1(probing) e0  adding peer 192.168.14.239:6789/0 to list of hints
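
The monitor log above is repetitive, so a small summary can make it easier to read. This is a rough sketch, assuming the log format shown in this post; summarize_mon_log is a hypothetical helper, and the regular expressions are my own guess at the line layout. As far as I understand, a rank of -1 with state "probing" means the monitor has not joined a quorum yet, but treat that reading as an assumption.

```python
import re
from collections import Counter

# Patterns are assumptions based on the log lines pasted in this post.
PEER_RE = re.compile(r"adding peer (\S+) to list of hints")
STATE_RE = re.compile(r"mon\.(\S+)@(-?\d+)\((\w+)\)")


def summarize_mon_log(lines):
    """Count peer hints, (rank, state) pairs and getpeername failures."""
    peers = Counter()
    states = Counter()
    getpeername_failures = 0
    for line in lines:
        match = PEER_RE.search(line)
        if match:
            peers[match.group(1)] += 1
        match = STATE_RE.search(line)
        if match:
            # group(2) is the rank, group(3) the state, e.g. ("-1", "probing")
            states[(match.group(2), match.group(3))] += 1
        if "failed to getpeername" in line:
            getpeername_failures += 1
    return peers, states, getpeername_failures
```

It could be fed the output of the kubectl logs command above, split into lines, for each of the three monitors to compare which peers each one sees.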


kubectl get po -n ceph
NAME                              READY     STATUS             RESTARTS   AGE
ceph-mds-722237312-35l5k          0/1       CrashLoopBackOff   324        1d
ceph-mon-0                        1/1       Running            0          1d
ceph-mon-1                        1/1       Running            0          1d
ceph-mon-2                        1/1       Running            0          1d
ceph-mon-check-3190136794-21xg4   1/1       Running            0          1d
ceph-osd-bvz3h                    0/1       CrashLoopBackOff   409        1d
ceph-osd-hq50d                    0/1       Running            408        1d
ceph-osd-ljdwh                    0/1       CrashLoopBackOff   409        1d


kubectl logs -n ceph ceph-osd-ljdwh --tail=20
2017-05-01 16:33:57  /entrypoint.sh: k8s: config is stored as k8s secrets.
2017-05-01 16:33:57  /entrypoint.sh: k8s: does not generate the admin key. Use Kubernetes secrets instead.
2017-05-01 16:33:57  /entrypoint.sh: Creating osd with ceph --cluster ceph osd create


ceph.conf

kubectl exec -n ceph ceph-osd-ljdwh  -- cat /etc/ceph/ceph.conf |more
[global]
fsid = 34fc2470-a9f2-49df-8d2f-701e2679c8c5
cephx = true
cephx_require_signatures = false
cephx_cluster_require_signatures = true
cephx_service_require_signatures = false

# auth
max_open_files = 131072
osd_pool_default_pg_num = 128
osd_pool_default_pgp_num = 128
osd_pool_default_size = 3
osd_pool_default_min_size = 1

mon_osd_full_ratio = .95
mon_osd_nearfull_ratio = .85

mon_host = ceph-mon

[mon]
mon_osd_down_out_interval = 600
mon_osd_min_down_reporters = 4
mon_clock_drift_allowed = .15
mon_clock_drift_warn_backoff = 30

Any idea why it is failing with an authentication error?

Regards,
Kev

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
