Unable to deploy new manager in Octopus

Hi,

On my test cluster, I migrated from Nautilus to Octopus and then converted most of the daemons to cephadm. I ran into a lot of problems with podman 1.6.4 on CentOS 7 going through an HTTPS proxy, because my servers are on a private network.
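
For reference, the proxy plumbing on the hosts looks roughly like this (a minimal sketch with placeholder values, since the real proxy address is internal):

# Placeholder proxy address, not the real one on my network:
export http_proxy=http://proxy.example.com:3128
export https_proxy=http://proxy.example.com:3128
export no_proxy=localhost,127.0.0.1,cepht001,cepht002,cepht003,cepht004,cephtstor01

# With these exported, manual image pulls go through the proxy, e.g.:
podman pull docker.io/ceph/ceph:v15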

Now, I'm unable to deploy new managers and the cluster is in a bizarre situation:

[root@cepht003 f5a025f9-fbe8-4506-8769-453902eb28d6]# ceph -s
  cluster:
    id:     f5a025f9-fbe8-4506-8769-453902eb28d6
    health: HEALTH_WARN
            client is using insecure global_id reclaim
            mons are allowing insecure global_id reclaim
            failed to probe daemons or devices
            42 stray daemon(s) not managed by cephadm
            2 stray host(s) with 39 daemon(s) not managed by cephadm
            1 daemons have recently crashed

  services:
    mon: 5 daemons, quorum cepht003,cepht002,cepht001,cepht004,cephtstor01 (age 19m)
    mgr: cepht004.wyibzh(active, since 29m), standbys: cepht003.aaaaaa
    mds: fsdup:1 fsec:1 {fsdup:0=fsdup.cepht001.opiyzk=up:active,fsec:0=fsec.cepht003.giatub=up:active} 7 up:standby
    osd: 40 osds: 40 up (since 92m), 40 in (since 3d)
    rgw: 2 daemons active (cepht001, cepht004)

  task status:

  data:
    pools:   18 pools, 577 pgs
    objects: 6.32k objects, 24 GiB
    usage:   80 GiB used, 102 TiB / 102 TiB avail
    pgs:     577 active+clean

[root@cepht003 f5a025f9-fbe8-4506-8769-453902eb28d6]# ceph orch ps
NAME                          HOST         STATUS         REFRESHED  AGE  VERSION  IMAGE NAME                 IMAGE ID      CONTAINER ID
mds.fdec.cepht004.vbuphb      cepht004     running (62m)  47s ago    4h   15.2.13  docker.io/ceph/ceph:v15    2cf504fded39  5fad10ffc981
mds.fdec.cephtstor01.gtxsnr   cephtstor01  running (24m)  46s ago    24m  15.2.13  docker.io/ceph/ceph:v15    2cf504fded39  24e837f6ac8a
mds.fdup.cepht001.nydfzs      cepht001     running (2h)   47s ago    2h   15.2.13  docker.io/ceph/ceph:v15    2cf504fded39  b1880e343ece
mds.fdup.cepht003.thsnbk      cepht003     running (34m)  45s ago    34m  15.2.13  docker.io/ceph/ceph:v15    2cf504fded39  ddd4e395e7b3
mds.fsdup.cepht001.opiyzk     cepht001     running (4h)   47s ago    4h   15.2.13  docker.io/ceph/ceph:v15    2cf504fded39  ad081f718863
mds.fsdup.cepht004.cfnxxw     cepht004     running (62m)  47s ago    20h  15.2.13  docker.io/ceph/ceph:v15    2cf504fded39  c6feed82af8f
mds.fsec.cepht002.uebrlc      cepht002     running (20m)  47s ago    20m  15.2.13  docker.io/ceph/ceph:v15    2cf504fded39  836f448c5708
mds.fsec.cepht003.giatub      cepht003     running (76m)  45s ago    5h   15.2.13  docker.io/ceph/ceph:v15    2cf504fded39  f235957145cb
mgr.cepht003.aaaaaa           cepht003     stopped        45s ago    20h  15.2.6   quay.io/ceph/ceph:v15.2.6  f16a759354cc  770d7cf078ad
mgr.cepht004.wyibzh           cepht004     unknown        47s ago    20h  15.2.13  docker.io/ceph/ceph:v15    2cf504fded39  6baa0f625271
mon.cepht001                  cepht001     running (4h)   47s ago    4h   15.2.13  docker.io/ceph/ceph:v15    2cf504fded39  e7f24769153c
mon.cepht002                  cepht002     running (20m)  47s ago    20m  15.2.13  docker.io/ceph/ceph:v15    2cf504fded39  dbb5be113201
mon.cepht003                  cepht003     running (76m)  45s ago    5h   15.2.13  docker.io/ceph/ceph:v15    2cf504fded39  6c2d6707b3fe
mon.cepht004                  cepht004     running (62m)  47s ago    4h   15.2.13  docker.io/ceph/ceph:v15    2cf504fded39  7986b598fd17
mon.cephtstor01               cephtstor01  running (93m)  46s ago    2h   15.2.13  docker.io/ceph/ceph:v15    2cf504fded39  dbd9255aab10
osd.10                        cephtstor01  running (93m)  46s ago    2h   15.2.16  quay.io/ceph/ceph:v15      8d5775c85c6a  01b07c4a75f7


When I try to create a new mgr, I get:

[ceph: root@cepht002 /]# ceph orch daemon add mgr cepht002
Error EINVAL: cephadm exited with an error code: 1, stderr:Deploy daemon mgr.cepht002.kqhnbt ...
Verifying port 8443 ...
ERROR: TCP Port(s) '8443' required for mgr already in use

But nothing runs on that port:

[root@cepht002 f5a025f9-fbe8-4506-8769-453902eb28d6]# ss -lntu
Netid  State      Recv-Q Send-Q Local Address:Port Peer Address:Port
udp    UNCONN     0      0 127.0.0.1:323 *:*
tcp    LISTEN     0      128 192.168.64.152:6789 *:*
tcp    LISTEN     0      128 192.168.64.152:6800 *:*
tcp    LISTEN     0      128 192.168.64.152:6801 *:*
tcp    LISTEN     0      128 *:22 *:*
tcp    LISTEN     0      100 127.0.0.1:25 *:*
tcp    LISTEN     0      128 127.0.0.1:6010 *:*
tcp    LISTEN     0      128 *:10050 *:*
tcp    LISTEN     0      128 192.168.64.152:3300 *:*
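
If it helps, a quick bind test like the one below is what I would expect to succeed on cepht002 given the output above (just my own sanity check, not necessarily the same check cephadm performs internally):

# Try to bind 8443 on all interfaces; my own placeholder check, not cephadm's logic.
python -c 'import socket; s = socket.socket(); s.bind(("0.0.0.0", 8443)); print("bound 8443 ok")'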

I get the same error with "ceph orch apply mgr ..." (see the example below), and the same thing happens on every node in the cluster.
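
For clarity, I mean a command of this shape (the placement spec here is only illustrative, not my exact invocation):

ceph orch apply mgr --placement="cepht002 cepht003 cepht004"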

I haven't found an answer on Google...

Any idea?

Patrick

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



