Unable to deploy new manager in Octopus

Hi,

On my test cluster, I migrated from Nautilus to Octopus and then converted most of the daemons to cephadm. I ran into a lot of problems with podman 1.6.4 on CentOS 7 going through an HTTPS proxy, because my servers are on a private network.
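
For reference, the proxy plumbing on the hosts looks roughly like this (a minimal sketch with placeholder values, since the real proxy address is internal):

# Placeholder proxy address, not the real one on my network:
export http_proxy=http://proxy.example.com:3128
export https_proxy=http://proxy.example.com:3128
export no_proxy=localhost,127.0.0.1,cepht001,cepht002,cepht003,cepht004,cephtstor01

# With these exported, manual image pulls go through the proxy, e.g.:
podman pull docker.io/ceph/ceph:v15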

Now, I'm unable to deploy new managers and the cluster is in a bizarre situation:

[root@cepht003 f5a025f9-fbe8-4506-8769-453902eb28d6]# ceph -s
  cluster:
    id:     f5a025f9-fbe8-4506-8769-453902eb28d6
    health: HEALTH_WARN
            client is using insecure global_id reclaim
            mons are allowing insecure global_id reclaim
            failed to probe daemons or devices
            42 stray daemon(s) not managed by cephadm
            2 stray host(s) with 39 daemon(s) not managed by cephadm
            1 daemons have recently crashed

  services:
    mon: 5 daemons, quorum cepht003,cepht002,cepht001,cepht004,cephtstor01 (age 19m)
    mgr: cepht004.wyibzh(active, since 29m), standbys: cepht003.aaaaaa
    mds: fsdup:1 fsec:1 {fsdup:0=fsdup.cepht001.opiyzk=up:active,fsec:0=fsec.cepht003.giatub=up:active} 7 up:standby
    osd: 40 osds: 40 up (since 92m), 40 in (since 3d)
    rgw: 2 daemons active (cepht001, cepht004)

  task status:

  data:
    pools:   18 pools, 577 pgs
    objects: 6.32k objects, 24 GiB
    usage:   80 GiB used, 102 TiB / 102 TiB avail
    pgs:     577 active+clean

[root@cepht003 f5a025f9-fbe8-4506-8769-453902eb28d6]# ceph orch ps
NAME                          HOST         STATUS         REFRESHED  AGE  VERSION  IMAGE NAME                 IMAGE ID      CONTAINER ID
mds.fdec.cepht004.vbuphb      cepht004     running (62m)  47s ago    4h   15.2.13  docker.io/ceph/ceph:v15    2cf504fded39  5fad10ffc981
mds.fdec.cephtstor01.gtxsnr   cephtstor01  running (24m)  46s ago    24m  15.2.13  docker.io/ceph/ceph:v15    2cf504fded39  24e837f6ac8a
mds.fdup.cepht001.nydfzs      cepht001     running (2h)   47s ago    2h   15.2.13  docker.io/ceph/ceph:v15    2cf504fded39  b1880e343ece
mds.fdup.cepht003.thsnbk      cepht003     running (34m)  45s ago    34m  15.2.13  docker.io/ceph/ceph:v15    2cf504fded39  ddd4e395e7b3
mds.fsdup.cepht001.opiyzk     cepht001     running (4h)   47s ago    4h   15.2.13  docker.io/ceph/ceph:v15    2cf504fded39  ad081f718863
mds.fsdup.cepht004.cfnxxw     cepht004     running (62m)  47s ago    20h  15.2.13  docker.io/ceph/ceph:v15    2cf504fded39  c6feed82af8f
mds.fsec.cepht002.uebrlc      cepht002     running (20m)  47s ago    20m  15.2.13  docker.io/ceph/ceph:v15    2cf504fded39  836f448c5708
mds.fsec.cepht003.giatub      cepht003     running (76m)  45s ago    5h   15.2.13  docker.io/ceph/ceph:v15    2cf504fded39  f235957145cb
mgr.cepht003.aaaaaa           cepht003     stopped        45s ago    20h  15.2.6   quay.io/ceph/ceph:v15.2.6  f16a759354cc  770d7cf078ad
mgr.cepht004.wyibzh           cepht004     unknown        47s ago    20h  15.2.13  docker.io/ceph/ceph:v15    2cf504fded39  6baa0f625271
mon.cepht001                  cepht001     running (4h)   47s ago    4h   15.2.13  docker.io/ceph/ceph:v15    2cf504fded39  e7f24769153c
mon.cepht002                  cepht002     running (20m)  47s ago    20m  15.2.13  docker.io/ceph/ceph:v15    2cf504fded39  dbb5be113201
mon.cepht003                  cepht003     running (76m)  45s ago    5h   15.2.13  docker.io/ceph/ceph:v15    2cf504fded39  6c2d6707b3fe
mon.cepht004                  cepht004     running (62m)  47s ago    4h   15.2.13  docker.io/ceph/ceph:v15    2cf504fded39  7986b598fd17
mon.cephtstor01               cephtstor01  running (93m)  46s ago    2h   15.2.13  docker.io/ceph/ceph:v15    2cf504fded39  dbd9255aab10
osd.10                        cephtstor01  running (93m)  46s ago    2h   15.2.16  quay.io/ceph/ceph:v15      8d5775c85c6a  01b07c4a75f7


When I try to create a new mgr, I get:

[ceph: root@cepht002 /]# ceph orch daemon add mgr cepht002
Error EINVAL: cephadm exited with an error code: 1, stderr:Deploy daemon mgr.cepht002.kqhnbt ...
Verifying port 8443 ...
ERROR: TCP Port(s) '8443' required for mgr already in use

But nothing runs on that port:

[root@cepht002 f5a025f9-fbe8-4506-8769-453902eb28d6]# ss -lntu
Netid  State      Recv-Q Send-Q Local Address:Port Peer Address:Port
udp    UNCONN     0      0 127.0.0.1:323 *:*
tcp    LISTEN     0      128 192.168.64.152:6789 *:*
tcp    LISTEN     0      128 192.168.64.152:6800 *:*
tcp    LISTEN     0      128 192.168.64.152:6801 *:*
tcp    LISTEN     0      128 *:22 *:*
tcp    LISTEN     0      100 127.0.0.1:25 *:*
tcp    LISTEN     0      128 127.0.0.1:6010 *:*
tcp    LISTEN     0      128 *:10050 *:*
tcp    LISTEN     0      128 192.168.64.152:3300 *:*
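
If it helps, a quick bind test like the one below is what I would expect to succeed on cepht002 given the output above (just my own sanity check, not necessarily the same check cephadm performs internally):

# Try to bind 8443 on all interfaces; my own placeholder check, not cephadm's logic.
python -c 'import socket; s = socket.socket(); s.bind(("0.0.0.0", 8443)); print("bound 8443 ok")'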

I get the same error with "ceph orch apply mgr ..." (see the example below), and the same thing happens on every node in the cluster.
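
For clarity, I mean a command of this shape (the placement spec here is only illustrative, not my exact invocation):

ceph orch apply mgr --placement="cepht002 cepht003 cepht004"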

I haven't found an answer on Google...

Any idea?

Patrick

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



