Hi,

Have you checked all of the network namespaces?

lsns -t net
nsenter -t <pid> -n ss -nlp
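For example, a rough loop along these lines (an untested sketch; it assumes root, lsns from util-linux and ss from iproute2, and uses 8443 only because that is the port cephadm complains about) would check every network namespace for a listener on that port:

# For each network namespace, take the PID lsns reports for it, enter
# that namespace and list anything bound to TCP port 8443.
for pid in $(lsns -t net -n -o PID); do
    echo "=== net namespace of PID $pid ==="
    nsenter -t "$pid" -n ss -nlpt 'sport = :8443'
done

That should show whether some container's namespace is still holding the port even though nothing shows up in a host-level "ss -lntu".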
Cheers,
Josef

> On 2022-06-02 17:00, Patrick Vranckx <patrick.vranckx@xxxxxxxxxxxx> wrote:
>
> Hi,
>
> On my test cluster, I migrated from Nautilus to Octopus and then
> converted most of the daemons to cephadm. I had a lot of problems with
> podman 1.6.4 on CentOS 7 through an https proxy, because my servers are
> on a private network.
>
> Now I'm unable to deploy new managers and the cluster is in a bizarre
> situation:
>
> [root@cepht003 f5a025f9-fbe8-4506-8769-453902eb28d6]# ceph -s
>   cluster:
>     id:     f5a025f9-fbe8-4506-8769-453902eb28d6
>     health: HEALTH_WARN
>             client is using insecure global_id reclaim
>             mons are allowing insecure global_id reclaim
>             failed to probe daemons or devices
>             42 stray daemon(s) not managed by cephadm
>             2 stray host(s) with 39 daemon(s) not managed by cephadm
>             1 daemons have recently crashed
>
>   services:
>     mon: 5 daemons, quorum cepht003,cepht002,cepht001,cepht004,cephtstor01 (age 19m)
>     mgr: cepht004.wyibzh(active, since 29m), standbys: cepht003.aaaaaa
>     mds: fsdup:1 fsec:1 {fsdup:0=fsdup.cepht001.opiyzk=up:active,fsec:0=fsec.cepht003.giatub=up:active} 7 up:standby
>     osd: 40 osds: 40 up (since 92m), 40 in (since 3d)
>     rgw: 2 daemons active (cepht001, cepht004)
>
>   task status:
>
>   data:
>     pools:   18 pools, 577 pgs
>     objects: 6.32k objects, 24 GiB
>     usage:   80 GiB used, 102 TiB / 102 TiB avail
>     pgs:     577 active+clean
>
> [root@cepht003 f5a025f9-fbe8-4506-8769-453902eb28d6]# ceph orch ps
> NAME                         HOST         STATUS         REFRESHED  AGE  VERSION  IMAGE NAME                 IMAGE ID      CONTAINER ID
> mds.fdec.cepht004.vbuphb     cepht004     running (62m)  47s ago    4h   15.2.13  docker.io/ceph/ceph:v15    2cf504fded39  5fad10ffc981
> mds.fdec.cephtstor01.gtxsnr  cephtstor01  running (24m)  46s ago    24m  15.2.13  docker.io/ceph/ceph:v15    2cf504fded39  24e837f6ac8a
> mds.fdup.cepht001.nydfzs     cepht001     running (2h)   47s ago    2h   15.2.13  docker.io/ceph/ceph:v15    2cf504fded39  b1880e343ece
> mds.fdup.cepht003.thsnbk     cepht003     running (34m)  45s ago    34m  15.2.13  docker.io/ceph/ceph:v15    2cf504fded39  ddd4e395e7b3
> mds.fsdup.cepht001.opiyzk    cepht001     running (4h)   47s ago    4h   15.2.13  docker.io/ceph/ceph:v15    2cf504fded39  ad081f718863
> mds.fsdup.cepht004.cfnxxw    cepht004     running (62m)  47s ago    20h  15.2.13  docker.io/ceph/ceph:v15    2cf504fded39  c6feed82af8f
> mds.fsec.cepht002.uebrlc     cepht002     running (20m)  47s ago    20m  15.2.13  docker.io/ceph/ceph:v15    2cf504fded39  836f448c5708
> mds.fsec.cepht003.giatub     cepht003     running (76m)  45s ago    5h   15.2.13  docker.io/ceph/ceph:v15    2cf504fded39  f235957145cb
> mgr.cepht003.aaaaaa          cepht003     stopped        45s ago    20h  15.2.6   quay.io/ceph/ceph:v15.2.6  f16a759354cc  770d7cf078ad
> mgr.cepht004.wyibzh          cepht004     unknown        47s ago    20h  15.2.13  docker.io/ceph/ceph:v15    2cf504fded39  6baa0f625271
> mon.cepht001                 cepht001     running (4h)   47s ago    4h   15.2.13  docker.io/ceph/ceph:v15    2cf504fded39  e7f24769153c
> mon.cepht002                 cepht002     running (20m)  47s ago    20m  15.2.13  docker.io/ceph/ceph:v15    2cf504fded39  dbb5be113201
> mon.cepht003                 cepht003     running (76m)  45s ago    5h   15.2.13  docker.io/ceph/ceph:v15    2cf504fded39  6c2d6707b3fe
> mon.cepht004                 cepht004     running (62m)  47s ago    4h   15.2.13  docker.io/ceph/ceph:v15    2cf504fded39  7986b598fd17
> mon.cephtstor01              cephtstor01  running (93m)  46s ago    2h   15.2.13  docker.io/ceph/ceph:v15    2cf504fded39  dbd9255aab10
> osd.10                       cephtstor01  running (93m)  46s ago    2h   15.2.16  quay.io/ceph/ceph:v15      8d5775c85c6a  01b07c4a75f7
>
> When I try to create a new mgr, I get:
>
> [ceph: root@cepht002 /]# ceph orch daemon add mgr cepht002
> Error EINVAL: cephadm exited with an error code: 1, stderr:Deploy daemon mgr.cepht002.kqhnbt ...
> Verifying port 8443 ...
> ERROR: TCP Port(s) '8443' required for mgr already in use
>
> But nothing runs on that port:
>
> [root@cepht002 f5a025f9-fbe8-4506-8769-453902eb28d6]# ss -lntu
> Netid  State   Recv-Q  Send-Q  Local Address:Port    Peer Address:Port
> udp    UNCONN  0       0       127.0.0.1:323         *:*
> tcp    LISTEN  0       128     192.168.64.152:6789   *:*
> tcp    LISTEN  0       128     192.168.64.152:6800   *:*
> tcp    LISTEN  0       128     192.168.64.152:6801   *:*
> tcp    LISTEN  0       128     *:22                  *:*
> tcp    LISTEN  0       100     127.0.0.1:25          *:*
> tcp    LISTEN  0       128     127.0.0.1:6010        *:*
> tcp    LISTEN  0       128     *:10050               *:*
> tcp    LISTEN  0       128     192.168.64.152:3300   *:*
>
> I get the same error with the command "ceph orch apply mgr ...", and the
> same on each node in the cluster.
>
> I can find no answer on Google...
>
> Any idea?
>
> Patrick
>
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx