[cephadm] Found duplicate OSDs

Folks,

I am playing with cephadm and life was good until I started upgrading from
octopus to pacific. The upgrade got stuck after the mgr daemons were
upgraded, and in the logs I can now see the following errors:

root@ceph1:~# ceph log last cephadm
2022-09-01T14:40:45.739804+0000 mgr.ceph2.hmbdla (mgr.265806) 8 : cephadm [INF] Deploying daemon grafana.ceph1 on ceph1
2022-09-01T14:40:56.115693+0000 mgr.ceph2.hmbdla (mgr.265806) 14 : cephadm [INF] Deploying daemon prometheus.ceph1 on ceph1
2022-09-01T14:41:11.856725+0000 mgr.ceph2.hmbdla (mgr.265806) 25 : cephadm [INF] Reconfiguring alertmanager.ceph1 (dependencies changed)...
2022-09-01T14:41:11.861535+0000 mgr.ceph2.hmbdla (mgr.265806) 26 : cephadm [INF] Reconfiguring daemon alertmanager.ceph1 on ceph1
2022-09-01T14:41:12.927852+0000 mgr.ceph2.hmbdla (mgr.265806) 27 : cephadm [INF] Reconfiguring grafana.ceph1 (dependencies changed)...
2022-09-01T14:41:12.940615+0000 mgr.ceph2.hmbdla (mgr.265806) 28 : cephadm [INF] Reconfiguring daemon grafana.ceph1 on ceph1
2022-09-01T14:41:14.056113+0000 mgr.ceph2.hmbdla (mgr.265806) 33 : cephadm [INF] Found duplicate OSDs: osd.2 in status running on ceph1, osd.2 in status running on ceph2
2022-09-01T14:41:14.056437+0000 mgr.ceph2.hmbdla (mgr.265806) 34 : cephadm [INF] Found duplicate OSDs: osd.5 in status running on ceph1, osd.5 in status running on ceph2
2022-09-01T14:41:14.056630+0000 mgr.ceph2.hmbdla (mgr.265806) 35 : cephadm [INF] Found duplicate OSDs: osd.3 in status running on ceph1, osd.3 in status running on ceph2
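
For context, I kicked off the upgrade with the standard cephadm workflow,
roughly like this (illustrative rather than a verbatim transcript; the
16.2.10 target matches what the upgraded mgr now reports in ceph orch ps
below):

  # start the upgrade to pacific and watch its progress
  ceph orch upgrade start --ceph-version 16.2.10
  ceph orch upgrade status
  ceph -W cephadm    # follow cephadm log messages live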


I am not sure where the duplicate names came from or how that happened.
In the following output I can't see any duplication:

root@ceph1:~# ceph osd tree
ID  CLASS  WEIGHT   TYPE NAME       STATUS  REWEIGHT  PRI-AFF
-1         0.97656  root default
-3         0.48828      host ceph1
 4    hdd  0.09769          osd.4       up   1.00000  1.00000
 0    ssd  0.19530          osd.0       up   1.00000  1.00000
 1    ssd  0.19530          osd.1       up   1.00000  1.00000
-5         0.48828      host ceph2
 5    hdd  0.09769          osd.5       up   1.00000  1.00000
 2    ssd  0.19530          osd.2       up   1.00000  1.00000
 3    ssd  0.19530          osd.3       up   1.00000  1.00000
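
As an extra cross-check (osd.2 taken as an example), these should report a
single host per OSD, since the CRUSH map and the OSD's own metadata only
record one location:

  ceph osd find 2                      # CRUSH location, including the host
  ceph osd metadata 2 | grep hostname  # hostname recorded by the OSD itself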


But at the same time I can see duplicate OSD entries on both ceph1 and ceph2:


root@ceph1:~# ceph orch ps
NAME                 HOST   PORTS        STATUS         REFRESHED  AGE  MEM USE  MEM LIM  VERSION  IMAGE ID      CONTAINER ID
alertmanager.ceph1   ceph1  *:9093,9094  running (20s)  2s ago     20s  17.1M    -                 ba2b418f427c  856a4fe641f1
alertmanager.ceph1   ceph2  *:9093,9094  running (20s)  3s ago     20s  17.1M    -                 ba2b418f427c  856a4fe641f1
crash.ceph2          ceph1               running (12d)  2s ago     12d  10.0M    -        15.2.17  93146564743f  0a009254afb0
crash.ceph2          ceph2               running (12d)  3s ago     12d  10.0M    -        15.2.17  93146564743f  0a009254afb0
grafana.ceph1        ceph1  *:3000       running (18s)  2s ago     19s  47.9M    -        8.3.5    dad864ee21e9  7d7a70b8ab7f
grafana.ceph1        ceph2  *:3000       running (18s)  3s ago     19s  47.9M    -        8.3.5    dad864ee21e9  7d7a70b8ab7f
mgr.ceph2.hmbdla     ceph1               running (13h)  2s ago     12d  506M     -        16.2.10  0d668911f040  6274723c35f7
mgr.ceph2.hmbdla     ceph2               running (13h)  3s ago     12d  506M     -        16.2.10  0d668911f040  6274723c35f7
node-exporter.ceph2  ceph1               running (91m)  2s ago     12d  60.7M    -        0.18.1   e5a616e4b9cf  d0ba04bb977c
node-exporter.ceph2  ceph2               running (91m)  3s ago     12d  60.7M    -        0.18.1   e5a616e4b9cf  d0ba04bb977c
osd.2                ceph1               running (12h)  2s ago     12d  867M     4096M    15.2.17  93146564743f  e286fb1c6302
osd.2                ceph2               running (12h)  3s ago     12d  867M     4096M    15.2.17  93146564743f  e286fb1c6302
osd.3                ceph1               running (12h)  2s ago     12d  978M     4096M    15.2.17  93146564743f  d3ae5d9f694f
osd.3                ceph2               running (12h)  3s ago     12d  978M     4096M    15.2.17  93146564743f  d3ae5d9f694f
osd.5                ceph1               running (12h)  2s ago     8d   225M     4096M    15.2.17  93146564743f  405068fb474e
osd.5                ceph2               running (12h)  3s ago     8d   225M     4096M    15.2.17  93146564743f  405068fb474e
prometheus.ceph1     ceph1  *:9095       running (8s)   2s ago     8s   30.4M    -                 514e6a882f6e  9031dbe30cae
prometheus.ceph1     ceph2  *:9095       running (8s)   3s ago     8s   30.4M    -                 514e6a882f6e  9031dbe30cae
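
As far as I understand, ceph orch ps is rendered from the per-host daemon
inventory that the active mgr caches, so one way to tell real duplicates
from stale cache entries would be something like this (a sketch, not output
I have captured):

  # on each host (ceph1, then ceph2), list the daemons cephadm actually
  # deployed locally
  cephadm ls | grep '"name"'

  # then force the orchestrator to refresh its cached per-host inventory
  ceph orch ps --refresh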


Is this a bug, or did I do something wrong? Is there any workaround to get
out of this condition?


