Re: Change OSD Address after IB/Ethernet switch

Thank you very much,

Manually changing all of the config files got us back into production again.
Recovery should be finished within the next few hours.
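
For the archive, the manual change was essentially pointing the mon_host line in every daemon's cephadm-managed config at the new MON addresses and then restarting the unit, roughly like this (a sketch with placeholders, using the file Eugen mentions below):

# per host and per daemon, osd.28 as an example
vi /var/lib/ceph/<FSID>/osd.28/config
#   mon_host = [v2:<new-mon-ip>:3300/0,v1:<new-mon-ip>:6789/0] ...
systemctl restart ceph-<FSID>@osd.28.service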


cluster:
    id:     10489760-1723-11ec-8050-cb54d51756be
    health: HEALTH_WARN
            545 pgs not deep-scrubbed in time
            545 pgs not scrubbed in time

  services:
    mon: 4 daemons, quorum ml2rsn06,ml2rsn03,ml2rsn05,ml2rsn07 (age 9m)
    mgr: ml2rsn05.rivwqx(active, since 17m), standbys: ml2rsn03.ufxzjh
    mds: 1/1 daemons up, 2 standby
    osd: 36 osds: 36 up (since 8m), 36 in (since 8m); 101 remapped pgs

  data:
    volumes: 1/1 healthy
    pools:   3 pools, 545 pgs
    objects: 58.64M objects, 73 TiB
    usage:   221 TiB used, 282 TiB / 503 TiB avail
    pgs:     11654543/175922865 objects misplaced (6.625%)
             444 active+clean
             101 active+remapped+backfilling

  io:
    recovery: 2.3 GiB/s, 7.17k keys/s, 1.91k objects/s


Cheers
Dominik

On 19.12.2022 at 19:49, Eugen Block wrote:
Maybe in this case you actually should update the local config (/var/lib/ceph/<FSID>/<SERVICE>/config) to reflect the new MONs. It seems that adding the MONs with their new addresses didn't update the local ceph.conf for some reason, so try editing it manually and restart the OSD services. I haven't had to go through these steps in a while (with cephadm only once, in a test cluster), so I'm not sure what we could be missing here.
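
(A quick way to confirm that a restarted OSD picked up the new addresses, assuming the admin socket is reachable inside the daemon's container, would be something like:

cephadm enter --name osd.28
ceph daemon osd.28 config show | grep mon_host
)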


Quoting Dominik Baack <dominik.baack@xxxxxxxxxxxxxxxxxx>:

Hi,

The OSDs (9 SSDs) and the MONs currently run on the same nodes, with two dedicated 200 GbE cards and a third out-of-band 10 GbE connection.

In between, I had changed the public network to

public_network 129.217.31.176/28

as a cross-check, but I have now done it again to capture the steps and output; sadly, there are no visible changes.

I run the operations in the following order:

./cephadm shell
ceph config dump
ceph config set global public_network 129.217.31.176/28
ceph config set global cluster_network 129.217.31.184/29
ceph config dump # check output
restart ceph.target on all nodes
ceph orch daemon reconfig osd.28
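
(Whether the new values actually landed in the config database can also be checked per daemon type, for example:

ceph config get osd public_network
ceph config get osd cluster_network
)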

Check the osd.28 container's ceph.conf:

[root@ml2rsn05 /]# cat /etc/ceph/ceph.conf
# minimal ceph.conf for ...
[global]
        fsid = ...
        mon_host = [v2:129.217.31.171:3300/0,v1:129.217.31.171:6789/0] [v2:129.217.31.172:3300/0,v1:129.217.31.172:6789/0] [v2:129.217.31.175:3300/0,v1:129.217.31.175:6789/0]
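
(For comparison, the minimal conf cephadm would render right now can be printed with

ceph config generate-minimal-conf

If its mon_host line still lists the old addresses, the monitors are most likely still registered in the monmap with those addresses; that is an assumption about where the stale values come from, not something verified in this thread.)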

Cheers
Dominik


On 19.12.2022 at 18:55, Eugen Block wrote:
I just looked at the previous mails again; is it possible that you mixed up the public and cluster networks? MONs only require access to the public network, unless they are colocated with OSDs, of course. You stated this earlier:

I could move the MONs over to the new address range and they connect into the cluster network. The OSDs create more of a problem, even after setting
public_network 129.217.31.176/29
cluster_network 129.217.31.184/29

and this:

mon_host = [v2:129.217.31.186:3300/0,v1:129.217.31.186:6789/0] [v2:129.217.31.188:3300/0,v1:129.217.31.188:6789/0] [v2:129.217.31.189:3300/0,v1:129.217.31.189:6789/0] [v2:129.217.31.190:3300/0,v1:129.217.31.190:6789/0]

The MONs' addresses are from the cluster network but should be from the public network. If 129.217.31.184/29 is supposed to be the public network, you should modify both networks and restart the OSD services.
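
(The addresses the MONs are currently registered with can be checked with

ceph mon dump

which prints the current monmap entries.)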

Quoting Dominik Baack <dominik.baack@xxxxxxxxxxxxxxxxxx>:

Hi,

Reconfiguration

ceph orch daemon reconfig osd.28
Scheduled to reconfig osd.28 on host 'ml2rsn05'

cephadm ['--image', 'quay.io/ceph/ceph@sha256:12a0a4f43413fd97a14a3d47a3451b2d2df50020835bb93db666209f3f77617a', 'deploy', '--fsid', '10489760-1723-11ec-8050-cb54d51756be', '--name', 'osd.28', '--meta-json', '{"service_name": "osd", "ports": [], "ip": null, "deployed_by": ["quay.io/ceph/ceph@sha256:12a0a4f43413fd97a14a3d47a3451b2d2df50020835bb93db666209f3f77617a"], "rank": null, "rank_generation": null, "extra_container_args": null}', '--config-json', '-', '--osd-fsid', '0c15ce43-ed2d-4348-88b2-785c25159894', '--reconfig']
2022-12-19 16:48:23,502 7f1cb6f4c740 DEBUG Acquiring lock 139761289301680 on /run/cephadm/10489760-1723-11ec-8050-cb54d51756be.lock
2022-12-19 16:48:23,502 7f1cb6f4c740 DEBUG Lock 139761289301680 acquired on /run/cephadm/10489760-1723-11ec-8050-cb54d51756be.lock
2022-12-19 16:48:23,513 7f1cb6f4c740 DEBUG systemctl: enabled
2022-12-19 16:48:23,523 7f1cb6f4c740 DEBUG systemctl: active
2022-12-19 16:48:23,524 7f1cb6f4c740 INFO Reconfig daemon osd.28 ...
2022-12-19 16:48:23,714 7f1cb6f4c740 DEBUG stat: 167 167
2022-12-19 16:48:23,777 7f1cb6f4c740 DEBUG firewalld does not appear to be present
2022-12-19 16:48:23,777 7f1cb6f4c740 DEBUG Not possible to enable service <osd>. firewalld.service is not available
2022-12-19 16:49:24,592 7f9a07c9f740 DEBUG --------------------------------------------------------------------------------

The reconfig seems to be applied, but it has no effect on the ceph.conf file present in the OSD's container.
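
(One way to narrow this down, assuming the usual cephadm layout where the per-daemon config on the host is bind-mounted into the container as /etc/ceph/ceph.conf: compare the two files, which shows whether the reconfig wrote a stale file or the container simply never sees the update:

cat /var/lib/ceph/10489760-1723-11ec-8050-cb54d51756be/osd.28/config
cephadm enter --name osd.28
cat /etc/ceph/ceph.conf
)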

Cheers
Dominik


On 19.12.2022 at 17:23, Eugen Block wrote:
ceph orch daemon reconfig osd.<ID>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



