Re: Change OSD Address after IB/Ethernet switch

Thank you very much,

Manually changing all of the config files got us back into production again.
Recovery should be finished within the next few hours.
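
For the archive, the manual change was essentially pointing the mon_host line in every daemon's cephadm-managed config at the new MON addresses and then restarting the unit, roughly like this (a sketch with placeholders, using the file Eugen mentions below):

# per host and per daemon, osd.28 as an example
vi /var/lib/ceph/<FSID>/osd.28/config
#   mon_host = [v2:<new-mon-ip>:3300/0,v1:<new-mon-ip>:6789/0] ...
systemctl restart ceph-<FSID>@osd.28.service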


cluster:
    id:     10489760-1723-11ec-8050-cb54d51756be
    health: HEALTH_WARN
            545 pgs not deep-scrubbed in time
            545 pgs not scrubbed in time

  services:
    mon: 4 daemons, quorum ml2rsn06,ml2rsn03,ml2rsn05,ml2rsn07 (age 9m)
    mgr: ml2rsn05.rivwqx(active, since 17m), standbys: ml2rsn03.ufxzjh
    mds: 1/1 daemons up, 2 standby
    osd: 36 osds: 36 up (since 8m), 36 in (since 8m); 101 remapped pgs

  data:
    volumes: 1/1 healthy
    pools:   3 pools, 545 pgs
    objects: 58.64M objects, 73 TiB
    usage:   221 TiB used, 282 TiB / 503 TiB avail
    pgs:     11654543/175922865 objects misplaced (6.625%)
             444 active+clean
             101 active+remapped+backfilling

  io:
    recovery: 2.3 GiB/s, 7.17k keys/s, 1.91k objects/s


Cheers
Dominik

On 19.12.2022 at 19:49, Eugen Block wrote:
Maybe in this case you actually should update the local config (/var/lib/ceph/<FSID>/<SERVICE>/config) to reflect the new MONs. It seems that adding the MONs with their new addresses didn't update the local ceph.conf for some reason, so try editing it manually and restart the OSD services. I haven't had to go through these steps in a while (with cephadm only once, in a test cluster), so I'm not sure what we could be missing here.
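
(A quick way to confirm that a restarted OSD picked up the new addresses, assuming the admin socket is reachable inside the daemon's container, would be something like:

cephadm enter --name osd.28
ceph daemon osd.28 config show | grep mon_host
)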


Quoting Dominik Baack <dominik.baack@xxxxxxxxxxxxxxxxxx>:

Hi,

The OSDs (9 SSDs) and the MONs currently run on the same nodes, with two dedicated 200 GbE cards and a third out-of-band 10 GbE connection.

In between, I had changed the public network to

public_network 129.217.31.176/28

as a cross-check, but I have now done it again to capture the steps and output; sadly, there are no visible changes.

I run the operations in the following order:

./cephadm shell
ceph config dump
ceph config set global public_network 129.217.31.176/28
ceph config set global cluster_network 129.217.31.184/29
ceph config dump # check output
restart ceph.target on all nodes
ceph orch daemon reconfig osd.28
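
(Whether the new values actually landed in the config database can also be checked per daemon type, for example:

ceph config get osd public_network
ceph config get osd cluster_network
)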

Check the osd.28 container's ceph.conf:

[root@ml2rsn05 /]# cat /etc/ceph/ceph.conf
# minimal ceph.conf for ...
[global]
        fsid = ...
        mon_host = [v2:129.217.31.171:3300/0,v1:129.217.31.171:6789/0] [v2:129.217.31.172:3300/0,v1:129.217.31.172:6789/0] [v2:129.217.31.175:3300/0,v1:129.217.31.175:6789/0]
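
(For comparison, the minimal conf cephadm would render right now can be printed with

ceph config generate-minimal-conf

If its mon_host line still lists the old addresses, the monitors are most likely still registered in the monmap with those addresses; that is an assumption about where the stale values come from, not something verified in this thread.)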

Cheers
Dominik


On 19.12.2022 at 18:55, Eugen Block wrote:
I just looked at the previous mails again; is it possible that you mixed up the public and cluster networks? MONs only require access to the public network, unless they are colocated with OSDs, of course. You stated this earlier:

I could move the MONs over to the new address range and they connect into the cluster network. The OSDs create more of a problem, even after setting
public_network 129.217.31.176/29
cluster_network 129.217.31.184/29

and this:

mon_host = [v2:129.217.31.186:3300/0,v1:129.217.31.186:6789/0] [v2:129.217.31.188:3300/0,v1:129.217.31.188:6789/0] [v2:129.217.31.189:3300/0,v1:129.217.31.189:6789/0] [v2:129.217.31.190:3300/0,v1:129.217.31.190:6789/0]

The MONs' addresses are from the cluster network but should be from the public network. If 129.217.31.184/29 is supposed to be the public network, you should modify both networks and restart the OSD services.
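
(The addresses the MONs are currently registered with can be checked with

ceph mon dump

which prints the current monmap entries.)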

Quoting Dominik Baack <dominik.baack@xxxxxxxxxxxxxxxxxx>:

Hi,

Reconfiguration

ceph orch daemon reconfig osd.28
Scheduled to reconfig osd.28 on host 'ml2rsn05'

cephadm ['--image', 'quay.io/ceph/ceph@sha256:12a0a4f43413fd97a14a3d47a3451b2d2df50020835bb93db666209f3f77617a', 'deploy', '--fsid', '10489760-1723-11ec-8050-cb54d51756be', '--name', 'osd.28', '--meta-json', '{"service_name": "osd", "ports": [], "ip": null, "deployed_by": ["quay.io/ceph/ceph@sha256:12a0a4f43413fd97a14a3d47a3451b2d2df50020835bb93db666209f3f77617a"], "rank": null, "rank_generation": null, "extra_container_args": null}', '--config-json', '-', '--osd-fsid', '0c15ce43-ed2d-4348-88b2-785c25159894', '--reconfig']
2022-12-19 16:48:23,502 7f1cb6f4c740 DEBUG Acquiring lock 139761289301680 on /run/cephadm/10489760-1723-11ec-8050-cb54d51756be.lock
2022-12-19 16:48:23,502 7f1cb6f4c740 DEBUG Lock 139761289301680 acquired on /run/cephadm/10489760-1723-11ec-8050-cb54d51756be.lock
2022-12-19 16:48:23,513 7f1cb6f4c740 DEBUG systemctl: enabled
2022-12-19 16:48:23,523 7f1cb6f4c740 DEBUG systemctl: active
2022-12-19 16:48:23,524 7f1cb6f4c740 INFO Reconfig daemon osd.28 ...
2022-12-19 16:48:23,714 7f1cb6f4c740 DEBUG stat: 167 167
2022-12-19 16:48:23,777 7f1cb6f4c740 DEBUG firewalld does not appear to be present
2022-12-19 16:48:23,777 7f1cb6f4c740 DEBUG Not possible to enable service <osd>. firewalld.service is not available
2022-12-19 16:49:24,592 7f9a07c9f740 DEBUG --------------------------------------------------------------------------------

The reconfig seems to be applied, but it has no effect on the ceph.conf file present in the OSD's container.
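
(One way to narrow this down, assuming the usual cephadm layout where the per-daemon config on the host is bind-mounted into the container as /etc/ceph/ceph.conf: compare the two files, which shows whether the reconfig wrote a stale file or the container simply never sees the update:

cat /var/lib/ceph/10489760-1723-11ec-8050-cb54d51756be/osd.28/config
cephadm enter --name osd.28
cat /etc/ceph/ceph.conf
)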

Cheers
Dominik


On 19.12.2022 at 17:23, Eugen Block wrote:
ceph orch daemon reconfig osd.<ID>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



