Re: Change OSD Address after IB/Ethernet switch

Can you try to reconfig one OSD first with

ceph orch daemon reconfig osd.<ID>

and see if the OSD starts with the new ceph.conf? If that works, you could probably go ahead and reconfigure all OSDs with 'ceph orch reconfig osd', or try the single-OSD command on a couple more OSDs first.
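A rough sketch of that workflow (osd.12 is just a placeholder id, pick any affected OSD and verify the effect on your cluster first):

# regenerate the minimal ceph.conf for a single OSD
ceph orch daemon reconfig osd.12
# restart it if it does not come back on its own
ceph orch daemon restart osd.12
# check whether it rejoined
ceph osd tree | grep -w 'osd.12'
# if that worked, roll the reconfig out to all OSDs
ceph orch reconfig osd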

Quoting Dominik Baack <dominik.baack@xxxxxxxxxxxxxxxxxx>:

Hi,

I attached to the OSD container itself and found that the ceph.conf there is still wrong.

A test cluster is difficult at the moment because most of the utility nodes are still in transition; we wanted to get the data back up and running first.

# minimal ceph.conf for 10489760
[global]
        fsid = 10489760
        mon_host = [v2:129.217.31.171:3300/0,v1:129.217.31.171:6789/0] [v2:129.217.31.172:3300/0,v1:129.217.31.172:6789/0] [v2:129.217.31.175:3300/0,v1:129.217.31.175:6789/0]


Should I manually go through all OSD containers and change the entries in the conf?
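Rather than editing every container by hand, it is probably easier to check what cephadm renders for a daemon and let the orchestrator rewrite it (the path below assumes the usual cephadm layout, it is not taken from this thread):

# per-daemon config file that cephadm bind-mounts into the container
cat /var/lib/ceph/<fsid>/osd.<ID>/config
# or open a shell inside the daemon container and look at /etc/ceph/ceph.conf
cephadm enter --name osd.<ID>

Manual edits inside the containers would be overwritten the next time cephadm reconfigures or redeploys the daemon anyway.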


Cheers
Dominik


On 19.12.2022 at 15:49, Dominik Baack wrote:
Hi,

The ceph.conf on the host as well as the one inside "cephadm shell" are the same and have the correct IPs set. There are no OSD-specific entries.

# minimal ceph.conf for 10489760
[global]
        fsid = 10489760
        mon_host = [v2:129.217.31.186:3300/0,v1:129.217.31.186:6789/0] [v2:129.217.31.188:3300/0,v1:129.217.31.188:6789/0] [v2:129.217.31.189:3300/0,v1:129.217.31.189:6789/0] [v2:129.217.31.190:3300/0,v1:129.217.31.190:6789/0]


'ceph health detail' does not help much either; it only shows that there is an "sn05" artifact still present somewhere that should be removed (see the sketch after the output below):


Cheers
Dominik


HEALTH_WARN 1 filesystem is degraded; 1 MDSs report slow metadata IOs; 16 osds down; 3 hosts (18 osds) down; Reduced data availability: 545 pgs inactive
[WRN] FS_DEGRADED: 1 filesystem is degraded
    fs ml2r_storage is degraded
[WRN] MDS_SLOW_METADATA_IO: 1 MDSs report slow metadata IOs
    mds.ml2r_storage.ml2rsn06.clizsn(mds.0): 4 slow metadata IOs are blocked > 30 secs, oldest blocked for 4951 secs
[WRN] OSD_DOWN: 16 osds down
    osd.28 (root=default,host=ml2rsn05) is down
    osd.29 (root=default,host=ml2rsn05) is down
    osd.30 (root=default,host=ml2rsn05) is down
    osd.31 (root=default,host=ml2rsn05) is down
    osd.32 (root=default,host=ml2rsn05) is down
    osd.33 (root=default,host=ml2rsn05) is down
    osd.34 (root=default,host=ml2rsn05) is down
    osd.35 (root=default,host=ml2rsn05) is down
    osd.36 (root=default,host=ml2rsn06) is down
    osd.37 (root=default,host=ml2rsn06) is down
    osd.38 (root=default,host=ml2rsn06) is down
    osd.39 (root=default,host=ml2rsn06) is down
    osd.40 (root=default,host=ml2rsn06) is down
    osd.41 (root=default,host=ml2rsn06) is down
    osd.42 (root=default,host=ml2rsn06) is down
    osd.43 (root=default,host=ml2rsn06) is down
[WRN] OSD_HOST_DOWN: 3 hosts (18 osds) down
    host ml2rsn03 (root=default) (9 osds) is down
    host ml2rsn05 (root=default) (9 osds) is down
    host sn05 (root=default) (0 osds) is down
[WRN] PG_AVAILABILITY: Reduced data availability: 545 pgs inactive
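If host sn05 is only a leftover from renaming the node (it holds 0 OSDs in the output above), removing the stale entry would look roughly like this; these commands are not from the thread, so double-check that nothing still references sn05 first:

# drop an empty leftover CRUSH bucket
ceph osd crush remove sn05
# and/or remove the old host from the orchestrator inventory
ceph orch host rm sn05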


On 19.12.2022 at 14:02, Eugen Block wrote:
I re-read a thread from last year [1] but didn't find anything extraordinary required for this to work. If your MONs already run with the new addresses, does the ceph.conf on the OSD nodes still reflect the old entries? The docs do mention it explicitly:

Ceph clients and other Ceph daemons use ceph.conf to discover monitors.

In that case a restart of ceph.target wouldn't be enough. I assume cephadm should update the conf file, but maybe it requires a 'ceph orch reconfig osd'? I haven't tried that yet, so be careful and test the effects in a test cluster.
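A quick way to compare what the daemons would read against what the cluster itself reports, as a sanity check (nothing cluster-specific assumed here):

# what the host-side conf points the daemons at
grep mon_host /etc/ceph/ceph.conf
# what the monmap actually contains
ceph mon dump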

Regards,
Eugen

[1] https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/thread/OWSIXB5O3Y7LXC3D2JYETEAFMRC3K7OY/

Quoting Dominik Baack <dominik.baack@xxxxxxxxxxxxxxxxxx>:

Hi,

min_mon_release 17 (quincy)
election_strategy: 1
0: [v2:129.217.31.189:3300/0,v1:129.217.31.189:6789/0] mon.ml2rsn06
1: [v2:129.217.31.186:3300/0,v1:129.217.31.186:6789/0] mon.ml2rsn03
2: [v2:129.217.31.188:3300/0,v1:129.217.31.188:6789/0] mon.ml2rsn05
3: [v2:129.217.31.190:3300/0,v1:129.217.31.190:6789/0] mon.ml2rsn07
dumped monmap epoch 21

That looks fine to me (except that there is an even number of mons).

ceph osd metadata 18

 "back_addr": "[v2:129.217.31.173:6817/1187092418,v1:129.217.31.173:6819/1187092418]",

It still shows an old address after restarting ceph.target.
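To get an overview across all OSDs, a small loop like this should work (plain shell, nothing cluster-specific assumed):

for id in $(ceph osd ls); do
    echo "osd.$id"
    ceph osd metadata "$id" | grep -E '"(front|back|hb_front|hb_back)_addr"'
done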


Cheers
Dominik

On 19.12.2022 at 09:24, Eugen Block wrote:
If you did it "the right way" then 'ceph mon dump' should reflect the changes. Does that output show the new IP addresses?


Quoting Dominik Baack <dominik.baack@xxxxxxxxxxxxxxxxxx>:

Hi,

I removed/added the new mons as explained in:

https://docs.ceph.com/en/latest/rados/operations/add-or-rm-mons/#changing-a-monitor-s-ip-address-the-right-way
but did not change the monmap manually.


I can retry the manual way as well and report afterwards:

https://docs.ceph.com/en/latest/rados/operations/add-or-rm-mons/#changing-a-monitor-s-ip-address-the-messy-way

Cheers
Dominik Baack


On 17.12.2022 at 10:06, Eugen Block wrote:
Hi,

did you also change the monmap as described in the docs [1]? There have been multiple threads on this list about the same topic. Simply changing the IP addresses of the MONs is not sufficient, but after fixing the monmap the OSDs should connect successfully.

[1] https://docs.ceph.com/en/quincy/rados/operations/add-or-rm-mons.html
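For reference, verifying what the monmap currently contains boils down to something like the following (standard tooling, not commands run in this thread); the full rm/add/inject procedure for the "messy way" is in the linked docs:

# fetch the current monmap and print it
ceph mon getmap -o /tmp/monmap
monmaptool --print /tmp/monmap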

Quoting Dominik Baack <dominik.baack@xxxxxxxxxxxxxxxxxx>:

Hi,

we have switched our network from IP over IB to an Ethernet configuration.

I could move the mons over to the new address range and they connect to the cluster network. The OSDs are more of a problem: even after setting

public_network  129.217.31.176/29
cluster_network 129.217.31.184/29

to these values and restarting ceph.target on all nodes, they still report the wrong information when checked with 'ceph osd metadata':

back_addr
front_addr
hb_back_addr
hb_front_addr

and are therefore not able to connect.
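On a cephadm-managed cluster the networks are usually set in the config database rather than only in ceph.conf; with the values from this mail that would be roughly (verify before applying):

ceph config set global public_network 129.217.31.176/29
ceph config set global cluster_network 129.217.31.184/29
# check what the OSDs will actually pick up
ceph config get osd public_network
ceph config get osd cluster_network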

Where do I need to modify the addresses to get them to connect again?

Cheers
Dominik

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx