Re: Ceph re-ip of OSD node

Jake Young <jak3kaj@xxxxxxxxx> · Wed, 30 Aug 2017 21:37:42 +0000

Hey Ben,

Take a look at the OSD log of another OSD whose IP you did not change.

What errors does it show related to the re-IP'd OSD?

Is the other OSD trying to communicate with the re-IP'd OSD's old IP address?
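
A minimal Python sketch of that check, assuming the OSD log is plain text with one entry per line. It prints every line of an unaffected OSD's log that still mentions the re-IP'd OSD's old address (for osd.0 in Ben's config, 10.1.1.101). The script name `scan_osd_log.py` and the function `lines_mentioning` are hypothetical, not part of Ceph; pass whatever log path your hosts actually use.

```python
import re
import sys
from typing import Iterable, Iterator


def lines_mentioning(lines: Iterable[str], address: str) -> Iterator[str]:
    """Yield the lines that contain `address` as a complete IPv4 address.

    The lookarounds stop "10.1.1.10" from matching inside "10.1.1.100" or
    "10.1.1.101", while still matching "10.1.1.101:6818" (address plus port).
    """
    pattern = re.compile(r"(?<![\d.])" + re.escape(address) + r"(?!\.?\d)")
    for line in lines:
        if pattern.search(line):
            yield line


if __name__ == "__main__":
    # Hypothetical usage: python3 scan_osd_log.py <osd-log-file> <old-address>
    if len(sys.argv) != 3:
        print(f"usage: {sys.argv[0]} OSD_LOG_FILE OLD_ADDRESS", file=sys.stderr)
    else:
        with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
            for hit in lines_mentioning(log, sys.argv[2]):
                print(hit, end="")
```

An empty result suggests the peer has picked up the new address; repeated hits on the old one answer the last question above.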

Jake 

On Wed, Aug 30, 2017 at 3:55 PM Jeremy Hanmer <jeremy.hanmer@xxxxxxxxxxxxx> wrote:
This is simply not true. We run quite a few ceph clusters with
rack-level layer2 domains (thus routing between racks) and everything
works great.

On Wed, Aug 30, 2017 at 10:52 AM, David Turner <drakonstein@xxxxxxxxx> wrote:

> ALL OSDs need to be running on the same private network at the same time.  ALL
> clients, RGW, OSD, MON, MGR, MDS, etc, etc need to be running on the same
> public network at the same time.  You cannot do this as a one-at-a-time
> migration to the new IP space.  Even if all of the servers can still
> communicate via routing, it just won't work.  Changing the public/private
> network addresses for a cluster requires full cluster downtime.
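
To reason about which daemons fall inside the configured network ranges, a small Python sketch using the standard `ipaddress` module may help. It assumes /24 prefixes (the thread only shows host addresses) and assumes the `public network` / `cluster network` value is a comma-separated list of subnets, which is how the Ceph configuration reference describes multiple subnets. The helper names are hypothetical. It only checks subnet membership; it says nothing about whether a rolling change works in practice.

```python
import ipaddress
from typing import Dict, List, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]


def parse_networks(setting: str) -> List[Network]:
    """Parse a comma-separated subnet list such as '10.1.1.0/24, 10.1.2.0/24'."""
    return [ipaddress.ip_network(part.strip()) for part in setting.split(",") if part.strip()]


def outside_networks(addrs: Dict[str, str], setting: str) -> List[str]:
    """Return the daemon names whose address lies in none of the configured subnets."""
    nets = parse_networks(setting)
    return [
        name
        for name, addr in addrs.items()
        if not any(ipaddress.ip_address(addr) in net for net in nets)
    ]


# Addresses taken from this thread; the /24 prefix lengths are assumed.
daemons = {"osd.0": "10.1.2.101", "osd.14": "10.1.1.100"}
print(outside_networks(daemons, "10.1.1.0/24"))               # ['osd.0']
print(outside_networks(daemons, "10.1.1.0/24, 10.1.2.0/24"))  # []
```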

>
> On Wed, Aug 30, 2017 at 11:09 AM Ben Morrice <ben.morrice@xxxxxxx> wrote:
>>
>> Hello
>>
>> We have a small cluster that we need to move to a different network in
>> the same datacentre.
>>
>> My workflow was the following (for a single OSD host), but it failed
>> (further details below):
>>
>> 1) ceph osd set noout
>> 2) stop ceph-osd processes
>> 3) change IP, gateway, domain (short hostname is the same), VLAN
>> 4) replace references to the OLD IP (cluster and public network) in
>>    /etc/ceph/ceph.conf with the NEW IP (see [1])
>> 5) start a single OSD process
>>
>> This seems to work: the NEW IP can communicate with the mon hosts and
>> OSD hosts on the OLD network, and the OSD boots and is visible via
>> 'ceph -w'. However, after a few seconds the OSD drops, with messages
>> such as the one below in its log file:
>>
>> heartbeat_check: no reply from 10.1.1.100:6818 osd.14 ever on either
>> front or back, first ping sent 2017-08-30 16:42:14.692210 (cutoff
>> 2017-08-30 16:42:24.962245)
>>
>> There are logs like the above for every OSD server/process.
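
To list every peer the re-IP'd OSD reports as unreachable, the heartbeat_check lines can be parsed with a regular expression. This Python sketch is written against the single message quoted above, rejoined onto one line; other Ceph releases may word it differently, so the pattern is an assumption. `parse_heartbeat` and `HeartbeatMiss` are hypothetical names.

```python
import re
from datetime import datetime
from typing import NamedTuple, Optional


class HeartbeatMiss(NamedTuple):
    peer_addr: str
    peer_port: int
    peer_osd: int
    first_ping: datetime
    cutoff: datetime


# Matches the wording of the message quoted above; other releases may differ.
HEARTBEAT_RE = re.compile(
    r"heartbeat_check: no reply from (?P<addr>[\d.]+):(?P<port>\d+) osd\.(?P<osd>\d+)"
    r".*?first ping sent (?P<first>\S+ \S+) \(cutoff (?P<cutoff>\S+ \S+)\)"
)
TS_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def parse_heartbeat(line: str) -> Optional[HeartbeatMiss]:
    """Return the parsed fields of a heartbeat_check line, or None if it does not match."""
    m = HEARTBEAT_RE.search(line)
    if m is None:
        return None
    return HeartbeatMiss(
        peer_addr=m["addr"],
        peer_port=int(m["port"]),
        peer_osd=int(m["osd"]),
        first_ping=datetime.strptime(m["first"], TS_FORMAT),
        cutoff=datetime.strptime(m["cutoff"], TS_FORMAT),
    )


line = ("heartbeat_check: no reply from 10.1.1.100:6818 osd.14 ever on either "
        "front or back, first ping sent 2017-08-30 16:42:14.692210 (cutoff "
        "2017-08-30 16:42:24.962245)")
miss = parse_heartbeat(line)
print(miss.peer_addr, miss.peer_port, miss.peer_osd)   # 10.1.1.100 6818 14
print((miss.cutoff - miss.first_ping).total_seconds())  # 10.270035
```

Collecting `peer_addr` over the whole log shows whether the failures are limited to peers on the OLD subnet or affect every OSD.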

>>
>> and then, eventually:
>>
>> 2017-08-30 16:42:14.486275 7f6d2c966700  0 log_channel(cluster) log
>> [WRN] : map e85351 wrongly marked me down
>>
>> Am I missing something obvious to reconfigure the network on an OSD host?

>>
>> [1]
>>
>> OLD
>> [osd.0]
>>     host = sn01
>>     devs = /dev/sdi
>>     cluster addr = 10.1.1.101
>>     public addr = 10.1.1.101
>>
>> NEW
>> [osd.0]
>>     host = sn01
>>     devs = /dev/sdi
>>     cluster addr = 10.1.2.101
>>     public addr = 10.1.2.101
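
Step 4 of the workflow can be scripted so that only the `cluster addr` and `public addr` lines change. A minimal Python sketch using plain text substitution; `rewrite_addrs` is a hypothetical helper, not a Ceph tool, and it only rewrites lines whose value is exactly the old address (a line with a trailing comment is left alone).

```python
import re

# "cluster addr = X" / "public addr = X" (space or underscore spelling);
# group 1 keeps the indentation and "key = " prefix, group 2 is the address.
ADDR_LINE = re.compile(r"^(\s*(?:cluster|public)[ _]addr\s*=\s*)(\S+)\s*$")


def rewrite_addrs(conf_text: str, old: str, new: str) -> str:
    """Swap `old` for `new` on cluster/public addr lines whose value is exactly `old`.

    Every other line (host, devs, comments, network settings) is copied unchanged.
    """
    out = []
    for line in conf_text.splitlines(keepends=True):
        m = ADDR_LINE.match(line)
        if m and m.group(2) == old:
            ending = line[len(line.rstrip("\r\n")):]  # preserve the original line ending
            line = m.group(1) + new + ending
        out.append(line)
    return "".join(out)


old_conf = """[osd.0]
    host = sn01
    devs = /dev/sdi
    cluster addr = 10.1.1.101
    public addr = 10.1.1.101
"""
print(rewrite_addrs(old_conf, "10.1.1.101", "10.1.2.101"), end="")
```

Applied to the OLD section above, it produces the NEW section. It edits the file only; it does not touch any `public network` / `cluster network` setting or anything outside ceph.conf.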

>>
>> --
>> Kind regards,
>>
>> Ben Morrice
>>
>> ______________________________________________________________________
>> Ben Morrice | e: ben.morrice@xxxxxxx | t: +41-21-693-9670
>> EPFL / BBP
>> Biotech Campus
>> Chemin des Mines 9
>> 1202 Geneva
>> Switzerland
>>
>> _______________________________________________
>> ceph-users mailing list
>> ceph-users@xxxxxxxxxxxxxx
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
