Re: Ceph re-ip of OSD node

Hey Ben,

Take a look at the OSD log for another OSD whose IP you did not change. 

What errors does it show related to the re-IP'd OSD?

Is the other OSD trying to communicate with the re-IP'd OSD's old IP address?
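To answer that from the log of an OSD you did not touch, a small Python sketch like this pulls the peer address out of each `heartbeat_check: no reply from ...` line. It assumes the log lines look like the one Ben quoted (`... no reply from 10.1.1.100:6818 osd.14 ...`); other Ceph releases may format these lines differently, and `heartbeat_failures` is a made-up name for illustration.

```python
import re
from collections import defaultdict

# Assumes the heartbeat_check format quoted in this thread:
#   heartbeat_check: no reply from 10.1.1.100:6818 osd.14 ever on either front or back, ...
HEARTBEAT_RE = re.compile(
    r"heartbeat_check: no reply from "
    r"(?P<ip>\d+\.\d+\.\d+\.\d+):(?P<port>\d+) osd\.(?P<osd>\d+)"
)


def heartbeat_failures(log_lines):
    """Map each peer OSD id to the set of IPs this OSD failed to heartbeat."""
    failures = defaultdict(set)
    for line in log_lines:
        match = HEARTBEAT_RE.search(line)
        if match:
            failures[int(match.group("osd"))].add(match.group("ip"))
    return dict(failures)
```

If you feed it the log of an untouched OSD (by default `/var/log/ceph/ceph-osd.<id>.log`) and the old address, e.g. `10.1.1.101`, shows up under `failures[0]`, that peer is still heartbeating the re-IP'd osd.0 at its old IP.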

Jake 


On Wed, Aug 30, 2017 at 3:55 PM Jeremy Hanmer <jeremy.hanmer@xxxxxxxxxxxxx> wrote:
This is simply not true. We run quite a few Ceph clusters with
rack-level layer-2 domains (thus routing between racks) and everything
works great.

On Wed, Aug 30, 2017 at 10:52 AM, David Turner <drakonstein@xxxxxxxxx> wrote:
> ALL OSDs need to be running on the same private network at the same time.  ALL
> clients, RGW, OSD, MON, MGR, MDS, etc., need to be running on the same
> public network at the same time.  You cannot do this as a one-at-a-time
> migration to the new IP space.  Even if all of the servers can still
> communicate via routing, it just won't work.  Changing the public/private
> network addresses for a cluster requires full cluster downtime.
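One way to see which daemons would sit outside the configured subnets during a one-at-a-time move is to check each address against the `public network` / `cluster network` values. This Python sketch assumes you have already collected a name-to-address mapping yourself (the `daemons` dict and both function names are hypothetical) and that the option value is a comma-separated list of CIDR subnets.

```python
import ipaddress


def parse_networks(value):
    """Parse a comma-separated CIDR list, e.g. '10.1.1.0/24, 10.1.2.0/24'."""
    return [ipaddress.ip_network(part.strip())
            for part in value.split(",") if part.strip()]


def outside_networks(daemons, networks):
    """Return the names of daemons whose address is in none of the subnets."""
    nets = parse_networks(networks)
    return sorted(
        name for name, addr in daemons.items()
        if not any(ipaddress.ip_address(addr) in net for net in nets)
    )


if __name__ == "__main__":
    # Hypothetical name -> address mapping collected by hand.
    daemons = {"osd.0": "10.1.2.101", "osd.14": "10.1.1.100"}
    print(outside_networks(daemons, "10.1.1.0/24"))               # -> ['osd.0']
    print(outside_networks(daemons, "10.1.1.0/24, 10.1.2.0/24"))  # -> []
```

With only `10.1.1.0/24` configured, `osd.0` at its new address `10.1.2.101` is reported; with both subnets listed, nothing is.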
>
> On Wed, Aug 30, 2017 at 11:09 AM Ben Morrice <ben.morrice@xxxxxxx> wrote:
>>
>> Hello
>>
>> We have a small cluster that we need to move to a different network in
>> the same datacentre.
>>
>> My workflow was the following (for a single OSD host), but it failed
>> (further details below):
>>
>> 1) ceph osd set noout
>> 2) stop ceph-osd processes
>> 3) change IP, gateway, domain (short hostname is the same), VLAN
>> 4) replace references to the OLD IP (cluster and public network) in
>> /etc/ceph/ceph.conf with the NEW IP (see [1])
>> 5) start a single OSD process
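For reference, the steps above could be written out as a dry-run command list. This is only a sketch: it assumes systemd-managed OSDs with `ceph-osd@<id>` units, leaves steps 3 and 4 as comments because they depend on the distro and the config, and `reip_plan` is a hypothetical helper that returns commands rather than running them.

```python
def reip_plan(osd_ids):
    """Return the steps above as an ordered list of shell lines (dry run only)."""
    if not osd_ids:
        raise ValueError("need at least one OSD id on this host")
    plan = ["ceph osd set noout"]                                      # step 1
    plan += [f"systemctl stop ceph-osd@{i}" for i in osd_ids]          # step 2 (assumed unit names)
    plan.append("# step 3: change IP, gateway, domain and VLAN")       # manual, distro-specific
    plan.append("# step 4: update cluster/public addr in /etc/ceph/ceph.conf")
    plan.append(f"systemctl start ceph-osd@{osd_ids[0]}")              # step 5: one OSD only
    return plan


if __name__ == "__main__":
    print("\n".join(reip_plan([0, 1, 2])))
```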
>>
>> This seems to work, as the NEW IP can communicate with the mon hosts and
>> OSD hosts on the OLD network; the OSD boots and is visible via 'ceph -w'.
>> However, after a few seconds the OSD drops, with messages such as the one
>> below in its log file:
>>
>> heartbeat_check: no reply from 10.1.1.100:6818 osd.14 ever on either
>> front or back, first ping sent 2017-08-30 16:42:14.692210 (cutoff
>> 2017-08-30 16:42:24.962245)
>>
>> There are log entries like the above for every OSD server/process,
>>
>> and then eventually:
>>
>> 2017-08-30 16:42:14.486275 7f6d2c966700  0 log_channel(cluster) log
>> [WRN] : map e85351 wrongly marked me down
>>
>>
>> Am I missing something obvious when reconfiguring the network on an OSD host?
>>
>>
>>
>> [1]
>>
>> OLD
>> [osd.0]
>>     host = sn01
>>     devs = /dev/sdi
>>     cluster addr = 10.1.1.101
>>     public addr = 10.1.1.101
>> NEW
>> [osd.0]
>>     host = sn01
>>     devs = /dev/sdi
>>     cluster addr = 10.1.2.101
>>     public addr = 10.1.2.101
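Step 4 can be scripted so that only the `cluster addr` / `public addr` entries holding the old IP change. A minimal Python sketch, assuming the `[osd.N]` layout shown in [1]; `rewrite_addrs` is a hypothetical name, and it deliberately leaves `public network`, `cluster network` and monitor addresses alone.

```python
import re

# Matches "cluster addr = X" / "public addr = X" (also the underscore spellings).
ADDR_RE = re.compile(r"^(\s*(?:cluster|public)[ _]addr\s*=\s*)(\S+)\s*$")


def rewrite_addrs(conf_text, old_ip, new_ip):
    """Replace old_ip with new_ip on cluster/public addr lines only."""
    out = []
    for line in conf_text.splitlines():
        match = ADDR_RE.match(line)
        if match and match.group(2) == old_ip:
            line = match.group(1) + new_ip
        out.append(line)
    # Preserve a trailing newline if the input had one.
    return "\n".join(out) + ("\n" if conf_text.endswith("\n") else "")
```

Reading `/etc/ceph/ceph.conf`, passing its text through `rewrite_addrs(text, "10.1.1.101", "10.1.2.101")` and writing the result back turns the OLD block into the NEW one.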
>>
>> --
>> Kind regards,
>>
>> Ben Morrice
>>
>> ______________________________________________________________________
>> Ben Morrice | e: ben.morrice@xxxxxxx | t: +41-21-693-9670
>> EPFL / BBP
>> Biotech Campus
>> Chemin des Mines 9
>> 1202 Geneva
>> Switzerland
>>
>> _______________________________________________
>> ceph-users mailing list
>> ceph-users@xxxxxxxxxxxxxx
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
