How to replace a node in Ceph?


 



On Fri, 5 Sep 2014 13:46:17 +0800 Ding Dinghua wrote:

> 2014-09-05 13:19 GMT+08:00 Christian Balzer <chibi at gol.com>:
> 
> >
> > Hello,
> >
> > On Fri, 5 Sep 2014 12:09:11 +0800 Ding Dinghua wrote:
> >
> > > Please see my comment below:
> > >
> > >
> > > 2014-09-04 21:33 GMT+08:00 Christian Balzer <chibi at gol.com>:
> > >
> > > >
> > > > Hello,
> > > >
> > > > On Thu, 4 Sep 2014 20:56:31 +0800 Ding Dinghua wrote:
> > > >
> > > > Aside from what Loic wrote, why not replace the network controller
> > > > or, if it is onboard, add a card?
> > > >
> > > > > Hi all,
> > > > >         I'm new to Ceph, and I apologize if this question has
> > > > > been asked before.
> > > > >
> > > > >         I have set up an 8-node Ceph cluster. After two months of
> > > > > running, the network controller of one node broke, so I have to
> > > > > replace the node with a new one.
> > > > >         I don't want to trigger data migration: all I want to do
> > > > > is replace a node, not shrink the cluster and then enlarge it
> > > > > again.
> > > >
> > > > Well, you will have (had) data migration unless your cluster was
> > > > set to noout from the start or had a "mon osd downout subtree
> > > > limit" set accordingly.
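For reference, the two mechanisms Christian mentions could look like this. This is a hedged sketch, not a recipe: the option name `mon osd down out subtree limit` is my recollection of the spelling in the Ceph documentation (the mail abbreviates it), and the `DRY_RUN` guard only prints the commands instead of running them against a live cluster.

```shell
#!/bin/sh
# Sketch: two ways to keep a dead host's OSDs from being auto-marked "out".
# DRY_RUN=1 only echoes the commands; unset it on a real cluster.
DRY_RUN=1
run() { if [ "${DRY_RUN:-0}" = "1" ]; then echo "+ $*"; else "$@"; fi; }

# Cluster-wide: never auto-mark any down OSD as out
run ceph osd set noout

# Narrower alternative: don't auto-out OSDs when a whole subtree (here a
# host) goes down; option spelling is an assumption, check your docs
run ceph tell mon.\* injectargs '--mon-osd-down-out-subtree-limit=host'
```

Once the node is repaired and back in the cluster, the flag would be cleared again with `ceph osd unset noout`.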
> > > >
> > >   [Ding Dinghua]: Yes, I have already set the noout flag
> > >
> > > >
> > > > >         I think the following steps may work:
> > > > >         1)  set osd_crush_update_on_start to false, so when an
> > > > > osd starts, it won't modify the CRUSH map and trigger data
> > > > > migration.
> > > > I think the noin flag might do that trick, too.
> > > >
> > >  [Ding Dinghua]: I set osd_crush_update_on_start to false, so when
> > > the osds on the new node start,
> > >                         the /etc/init.d/ceph script won't do "ceph osd
> > > crush create-or-move", and the osds on the new node will still be
> > > under the old host in the CRUSH map, so no data migration will occur.
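In ceph.conf terms, the setting Ding describes would look roughly like this (a sketch; placing it in the `[osd]` section is an assumption, it could equally go under `[global]`):

```ini
[osd]
; Stop the init script from running "ceph osd crush create-or-move" at OSD
; start-up, so the CRUSH location (and hence data placement) is untouched
osd crush update on start = false
```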
> > >
> > > > >           2)  set the noout flag to prevent osds from being
> > > > > marked out of the cluster and triggering data migration
> > > > Probably too late at this point...
> > > >
> > > > >           3)  mark all osds on the broken node down (actually,
> > > > > since the network controller is broken, these osds are already
> > > > > down)
> > > > And not out?
> > > >
> > >   [Ding Dinghua]: Yes, since the noout flag is set, these osds are
> > > [down, in]
> > >
> > So far, so good.
> >
> > However see below:
> >
> > > >
> > > > Regards,
> > > >
> > > > Christian
> > > > >           4)  prepare the osd on the new node, keeping osd_num
> > > > > the same as the osd on the broken node:
> > > > >                ceph-osd -i [osd_num] --osd-data=path1 --mkfs
> >
> > I don't think that will work. To recycle OSDs they would have to be
> > removed (triggering migration) first.
> >
>  [Ding Dinghua]:  I'm not removing or creating osds, just changing the
> storage which holds them, so the osd ids from the broken node should be
> reused.
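A hedged sketch of step 4 as originally proposed, re-initialising fresh storage under the broken node's existing OSD id. The id, the data path, and the cephx caps are all placeholders/assumptions, and `DRY_RUN` only prints the commands.

```shell
#!/bin/sh
# Sketch: recreate an OSD data directory on the new node, reusing an
# existing OSD id (example id: 3). DRY_RUN=1 echoes instead of executing.
DRY_RUN=1
run() { if [ "${DRY_RUN:-0}" = "1" ]; then echo "+ $*"; else "$@"; fi; }

OSD_ID=3                                  # placeholder: id from the broken node
OSD_DATA=/var/lib/ceph/osd/ceph-$OSD_ID   # placeholder path

run mkdir -p "$OSD_DATA"
# --mkkey also generates a fresh cephx key in $OSD_DATA/keyring
run ceph-osd -i "$OSD_ID" --osd-data="$OSD_DATA" --mkfs --mkkey
# Register the new key under the old id (caps here are an assumption;
# check the manual-deployment documentation for your release)
run ceph auth del osd.$OSD_ID
run ceph auth add osd.$OSD_ID osd 'allow *' mon 'allow rwx' -i "$OSD_DATA/keyring"
```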
> 
Oh, I see.  So you're moving the OSD disks to the new machine.
In that case, why not the OS disks as well?

Then start it up with the network disconnected (or otherwise prevent the
OSDs from being started, a tricky endeavor with those pesky udev rules),
re-connect the network, ntpdate ^o^, bring up one OSD at a time as you
intended below.

Christian
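Christian's sequence could be sketched like this. The OSD ids, the NTP server, and the sysvinit-style `service ceph` commands are assumptions for a node of that era; `DRY_RUN` only prints the commands.

```shell
#!/bin/sh
# Sketch: bring the moved node back without letting its OSDs rejoin early.
DRY_RUN=1
run() { if [ "${DRY_RUN:-0}" = "1" ]; then echo "+ $*"; else "$@"; fi; }

# 1) Boot with the network unplugged; make sure no OSD comes up by itself
run service ceph stop osd

# 2) Reconnect the network, then fix the clock before the mons see the node
run ntpdate pool.ntp.org          # placeholder NTP server

# 3) Start the OSDs one at a time (ids are placeholders), letting each one
#    peer and backfill before starting the next
for id in 3 4 5; do
    run service ceph start osd.$id
done
```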

> > Just adding new OSDs should do the trick, though.
> >
> [Ding Dinghua]: It will trigger data migration
> 
> >
> > Christian
> >
> > > > >         5) start the osd on the new node; peering and backfilling
> > > > > will start automatically
> > > > >         6)  wait until 5) completes, and repeat 4) and 5) until
> > > > > all osds on the broken node have been moved to the new node
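Steps 5) and 6) could be automated roughly as follows. The OSD ids, the 30-second poll interval, and using `HEALTH_OK` as the "backfill finished" signal are all assumptions; with `DRY_RUN` set, the health check is stubbed so the sketch stays runnable without a cluster.

```shell
#!/bin/sh
# Sketch of steps 5-6: start one replacement OSD, wait for backfill to
# finish, then move on to the next.
DRY_RUN=1
run() { if [ "${DRY_RUN:-0}" = "1" ]; then echo "+ $*"; else "$@"; fi; }
health() { if [ "${DRY_RUN:-0}" = "1" ]; then echo HEALTH_OK; else ceph health; fi; }

for id in 3 4 5; do                # placeholder OSD ids from the broken node
    run service ceph start osd.$id
    # Poll until peering/backfill completes and the cluster is healthy again
    until health | grep -q HEALTH_OK; do
        sleep 30
    done
done
```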
> > > > >         I have done some tests on my test cluster, and it seemed
> > > > > to work, but I'm not quite sure it's right in theory, so any
> > > > > comments would be appreciated.
> > > > >         Thanks.
> > > > >
> > > >
> > > >
> > > > --
> > > > Christian Balzer        Network/Systems Engineer
> > > > chibi at gol.com           Global OnLine Japan/Fusion Communications
> > > > http://www.gol.com/


-- 
Christian Balzer        Network/Systems Engineer                
chibi at gol.com   	Global OnLine Japan/Fusion Communications
http://www.gol.com/

