Hello,

On Thu, 4 Sep 2014 20:56:31 +0800 Ding Dinghua wrote:

Aside from what Loic wrote, why not replace the network controller or, if
it is onboard, add a card?

> Hi all,
>     I'm new to ceph, and I apologize if this question has been asked
> before.
>
>     I have set up an 8-node ceph cluster, and after two months of
> running, the network controller of one node broke, so I have to replace
> the node with a new one.
>     I don't want to trigger data migration, since all I want to do is
> replace a node, not shrink the cluster and then enlarge it again.

Well, you will have (had) data migration unless your cluster was set to
noout from the start or had a "mon osd downout subtree limit" set
accordingly.

>     I think the following steps may work:
>     1) Set osd_crush_update_on_start to false, so that when an osd
> starts, it won't modify the crushmap and trigger data migration.

I think the noin flag might do that trick, too.

>     2) Set the noout flag to prevent osds from being kicked out of the
> cluster and triggering data migration.

Probably too late at this point...

>     3) Mark all osds on the broken node down (actually, since the
> network controller is broken, these osds are already down).

And not out?

Regards,

Christian

>     4) Prepare the osds on the new node, keeping osd_num the same as on
> the broken node:
>         ceph-osd -i [osd_num] --osd-data=path1 --mkfs
>     5) Start the osds on the new node; peering and backfilling will
> start automatically.
>     6) Wait until 5) completes, then repeat 4) and 5) until all osds on
> the broken node have been moved to the new node.
>     I have done some tests on my test cluster, and it seems to work,
> but I'm not quite sure it's right in theory, so any comments will be
> appreciated.
>     Thanks.

-- 
Christian Balzer        Network/Systems Engineer
chibi at gol.com        Global OnLine Japan/Fusion Communications
http://www.gol.com/
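For reference, the procedure in steps 1)-6) above could be sketched as the
following command sequence. This is a hedged sketch, not a tested recipe:
the osd id (osd.3), the data path, and the ceph.conf snippet are
illustrative placeholders I have added, not values from the thread, and on
a cephx-enabled cluster the new osd would also need its keyring set up.

```shell
# Sketch of the node-replacement procedure described in the thread.
# osd.3 and /var/lib/ceph/osd/ceph-3 are placeholder examples.

# Step 1) In ceph.conf on the new node, before starting any osd,
# keep osds from rewriting the crushmap on startup:
#   [osd]
#   osd crush update on start = false

# Step 2) Prevent osds from being marked out while you work:
ceph osd set noout

# Step 3) Mark the broken node's osds down (they are likely already
# down, since its network controller is dead):
ceph osd down osd.3

# Step 4) Prepare the replacement osd on the new node, reusing the
# same osd id, as in the original post:
ceph-osd -i 3 --osd-data=/var/lib/ceph/osd/ceph-3 --mkfs

# Step 5) Start the osd; peering and backfilling begin automatically:
ceph-osd -i 3 --osd-data=/var/lib/ceph/osd/ceph-3

# Step 6) Repeat steps 4)-5) for each osd on the broken node, then
# clear the flag once the cluster is healthy again:
ceph osd unset noout
```

These commands require a live cluster with admin credentials, so they are
shown for orientation only; as Christian notes, the noin flag (set via
`ceph osd set noin`) may serve the same purpose as step 1).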