On Fri, 20 Nov 2015, Wei-Chung Cheng wrote:
> Hi Loic and cephers,
>
> Sure, I have time to help (comment) on this disk-replacement feature.
> It is a useful feature for handling disk failure :p
>
> A simple procedure is described at http://tracker.ceph.com/issues/13732 :
> 1. set the noout flag - if the broken osd is a primary osd, can we
>    handle that well?
> 2. stop the osd daemon and wait for the osd to actually go down (or
>    maybe use the deactivate option of ceph-disk)
>
> These two steps seem OK.
> About handling the crush map: should we remove the broken osd?
> If we do that, why set the noout flag at all? Removing the osd from
> the crush map still triggers re-balancing.

Right--I think you generally want to do one or the other:

1) mark the osd out and leave the failed disk in place, or replace it
   with a new disk that re-uses the same osd id; or,

2) remove the osd from the crush map and replace it with a new disk
   (which gets a new osd id).

Re-using the osd id is awkward at the moment, so doing 1) and
replacing the disk ends up moving the data twice.
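For concreteness, the two paths look something like this. Just a
sketch: osd.12 and /dev/sdX are made-up placeholders, the
service-stop command depends on the init system, and exact ceph-disk
behavior varies a bit by release.

  # common prep: keep the cluster from re-balancing while we work
  ceph osd set noout
  systemctl stop ceph-osd@12     # or 'stop ceph-osd id=12' on upstart

  # path 1: mark the osd out, leave the failed disk in place
  ceph osd out 12                # data re-replicates away from osd.12

  # path 2: remove the osd entirely; the replacement gets a new id
  ceph osd crush remove osd.12
  ceph auth del osd.12
  ceph osd rm 12
  ceph-disk prepare /dev/sdX     # a fresh osd id is allocated
  ceph-disk activate /dev/sdX1   # often automatic via udev

  # either way, clear the flag when done
  ceph osd unset noout

sage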