On Mon, Jul 7, 2014 at 7:03 AM, Erik Logtenberg <erik at logtenberg.eu> wrote:
> Hi,
>
> If you add an OSD to an existing cluster, ceph will move some existing
> data around so the new OSD gets its respective share of usage right away.
>
> Now I noticed that during this moving around, ceph reports the relevant
> PGs as degraded. I can more or less understand the logic here: if a
> piece of data is supposed to be in a certain place (the new OSD), but it
> is not yet there, it's degraded.
>
> However, I would hope that the movement of data is executed in such a way
> that a new copy is first made on the new OSD, and only after that succeeds
> is one of the existing copies removed. If so, there is never actually any
> "degradation" of that PG.
>
> More to the point: suppose I have a PG replicated over three OSDs: 1, 2 and
> 3. Now I add an OSD 4, and ceph decides to move the copy on OSD 3 to the
> new OSD 4. If it turns out that ceph can't read the copies on OSDs 1 and
> 2 due to some disk error, I would assume that ceph would still use the
> copy that exists on OSD 3 to populate the copy on OSD 4. Is that indeed
> the case?

Yeah, Ceph will never voluntarily reduce the redundancy of a PG. I believe
splitting the "degraded" state into separate "wrongly placed" and "degraded"
(reduced redundancy) states is currently on the menu for the Giant release,
but it hasn't been done yet.

> I have a very similar question about removing an OSD. You can tell ceph
> to mark an OSD as "out" before physically removing it. The OSD is still
> "up", but ceph will no longer assign PGs to it, and it will make new
> copies of the PGs that are on this OSD on other OSDs.
> Now again, ceph reports degradation, even though the "out" OSD is
> still "up", so the existing copies are not actually lost. Does ceph use
> the OSD that is marked "out" as a source for making the new copies on
> other OSDs?

Yep!
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
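
P.S. For anyone wanting the concrete steps for that second case, this is
roughly the sequence I'd use. osd.3 is just an example id, and the exact
commands can vary between releases, so double-check against the docs for
your version:

    # Mark the OSD out; its daemon stays up and can still act as a
    # source while its PGs are backfilled onto other OSDs.
    ceph osd out 3

    # Watch progress and wait until all PGs are active+clean again.
    ceph -s
    ceph health detail

    # Then stop the osd.3 daemon (init-system dependent) and remove it
    # from the CRUSH map, the auth database, and the OSD map.
    ceph osd crush remove osd.3
    ceph auth del osd.3
    ceph osd rm 3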