On Mon, Jul 7, 2014 at 7:03 AM, Erik Logtenberg <erik at logtenberg.eu> wrote:
> Hi,
>
> If you add an OSD to an existing cluster, ceph will move some existing
> data around so the new OSD gets its respective share of usage right away.
>
> Now I noticed that during this moving around, ceph reports the relevant
> PGs as degraded. I can more or less understand the logic here: if a
> piece of data is supposed to be in a certain place (the new OSD), but it
> is not yet there, it's degraded.
>
> However, I would hope that the movement of data is executed in such a way
> that a new copy is first made on the new OSD, and only after that succeeds
> is one of the existing copies removed. If so, there is never actually any
> "degradation" of that PG.
>
> More to the point: suppose I have a PG replicated over three OSDs: 1, 2 and
> 3. Now I add an OSD 4, and ceph decides to move the copy on OSD 3 to the
> new OSD 4. If it turns out that ceph can't read the copies on OSDs 1 and
> 2 due to some disk error, I would assume that ceph would still use the
> copy that exists on OSD 3 to populate the copy on OSD 4. Is that indeed
> the case?

Yeah, Ceph will never voluntarily reduce the redundancy of a PG. I believe
splitting the "degraded" state into separate "wrongly placed" and "degraded"
(reduced redundancy) states is currently on the menu for the Giant release,
but it hasn't been done yet.

> I have a very similar question about removing an OSD. You can tell ceph
> to mark an OSD as "out" before physically removing it. The OSD is still
> "up", but ceph will no longer assign PGs to it, and it will make new
> copies of the PGs that are on this OSD on other OSDs.
> Now again, ceph reports degradation, even though the "out" OSD is
> still "up", so the existing copies are not actually lost. Does ceph use
> the OSD that is marked "out" as a source for making the new copies on
> other OSDs?

Yep!
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
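
P.S. For anyone wanting the concrete steps for that second case, this is
roughly the sequence I'd use. osd.3 is just an example id, and the exact
commands can vary between releases, so double-check against the docs for
your version:

    # Mark the OSD out; its daemon stays up and can still act as a
    # source while its PGs are backfilled onto other OSDs.
    ceph osd out 3

    # Watch progress and wait until all PGs are active+clean again.
    ceph -s
    ceph health detail

    # Then stop the osd.3 daemon (init-system dependent) and remove it
    # from the CRUSH map, the auth database, and the OSD map.
    ceph osd crush remove osd.3
    ceph auth del osd.3
    ceph osd rm 3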