On Tue, Jul 1, 2014 at 11:33 AM, Brian Lovett
<brian.lovett at prosperent.com> wrote:
> Brian Lovett <brian.lovett at ...> writes:
>
> I restarted all of the osd's and noticed that ceph shows 2 osd's up even if
> the servers are completely powered down: osdmap e95: 8 osds: 2 up, 8 in
>
> Why would that be?

The OSDs report each other down much more quickly (~30s) than the
monitor timeout (~15 minutes). They'd get marked down eventually.

On Tue, Jul 1, 2014 at 11:43 AM, Brian Lovett
<brian.lovett at prosperent.com> wrote:
> Gregory Farnum <greg at ...> writes:
>
>>
>> What's the output of "ceph osd map"?
>>
>> Your CRUSH map probably isn't trying to segregate properly, with 2
>> hosts and 4 OSDs each.
>> Software Engineer #42 @ http://inktank.com | http://ceph.com
>>
> Is this what you are looking for?
>
> ceph osd map rbd ceph
> osdmap e104 pool 'rbd' (2) object 'ceph' -> pg 2.3482c180 (2.0) -> up ([3,5],
> p3) acting ([3,5,0], p3)

Whoops, I mean "ceph osd tree", sorry! (That should output a textual
representation of how they're arranged in the CRUSH map.)

> We're bringing on a 3rd host tomorrow with 4 more osd's. Would this correct
> the issue?

There's a good chance, but you're seeing a lot more degraded PGs than
one normally does when it's just a mapping failure, so I'd like to see
a few more details. :)
-Greg
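
The two timeouts Greg refers to correspond to Ceph config options: peer
OSDs report a silent OSD to the monitor after osd_heartbeat_grace (20s by
default, so failures usually surface within ~30s), while the monitor only
marks an unreported OSD down on its own after mon_osd_report_timeout
(900s, i.e. the ~15 minutes mentioned above). A quick way to confirm the
values on a running cluster (a sketch only; it assumes daemons named
osd.0 / mon.a and admin sockets in the default location):

    # grace period before peers report an OSD as failed
    ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok config get osd_heartbeat_grace

    # how long the monitor waits before marking a silent OSD down by itself
    ceph --admin-daemon /var/run/ceph/ceph-mon.a.asok config get mon_osd_report_timeout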
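
On the CRUSH side, an up set of only two OSDs for what looks like a
size-3 pool (the acting set has three members) is the classic symptom of
a placement rule that puts one replica per host when only two hosts
exist. As Greg notes, that may not explain all of the degraded PGs here,
but it is easy to check. A sketch of how (the file names are just
placeholders):

    ceph osd tree                          # hosts/OSDs as CRUSH sees them
    ceph osd pool get rbd size             # replica count for the rbd pool
    ceph osd getcrushmap -o crushmap.bin   # export the compiled CRUSH map
    crushtool -d crushmap.bin -o crushmap.txt

In the decompiled map, a replicated rule that segregates copies across
hosts looks roughly like this (firefly-era syntax):

    rule replicated_ruleset {
            ruleset 0
            type replicated
            min_size 1
            max_size 10
            step take default
            step chooseleaf firstn 0 type host   # at most one copy per host
            step emit
    }

With only two hosts, such a rule can satisfy just two of the three
replicas, so those PGs stay degraded until a third host is added or the
pool size is reduced.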