I saw this statement at the following link ( http://docs.ceph.com/docs/master/rados/operations/crush-map/ ) -- is that the reason for the warning?

"This, combined with the default CRUSH failure domain, ensures that replicas or erasure code shards are separated across hosts and a single host failure will not affect availability."

Best Regards,
Dave Chen

-----Original Message-----
From: Chen2, Dave
Sent: Friday, June 22, 2018 1:59 PM
To: 'Burkhard Linke'; ceph-users@xxxxxxxxxxxxxx
Cc: Chen2, Dave
Subject: RE: PG status is "active+undersized+degraded"

Hi Burkhard,

Thanks for your explanation. I created a new 2 TB OSD on another node, and that indeed solved the issue; the cluster status is now "health HEALTH_OK".

Another question: if three homogeneous OSDs are spread across only 2 nodes, I still get the warning message and the status is "active+undersized+degraded". Is spreading the three OSDs across 3 nodes a mandatory rule for Ceph? Is it only an HA consideration? Is there any official Ceph documentation with guidance on this?

$ ceph osd tree
ID WEIGHT  TYPE NAME      UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 7.25439 root default
-2 1.81360     host ceph3
 2 1.81360         osd.2       up  1.00000          1.00000
-4 3.62720     host ceph1
 0 1.81360         osd.0       up  1.00000          1.00000
 1 1.81360         osd.1       up  1.00000          1.00000

Best Regards,
Dave Chen

-----Original Message-----
From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of Burkhard Linke
Sent: Thursday, June 21, 2018 2:39 PM
To: ceph-users@xxxxxxxxxxxxxx
Subject: Re: PG status is "active+undersized+degraded"

Hi,

On 06/21/2018 05:14 AM, Dave.Chen@xxxxxxxx wrote:
> Hi all,
>
> I have set up a Ceph cluster in my lab recently. The configuration should be okay per my understanding (4 OSDs across 3 nodes, 3 replicas), but a couple of PGs are stuck in the state "active+undersized+degraded". I think this should be a fairly common issue; could anyone help me out?
>
> Here are the details of the cluster:
>
> $ ceph -v (jewel)
> ceph version 10.2.10 (5dc1e4c05cb68dbf62ae6fce3f0700e4654fdbbe)
>
> # ceph osd tree
> ID WEIGHT  TYPE NAME      UP/DOWN REWEIGHT PRIMARY-AFFINITY
> -1 5.89049 root default
> -2 1.81360     host ceph3
>  2 1.81360         osd.2       up  1.00000          1.00000
> -3 0.44969     host ceph4
>  3 0.44969         osd.3       up  1.00000          1.00000
> -4 3.62720     host ceph1
>  0 1.81360         osd.0       up  1.00000          1.00000
>  1 1.81360         osd.1       up  1.00000          1.00000

*snipsnap*

You have a large difference in the capacities of the nodes. This results in different host weights, which in turn can cause problems for the CRUSH algorithm: for some of the PGs it is not able to find three different hosts for OSD placement. Ceph and CRUSH do not cope well with heterogeneous setups.

I would suggest moving one of the OSDs from host ceph1 to ceph4 to equalize the host weights.

Regards,
Burkhard
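
To see exactly which PGs are affected and why, the stuck PGs and their acting sets can be inspected directly. A minimal sketch using standard Jewel-era commands; the pool name "rbd" (the default pool on a Jewel install) and the PG id "0.4" are placeholders to substitute with values from the actual output:

$ ceph health detail               # lists the undersized/degraded PGs
$ ceph pg dump_stuck undersized    # stuck PGs with their up/acting OSD sets
$ ceph pg 0.4 query                # for one such PG: "up" and "acting" hold only two OSDs
$ ceph osd pool get rbd size       # confirms the pool wants three replicas
$ ceph osd crush rule dump         # shows the rule step "chooseleaf ... type host"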
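
The documentation passage quoted at the top of the thread is indeed the reason for the warning: the default replicated CRUSH rule picks each replica from a distinct host, so a pool with size 3 can never be fully replicated on a two-host cluster, and its PGs stay "active+undersized+degraded". Three hosts are not mandatory in general, only under the default rule with size 3. For a lab, two workarounds are commonly used; this is a hedged sketch in which the pool name "rbd" and the ruleset id 1 are assumptions (check the real id with "ceph osd crush rule dump" after creating the rule):

$ ceph osd pool set rbd size 2     # option 1: accept two replicas on two hosts

# option 2: relax the failure domain to "osd" so replicas may share a host;
# this gives up host-level redundancy and is only sensible for testing
$ ceph osd crush rule create-simple replicated_osd default osd
$ ceph osd pool set rbd crush_ruleset 1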
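
Burkhard's suggestion amounts to physically relocating one disk and recording its new CRUSH location. A minimal sketch, assuming osd.1 is the device moved from ceph1 to ceph4 and keeps its weight of 1.81360; note that with the default "osd crush update on start = true", the OSD would also re-register its location by itself when it restarts on the new host:

$ ceph osd crush set osd.1 1.81360 root=default host=ceph4
$ ceph osd tree   # host weights become ceph1 1.81360, ceph3 1.81360, ceph4 2.26329
$ ceph -w         # watch recovery until the PGs return to active+clean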