Re: PG status is "active+undersized+degraded"

Hi,


On 06/22/2018 08:06 AM, Dave.Chen@xxxxxxxx wrote:
I saw this statement at this link ( http://docs.ceph.com/docs/master/rados/operations/crush-map/ ); is that the reason that leads to the warning?

" This, combined with the default CRUSH failure domain, ensures that replicas or erasure code shards are separated across hosts and a single host failure will not affect availability."

Best Regards,
Dave Chen

-----Original Message-----
From: Chen2, Dave
Sent: Friday, June 22, 2018 1:59 PM
To: 'Burkhard Linke'; ceph-users@xxxxxxxxxxxxxx
Cc: Chen2, Dave
Subject: RE:  PG status is "active+undersized+degraded"

Hi Burkhard,

Thanks for your explanation. I created a new 2 TB OSD on another node, and it indeed solved the issue; the status of the Ceph cluster is "health HEALTH_OK" now.

Another question: if three homogeneous OSDs are spread across only 2 nodes, I still get the warning message and the status is "active+undersized+degraded". Is spreading the three OSDs across 3 nodes a mandatory rule for Ceph? Is that only for HA considerations? Is there any official Ceph documentation with guidance on this?

The default Ceph CRUSH rules try to distribute PG replicas across hosts. With the default replication factor of 3 (pool size = 3), this requires at least three hosts. The pool also defines a minimum number of PG replicas that must be available to allow I/O to a PG; this is usually set to 2 (pool min_size = 2). The above status thus means that there are enough copies for the min_size (-> active), but not enough for the size (-> undersized + degraded).
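A quick way to check these values and the affected PGs on a live cluster (just a sketch; "rbd" stands in for your pool name, and the outputs shown are the defaults):

  $ ceph osd pool get rbd size
  size: 3
  $ ceph osd pool get rbd min_size
  min_size: 2
  $ ceph health detail | grep undersized
  $ ceph osd tree    # shows how many hosts CRUSH can actually choose from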

Using fewer than three hosts requires changing the pool size to 2. This is strongly discouraged, since sane automatic recovery of data in the case of a netsplit or other temporary node failure is not possible. Do not do this in a production setup.
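For completeness, this is the (discouraged) change being described, with the pool name as a placeholder:

  $ ceph osd pool set <pool-name> size 2    # NOT recommended for production
  $ ceph osd pool get <pool-name> size      # verify the change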

For a production setup you should also plan for node failures. The default setup uses 3 replicas, so to tolerate a node failure you need 4 hosts; otherwise Ceph's self-healing cannot re-create the third replica. You also need to closely monitor your cluster's free space to avoid a full cluster caused by re-replicated PGs after a node failure.
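For the free-space monitoring, the usual commands are along these lines (output columns vary slightly between releases):

  $ ceph df        # cluster-wide and per-pool usage
  $ ceph osd df    # per-OSD utilisation and variance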

Regards,
Burkhard
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


