Re: HEALTH_WARN 29 pgs degraded; 29 pgs stuck degraded; 133 pgs stuck unclean; 29 pgs stuck undersized;

Hello,

On Sun, 28 Dec 2014 01:52:39 +1100 Jiri Kanicky wrote:

> Hi,
> 
> I just build my CEPH cluster but having problems with the health of the 
> cluster.
> 
You're not telling us the version, but it's clearly 0.87 or beyond.

> Here are few details:
> - I followed the ceph documentation.
Outdated, unfortunately.

> - I used btrfs filesystem for all OSDs
Big mistake number 1; do some research (google, the ML archives).
Though not related to your current problems.

> - I did not set "osd pool default size = 2 " as I thought that if I have 
> 2 nodes + 4 OSDs, I can leave default=3. I am not sure if this was right.
Big mistake/assumption number 2: the default CRUSH rule places replicas
across hosts, so with a replication size of 3 and only 2 hosts the third
replica can never be placed. That's your main issue here.
Either set the pool size to 2 or use 3 hosts.
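
For the pool you already have, something along these lines should do it
(adjust the pool name if yours differs):

    sudo ceph osd pool set rbd size 2
    sudo ceph osd pool set rbd min_size 1

And to make 2 the default for pools created later on, in the [global]
section of your ceph.conf:

    osd pool default size = 2
    osd pool default min_size = 1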

> - I noticed that default pools "data,metadata" were not created. Only 
> "rbd" pool was created.
See the outdated docs above. The majority of use cases are RBD only, so
since Giant the CephFS pools are no longer created by default.
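
Should you want CephFS later you can create its pools (and the filesystem)
by hand, roughly like this (pool names and pg counts below are just
placeholders):

    sudo ceph osd pool create cephfs_data 128
    sudo ceph osd pool create cephfs_metadata 64
    sudo ceph fs new cephfs cephfs_metadata cephfs_data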

> - As it was complaining that the pg_num is too low, I increased the 
> pg_num for pool rbd to 133 (400/3) and end up with "pool rbd pg_num 133 
>  > pgp_num 64".
> 
Re-read the (in this case correct) documentation.
It clearly states to round up to the nearest power of 2, in your case 256.
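
With a size of 2 and 4 OSDs the usual ~100 PGs per OSD guideline gives
(4 * 100) / 2 = 200, rounded up to 256. Keep in mind that pg_num can only
ever be increased, and pgp_num needs to follow it, roughly:

    sudo ceph osd pool set rbd pg_num 256
    sudo ceph osd pool set rbd pgp_num 256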

Regards.

Christian

> Would you give me hint where I have made the mistake? (I can remove the 
> OSDs and start over if needed.)
> 
> 
> cephadmin@ceph1:/etc/ceph$ sudo ceph health
> HEALTH_WARN 29 pgs degraded; 29 pgs stuck degraded; 133 pgs stuck 
> unclean; 29 pgs stuck undersized; 29 pgs undersized; pool rbd pg_num 133 
>  > pgp_num 64
> cephadmin@ceph1:/etc/ceph$ sudo ceph status
>      cluster bce2ff4d-e03b-4b75-9b17-8a48ee4d7788
>       health HEALTH_WARN 29 pgs degraded; 29 pgs stuck degraded; 133 pgs 
> stuck unclean; 29 pgs stuck undersized; 29 pgs undersized; pool rbd 
> pg_num 133 > pgp_num 64
>       monmap e1: 2 mons at 
> {ceph1=192.168.30.21:6789/0,ceph2=192.168.30.22:6789/0}, election epoch 
> 8, quorum 0,1 ceph1,ceph2
>       osdmap e42: 4 osds: 4 up, 4 in
>        pgmap v77: 133 pgs, 1 pools, 0 bytes data, 0 objects
>              11704 kB used, 11154 GB / 11158 GB avail
>                    29 active+undersized+degraded
>                   104 active+remapped
> 
> 
> cephadmin@ceph1:/etc/ceph$ sudo ceph osd tree
> # id    weight  type name       up/down reweight
> -1      10.88   root default
> -2      5.44            host ceph1
> 0       2.72                    osd.0   up      1
> 1       2.72                    osd.1   up      1
> -3      5.44            host ceph2
> 2       2.72                    osd.2   up      1
> 3       2.72                    osd.3   up      1
> 
> 
> cephadmin@ceph1:/etc/ceph$ sudo ceph osd lspools
> 0 rbd,
> 
> cephadmin@ceph1:/etc/ceph$ cat ceph.conf
> [global]
> fsid = bce2ff4d-e03b-4b75-9b17-8a48ee4d7788
> public_network = 192.168.30.0/24
> cluster_network = 10.1.1.0/24
> mon_initial_members = ceph1, ceph2
> mon_host = 192.168.30.21,192.168.30.22
> auth_cluster_required = cephx
> auth_service_required = cephx
> auth_client_required = cephx
> filestore_xattr_use_omap = true
> 
> Thank you
> Jiri


-- 
Christian Balzer        Network/Systems Engineer                
chibi@xxxxxxx   	Global OnLine Japan/Fusion Communications
http://www.gol.com/
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


