Degraded data redundancy: 128 pgs undersized

Good morning folks,

As a newbie to Ceph, yesterday was the first time I configured my CRUSH map, added a CRUSH rule, and created my first pool using that rule.

Since then the cluster has been in HEALTH_WARN, with the following output:

~~~
$ sudo ceph status
  cluster:
    id:     47c108bd-db66-4197-96df-cadde9e9eb45
    health: HEALTH_WARN
            Degraded data redundancy: 128 pgs undersized
            1 pools have pg_num > pgp_num

  services:
    mon: 3 daemons, quorum ccp-tcnm01,ccp-tcnm02,ccp-tcnm03
    mgr: ccp-tcnm01(active), standbys: ccp-tcnm03, ccp-tcnm02
    osd: 3 osds: 3 up, 3 in

  data:
    pools:   1 pools, 128 pgs
    objects: 0 objects, 0 bytes
    usage:   3088 MB used, 3068 GB / 3071 GB avail
    pgs:     128 active+undersized
~~~

The pool was created by running `sudo ceph osd pool create joergsfirstpool 128 replicated replicate_datacenter`.
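
For reference, the pool's current parameters can be queried with the standard `ceph osd pool get` keys. This is only a sketch; `crush_rule` is the key name on Luminous, older releases use `crush_ruleset`:

~~~
# Confirm the placement group counts, the replica count and the
# CRUSH rule the pool is actually using.
$ sudo ceph osd pool get joergsfirstpool pg_num
$ sudo ceph osd pool get joergsfirstpool pgp_num
$ sudo ceph osd pool get joergsfirstpool size
$ sudo ceph osd pool get joergsfirstpool crush_rule
~~~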

I figured out that I had forgotten to set pgp_num accordingly, so I did that by running `sudo ceph osd pool set joergsfirstpool pgp_num 128`. As you can see in the following output, 15 PGs were remapped, but 113 still remain active+undersized.

~~~
$ sudo ceph status
  cluster:
    id:     47c108bd-db66-4197-96df-cadde9e9eb45
    health: HEALTH_WARN
            Degraded data redundancy: 113 pgs undersized

  services:
    mon: 3 daemons, quorum ccp-tcnm01,ccp-tcnm02,ccp-tcnm03
    mgr: ccp-tcnm01(active), standbys: ccp-tcnm03, ccp-tcnm02
    osd: 3 osds: 3 up, 3 in; 15 remapped pgs

  data:
    pools:   1 pools, 128 pgs
    objects: 0 objects, 0 bytes
    usage:   3089 MB used, 3068 GB / 3071 GB avail
    pgs:     113 active+undersized
             15  active+clean+remapped
~~~
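
In case more detail on the affected PGs is useful, the standard health and PG queries show which PGs are undersized and which OSDs they are mapped to. This is only a sketch; the PG id in the last command is just a placeholder:

~~~
# List the health warning together with the affected PGs.
$ sudo ceph health detail

# Show PGs stuck in the undersized state, including their acting sets.
$ sudo ceph pg dump_stuck undersized

# Query a single PG (placeholder PG id) for its up and acting OSD sets.
$ sudo ceph pg 1.0 query
~~~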

My questions are:

1. What does active+undersized actually mean? I did not find anything about it in the documentation on docs.ceph.com.

2. Why were only 15 PGs remapped after I corrected the wrong pgp_num value?

3. What's wrong here, and what do I have to do to get the cluster back to active+clean again?

For further information, you can find my current CRUSH map below:

~~~
# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
tunable chooseleaf_vary_r 1
tunable chooseleaf_stable 1
tunable straw_calc_version 1
tunable allowed_bucket_algs 54

# devices
device 0 osd.0 class hdd
device 1 osd.1 class hdd
device 2 osd.2 class hdd

# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 region
type 10 root

# buckets
host ccp-tcnm01 {
	id -5		# do not change unnecessarily
	id -6 class hdd		# do not change unnecessarily
	# weight 1.000
	alg straw2
	hash 0	# rjenkins1
	item osd.1 weight 1.000
}
host ccp-tcnm03 {
	id -7		# do not change unnecessarily
	id -8 class hdd		# do not change unnecessarily
	# weight 1.000
	alg straw2
	hash 0	# rjenkins1
	item osd.2 weight 1.000
}
datacenter dc1 {
	id -9		# do not change unnecessarily
	id -12 class hdd		# do not change unnecessarily
	# weight 2.000
	alg straw2
	hash 0	# rjenkins1
	item ccp-tcnm01 weight 1.000
	item ccp-tcnm03 weight 1.000
}
host ccp-tcnm02 {
	id -3		# do not change unnecessarily
	id -4 class hdd		# do not change unnecessarily
	# weight 1.000
	alg straw2
	hash 0	# rjenkins1
	item osd.0 weight 1.000
}
datacenter dc3 {
	id -10		# do not change unnecessarily
	id -11 class hdd		# do not change unnecessarily
	# weight 1.000
	alg straw2
	hash 0	# rjenkins1
	item ccp-tcnm02 weight 1.000
}
root default {
	id -1		# do not change unnecessarily
	id -2 class hdd		# do not change unnecessarily
	# weight 3.000
	alg straw2
	hash 0	# rjenkins1
	item dc1 weight 2.000
	item dc3 weight 1.000
}

# rules
rule replicated_rule {
	id 0
	type replicated
	min_size 1
	max_size 10
	step take default
	step chooseleaf firstn 0 type host
	step emit
}
rule replicate_datacenter {
	id 1
	type replicated
	min_size 1
	max_size 10
	step take default
	step chooseleaf firstn 0 type datacenter
	step emit
}

# end crush map
~~~
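
In case it helps to reproduce the placement, the rule can also be exercised offline against the compiled map with crushtool. This is only a sketch: rule id 1 is replicate_datacenter above, and --num-rep 3 assumes the pool uses the default replicated size of 3.

~~~
# Fetch the compiled CRUSH map from the cluster and decompile it.
$ sudo ceph osd getcrushmap -o crushmap.bin
$ crushtool -d crushmap.bin -o crushmap.txt

# Simulate mappings for rule 1 (replicate_datacenter) with 3 replicas,
# and list the inputs that cannot be mapped to 3 distinct OSDs.
$ crushtool -i crushmap.bin --test --rule 1 --num-rep 3 --show-mappings
$ crushtool -i crushmap.bin --test --rule 1 --num-rep 3 --show-bad-mappings
~~~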

Best regards,
Joerg
