New EC pool undersized

Kyle Hutson <kylehutson@xxxxxxx> · Wed, 4 Mar 2015 14:06:27 -0600

Last night I blew away my previous ceph configuration (this environment is pre-production) and now have 0.87.1 installed. I've manually edited the crushmap so it now looks like https://dpaste.de/OLEa
I currently have 144 OSDs on 8 nodes.

After increasing pg_num and pgp_num to a more suitable 1024 (due to the high number of OSDs), everything looked happy.
So, now I'm trying to play with an erasure-coded pool.
I did:
ceph osd erasure-code-profile set ec44profile k=4 m=4 ruleset-failure-domain=rack
ceph osd pool create ec44pool 8192 8192 erasure ec44profile
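
For reference, the arithmetic implied by this profile can be sketched as below. The numbers are taken from the commands above; the function name shard_layout is only illustrative. With k=4 and m=4, every PG needs k+m = 8 shards, and with ruleset-failure-domain=rack each shard has to land in a different rack, so CRUSH must find 8 distinct racks per PG. Whether the edited crushmap offers that many racks is not visible in this post.

```python
# Sketch only: shard arithmetic for the ec44profile pool described above.
# shard_layout is a hypothetical helper, not part of any Ceph tooling.

def shard_layout(k, m, pg_num, num_osds):
    """Return shard counts implied by an erasure-code profile."""
    shards_per_pg = k + m                    # data chunks + coding chunks
    total_shards = pg_num * shards_per_pg    # shards CRUSH has to place
    return {
        "shards_per_pg": shards_per_pg,
        "total_shards": total_shards,
        "avg_shards_per_osd": total_shards / num_osds,
    }

# k=4, m=4, 8192 PGs, 144 OSDs: all taken from the post.
layout = shard_layout(k=4, m=4, pg_num=8192, num_osds=144)
print(layout)
```

Each PG in this pool therefore needs 8 placements, which is why the up and acting sets further down have 8 entries each.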

After letting it settle for a bit, 'ceph status' gives:
    cluster 196e5eb8-d6a7-4435-907e-ea028e946923
     health HEALTH_WARN 7 pgs degraded; 7 pgs stuck degraded; 7 pgs stuck unclean; 7 pgs stuck undersized; 7 pgs undersized
     monmap e1: 4 mons at {hobbit01=10.5.38.1:6789/0,hobbit02=10.5.38.2:6789/0,hobbit13=10.5.38.13:6789/0,hobbit14=10.5.38.14:6789/0}, election epoch 6, quorum 0,1,2,3 hobbit01,hobbit02,hobbit13,hobbit14
     osdmap e409: 144 osds: 144 up, 144 in
      pgmap v6763: 12288 pgs, 2 pools, 0 bytes data, 0 objects
            90598 MB used, 640 TB / 640 TB avail
                   7 active+undersized+degraded
               12281 active+clean

So, to troubleshoot the undersized pgs, I issued 'ceph pg dump_stuck':
ok
pg_stat	objects	mip	degr	misp	unf	bytes	log	disklog	state	state_stamp	v	reported	up	up_primary	acting	acting_primary	last_scrub	scrub_stamp	last_deep_scrub	deep_scrub_stamp
1.d77	0	0	0	0	0	0	0	0	active+undersized+degraded	2015-03-04 11:33:57.502849	0'0	408:12	[15,95,58,73,52,31,116,2147483647]	15	[15,95,58,73,52,31,116,2147483647]	15	0'0	2015-03-04 11:33:42.100752	0'0	2015-03-04 11:33:42.100752
1.10fa	0	0	0	0	0	0	0	0	active+undersized+degraded	2015-03-04 11:34:29.362554	0'0	408:12	[23,12,99,114,132,53,56,2147483647]	23	[23,12,99,114,132,53,56,2147483647]	23	0'0	2015-03-04 11:33:42.168571	0'0	2015-03-04 11:33:42.168571
1.1271	0	0	0	0	0	0	0	0	active+undersized+degraded	2015-03-04 11:33:48.795742	0'0	408:12	[135,112,69,4,22,95,2147483647,83]	135	[135,112,69,4,22,95,2147483647,83]	135	0'0	2015-03-04 11:33:42.139555	0'0	2015-03-04 11:33:42.139555
1.2b5	0	0	0	0	0	0	0	0	active+undersized+degraded	2015-03-04 11:34:32.189738	0'0	408:12	[11,115,139,19,76,52,94,2147483647]	11	[11,115,139,19,76,52,94,2147483647]	11	0'0	2015-03-04 11:33:42.079673	0'0	2015-03-04 11:33:42.079673
1.7ae	0	0	0	0	0	0	0	0	active+undersized+degraded	2015-03-04 11:34:26.848344	0'0	408:12	[27,5,132,119,94,56,52,2147483647]	27	[27,5,132,119,94,56,52,2147483647]	27	0'0	2015-03-04 11:33:42.109832	0'0	2015-03-04 11:33:42.109832
1.1a97	0	0	0	0	0	0	0	0	active+undersized+degraded	2015-03-04 11:34:25.457454	0'0	408:12	[20,53,14,54,102,118,2147483647,72]	20	[20,53,14,54,102,118,2147483647,72]	20	0'0	2015-03-04 11:33:42.833850	0'0	2015-03-04 11:33:42.833850
1.10a6	0	0	0	0	0	0	0	0	active+undersized+degraded	2015-03-04 11:34:30.059936	0'0	408:12	[136,22,4,2147483647,72,52,101,55]	136	[136,22,4,2147483647,72,52,101,55]	136	0'0	2015-03-04 11:33:42.125871	0'0	2015-03-04 11:33:42.125871

Every one of these PGs has the number 2147483647 in its up and acting sets, which is way out of line with what I would expect for an OSD ID.
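
A note on that number: 2147483647 is 2^31 - 1, i.e. 0x7fffffff. To the best of my knowledge this is the value Ceph's CRUSH code defines as CRUSH_ITEM_NONE, a placeholder for a shard slot where no OSD could be chosen. Treat that reading as an assumption rather than a diagnosis. Under it, the sketch below lists which slot of each stuck PG is unfilled, using the up sets copied from the dump_stuck output above. The helper name missing_shards is hypothetical.

```python
# Sketch only: find the unfilled shard slots in the stuck PGs listed above.
# Assumption: 0x7fffffff is CRUSH's "no OSD chosen" placeholder (CRUSH_ITEM_NONE).
CRUSH_ITEM_NONE = 0x7FFFFFFF  # == 2147483647, the value seen in the output

def missing_shards(osd_set):
    """Return the 0-based slot indexes that hold the placeholder value."""
    return [i for i, osd in enumerate(osd_set) if osd == CRUSH_ITEM_NONE]

# Up sets copied verbatim from the 'ceph pg dump_stuck' output in this post.
stuck = {
    "1.d77":  [15, 95, 58, 73, 52, 31, 116, 2147483647],
    "1.10fa": [23, 12, 99, 114, 132, 53, 56, 2147483647],
    "1.1271": [135, 112, 69, 4, 22, 95, 2147483647, 83],
    "1.2b5":  [11, 115, 139, 19, 76, 52, 94, 2147483647],
    "1.7ae":  [27, 5, 132, 119, 94, 56, 52, 2147483647],
    "1.1a97": [20, 53, 14, 54, 102, 118, 2147483647, 72],
    "1.10a6": [136, 22, 4, 2147483647, 72, 52, 101, 55],
}

for pgid, up in stuck.items():
    holes = missing_shards(up)
    print(f"{pgid}: {len(up) - len(holes)} of {len(up)} shards mapped, "
          f"empty slots {holes}")
```

Every stuck PG has exactly one empty slot out of 8, which is consistent with the "undersized" state reported by 'ceph status'.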

Thoughts?

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com