Last night I blew away my previous Ceph configuration (this environment is pre-production) and now have 0.87.1 installed. I've manually edited the crushmap so it now looks like https://dpaste.de/OLEa
I currently have 144 OSDs on 8 nodes.
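(In case it matters, the crushmap edit was the usual decompile/edit/recompile cycle, roughly like this; filenames are arbitrary:)

ceph osd getcrushmap -o crushmap.bin
crushtool -d crushmap.bin -o crushmap.txt
# edit crushmap.txt by hand, then recompile and inject it
crushtool -c crushmap.txt -o crushmap.new
ceph osd setcrushmap -i crushmap.new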
After increasing pg_num and pgp_num to a more suitable 1024 (due to the high number of OSDs), everything looked happy.
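(The pg_num/pgp_num bump was done with commands along these lines; 'rbd' below is just a placeholder for the pool that was resized:)

# 'rbd' stands in for whichever pool was resized
ceph osd pool set rbd pg_num 1024
ceph osd pool set rbd pgp_num 1024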
So, now I'm trying to play with an erasure-coded pool.
I did:
ceph osd erasure-code-profile set ec44profile k=4 m=4 ruleset-failure-domain=rack
ceph osd pool create ec44pool 8192 8192 erasure ec44profile
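To sanity-check the result I dumped the profile and the crush rules (I'm assuming the pool create step generated a rule for ec44pool):

ceph osd erasure-code-profile get ec44profile
# look for the rule created for ec44pool in the output
ceph osd crush rule dump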
After settling for a bit, 'ceph status' gives:
    cluster 196e5eb8-d6a7-4435-907e-ea028e946923
     health HEALTH_WARN 7 pgs degraded; 7 pgs stuck degraded; 7 pgs stuck unclean; 7 pgs stuck undersized; 7 pgs undersized
     monmap e1: 4 mons at {hobbit01=10.5.38.1:6789/0,hobbit02=10.5.38.2:6789/0,hobbit13=10.5.38.13:6789/0,hobbit14=10.5.38.14:6789/0}, election epoch 6, quorum 0,1,2,3 hobbit01,hobbit02,hobbit13,hobbit14
     osdmap e409: 144 osds: 144 up, 144 in
      pgmap v6763: 12288 pgs, 2 pools, 0 bytes data, 0 objects
            90598 MB used, 640 TB / 640 TB avail
                   7 active+undersized+degraded
               12281 active+clean
So, to troubleshoot the undersized pgs, I issued 'ceph pg dump_stuck':
ok
pg_stat objects mip degr misp unf bytes log disklog state state_stamp v reported up up_primary acting acting_primary last_scrub scrub_stamp last_deep_scrub deep_scrub_stamp
1.d77 0 0 0 0 0 0 0 0 active+undersized+degraded 2015-03-04 11:33:57.502849 0'0 408:12 [15,95,58,73,52,31,116,2147483647] 15 [15,95,58,73,52,31,116,2147483647] 15 0'0 2015-03-04 11:33:42.100752 0'0 2015-03-04 11:33:42.100752
1.10fa 0 0 0 0 0 0 0 0 active+undersized+degraded 2015-03-04 11:34:29.362554 0'0 408:12 [23,12,99,114,132,53,56,2147483647] 23 [23,12,99,114,132,53,56,2147483647] 23 0'0 2015-03-04 11:33:42.168571 0'0 2015-03-04 11:33:42.168571
1.1271 0 0 0 0 0 0 0 0 active+undersized+degraded 2015-03-04 11:33:48.795742 0'0 408:12 [135,112,69,4,22,95,2147483647,83] 135 [135,112,69,4,22,95,2147483647,83] 135 0'0 2015-03-04 11:33:42.139555 0'0 2015-03-04 11:33:42.139555
1.2b5 0 0 0 0 0 0 0 0 active+undersized+degraded 2015-03-04 11:34:32.189738 0'0 408:12 [11,115,139,19,76,52,94,2147483647] 11 [11,115,139,19,76,52,94,2147483647] 11 0'0 2015-03-04 11:33:42.079673 0'0 2015-03-04 11:33:42.079673
1.7ae 0 0 0 0 0 0 0 0 active+undersized+degraded 2015-03-04 11:34:26.848344 0'0 408:12 [27,5,132,119,94,56,52,2147483647] 27 [27,5,132,119,94,56,52,2147483647] 27 0'0 2015-03-04 11:33:42.109832 0'0 2015-03-04 11:33:42.109832
1.1a97 0 0 0 0 0 0 0 0 active+undersized+degraded 2015-03-04 11:34:25.457454 0'0 408:12 [20,53,14,54,102,118,2147483647,72] 20 [20,53,14,54,102,118,2147483647,72] 20 0'0 2015-03-04 11:33:42.833850 0'0 2015-03-04 11:33:42.833850
1.10a6 0 0 0 0 0 0 0 0 active+undersized+degraded 2015-03-04 11:34:30.059936 0'0 408:12 [136,22,4,2147483647,72,52,101,55] 136 [136,22,4,2147483647,72,52,101,55] 136 0'0 2015-03-04 11:33:42.125871 0'0 2015-03-04 11:33:42.125871
Every one of these PGs has an entry (2147483647) in its up and acting sets that is way out of line from what I would expect.
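To dig further I was going to test the EC rule mapping offline with something like the following (the rule id is a guess; it's whatever 'ceph osd crush rule dump' reports for the ec44pool rule):

ceph osd getcrushmap -o crushmap.bin
# --rule 1 is an assumed rule id; substitute the ec44pool rule's id
crushtool -i crushmap.bin --test --rule 1 --num-rep 8 --show-bad-mappings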
Thoughts?