Hmmm, I just struggled through this myself.
How many racks do you have? If not more than 8, you might want to make your failure domain smaller?
I.e., maybe host? That, at least, would allow you to debug the situation… -don- From: Kyle
Hutson [mailto:kylehutson@xxxxxxx] It wouldn't let me simply change the pg_num, giving Error EEXIST: specified pg_num 2048 <= current 8192 But that's not a big deal, I just deleted the pool and recreated with 'ceph osd pool create ec44pool 2048 2048 erasure ec44profile' ...and the result is quite similar: 'ceph status' is now ceph status cluster 196e5eb8-d6a7-4435-907e-ea028e946923 health HEALTH_WARN 4 pgs degraded; 4 pgs stuck unclean; 4 pgs undersized monmap e1: 4 mons at {hobbit01=10.5.38.1:6789/0,hobbit02=10.5.38.2:6789/0,hobbit13=10.5.38.13:6789/0,hobbit14=10.5.38.14:6789/0},
election epoch 6, quorum 0,1,2,3 hobbit01,hobbit02,hobbit13,hobbit14 osdmap e412: 144 osds: 144 up, 144 in pgmap v6798: 6144 pgs, 2 pools, 0 bytes data, 0 objects 90590 MB used, 640 TB / 640 TB avail 4 active+undersized+degraded 6140 active+clean 'ceph pg dump_stuck results' in ok pg_stat
objects mip
degr misp
unf bytes
log disklog
state state_stamp
v reported
up up_primary
acting acting_primary
last_scrub scrub_stamp
last_deep_scrub deep_scrub_stamp 2.296
0 0
0 0
0 0
0 0
active+undersized+degraded 2015-03-04 14:33:26.672224
0'0 412:9
[5,55,91,2147483647,83,135,53,26] 5
[5,55,91,2147483647,83,135,53,26] 5
0'0 2015-03-04 14:33:15.649911
0'0 2015-03-04 14:33:15.649911 2.69c
0 0
0 0
0 0
0 0
active+undersized+degraded 2015-03-04 14:33:24.984802
0'0 412:9
[93,134,1,74,112,28,2147483647,60] 93
[93,134,1,74,112,28,2147483647,60] 93
0'0 2015-03-04 14:33:15.695747
0'0 2015-03-04 14:33:15.695747 2.36d
0 0
0 0
0 0
0 0
active+undersized+degraded 2015-03-04 14:33:21.937620
0'0 412:9
[12,108,136,104,52,18,63,2147483647]
12 [12,108,136,104,52,18,63,2147483647]
12 0'0
2015-03-04 14:33:15.652480 0'0
2015-03-04 14:33:15.652480 2.5f7
0 0
0 0
0 0
0 0
active+undersized+degraded 2015-03-04 14:33:26.169242
0'0 412:9
[94,128,73,22,4,60,2147483647,113] 94
[94,128,73,22,4,60,2147483647,113] 94
0'0 2015-03-04 14:33:15.687695
0'0 2015-03-04 14:33:15.687695 I do have questions for you, even at this point, though. 1) Where did you find the formula (14400/(k+m))? 2) I was really trying to size this for when it goes to production, at which point it may have as many as 384 OSDs. Doesn't that imply I should have even more
pgs? On Wed, Mar 4, 2015 at 2:15 PM, Don Doerner <Don.Doerner@xxxxxxxxxxx> wrote: Oh duh… OK, then given a 4+4 erasure coding scheme, 14400/8 is 1800, so try 2048. -don-
From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx]
On Behalf Of Don Doerner In this case, that number means that there is not an OSD that can be assigned. What’s your k, m
from you erasure coded pool? You’ll need approximately (14400/(k+m)) PGs, rounded up to the next power of 2… -don-
From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx]
On Behalf Of Kyle Hutson Last night I blew away my previous ceph configuration (this environment is pre-production) and have 0.87.1 installed. I've manually edited the crushmap so it down looks like https://dpaste.de/OLEa I currently have 144 OSDs on 8 nodes. After increasing pg_num and pgp_num to a more suitable 1024 (due to the high number of OSDs), everything looked happy. So, now I'm trying to play with an erasure-coded pool. I did: ceph osd erasure-code-profile set ec44profile k=4 m=4 ruleset-failure-domain=rack ceph osd pool create ec44pool 8192 8192 erasure ec44profile After settling for a bit 'ceph status' gives cluster 196e5eb8-d6a7-4435-907e-ea028e946923 health HEALTH_WARN 7 pgs degraded; 7 pgs stuck degraded; 7 pgs stuck unclean; 7 pgs stuck undersized; 7 pgs undersized monmap e1: 4 mons at {hobbit01=10.5.38.1:6789/0,hobbit02=10.5.38.2:6789/0,hobbit13=10.5.38.13:6789/0,hobbit14=10.5.38.14:6789/0},
election epoch 6, quorum 0,1,2,3 hobbit01,hobbit02,hobbit13,hobbit14 osdmap e409: 144 osds: 144 up, 144 in pgmap v6763: 12288 pgs, 2 pools, 0 bytes data, 0 objects 90598 MB used, 640 TB / 640 TB avail 7 active+undersized+degraded 12281 active+clean So to troubleshoot the undersized pgs, I issued 'ceph pg dump_stuck' ok pg_stat objects mip degr misp unf bytes log disklog state state_stamp v reported up up_primary acting
acting_primary last_scrub scrub_stamp last_deep_scrub deep_scrub_stamp 1.d77 0 0 0 0 0 0 0 0 active+undersized+degraded 2015-03-04 11:33:57.502849 0'0 408:12 [15,95,58,73,52,31,116,2147483647]
15 [15,95,58,73,52,31,116,2147483647] 15 0'0 2015-03-04 11:33:42.100752 0'0 2015-03-04 11:33:42.100752 1.10fa 0 0 0 0 0 0 0 0 active+undersized+degraded 2015-03-04 11:34:29.362554 0'0 408:12 [23,12,99,114,132,53,56,2147483647]
23 [23,12,99,114,132,53,56,2147483647] 23 0'0 2015-03-04 11:33:42.168571 0'0 2015-03-04 11:33:42.168571 1.1271 0 0 0 0 0 0 0 0 active+undersized+degraded 2015-03-04 11:33:48.795742 0'0 408:12 [135,112,69,4,22,95,2147483647,83]
135 [135,112,69,4,22,95,2147483647,83] 135 0'0 2015-03-04 11:33:42.139555 0'0 2015-03-04 11:33:42.139555 1.2b5 0 0 0 0 0 0 0 0 active+undersized+degraded 2015-03-04 11:34:32.189738 0'0 408:12 [11,115,139,19,76,52,94,2147483647]
11 [11,115,139,19,76,52,94,2147483647] 11 0'0 2015-03-04 11:33:42.079673 0'0 2015-03-04 11:33:42.079673 1.7ae 0 0 0 0 0 0 0 0 active+undersized+degraded 2015-03-04 11:34:26.848344 0'0 408:12 [27,5,132,119,94,56,52,2147483647]
27 [27,5,132,119,94,56,52,2147483647] 27 0'0 2015-03-04 11:33:42.109832 0'0 2015-03-04 11:33:42.109832 1.1a97 0 0 0 0 0 0 0 0 active+undersized+degraded 2015-03-04 11:34:25.457454 0'0 408:12 [20,53,14,54,102,118,2147483647,72]
20 [20,53,14,54,102,118,2147483647,72] 20 0'0 2015-03-04 11:33:42.833850 0'0 2015-03-04 11:33:42.833850 1.10a6 0 0 0 0 0 0 0 0 active+undersized+degraded 2015-03-04 11:34:30.059936 0'0 408:12 [136,22,4,2147483647,72,52,101,55]
136 [136,22,4,2147483647,72,52,101,55] 136 0'0 2015-03-04 11:33:42.125871 0'0 2015-03-04 11:33:42.125871 This appears to have a number on all these
(2147483647) that is way out of line from what I would expect. Thoughts? The information contained in this transmission may be confidential. Any disclosure, copying, or further distribution of confidential information is not permitted unless such privilege
is explicitly granted in writing by Quantum. Quantum reserves the right to have electronic communications, including email and attachments, sent across its networks filtered through anti virus and spam software programs and retain such messages in order to
comply with applicable data security and retention requirements. Quantum is not responsible for the proper and complete transmission of the substance of this communication or for any delay in its receipt. |
_______________________________________________ ceph-users mailing list ceph-users@xxxxxxxxxxxxxx http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com