Re: New EC pool undersized

I don’t know – I am playing with crush; someday I may fully comprehend it.  Not today.

 

I think you have to look at it like this: if your possible failure domain options are OSDs, hosts, racks, …, and you choose racks as your failure domain, and you have exactly as many racks as your pool size (and it can’t be any smaller, right?), then each PG has to have an OSD from each rack.  If your 144 OSDs are split evenly across 8 racks, then you have 18 OSDs in each rack (presumably distributed over the hosts in that rack, though I don’t think that distribution is important for this calculation).  And so your total number of choices is 18 to the 8th power, or just over 11 billion (actually, 11,019,960,576 :-)).  So probably the only thing you have to worry about is “crush giving up too soon”, and Yann’s resolution.
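
A quick check of that arithmetic, as a minimal Python sketch. It assumes, as described above, 144 OSDs split evenly over 8 racks and exactly one OSD picked per rack; the variable names are only illustrative.

```python
# Assumption: 144 OSDs spread evenly over 8 racks, one OSD chosen per rack.
total_osds = 144
racks = 8
osds_per_rack = total_osds // racks   # OSDs available in each rack

# One independent choice of OSD in each rack, multiplied across all racks.
placements = osds_per_rack ** racks

print(osds_per_rack, placements)
```

With only a few thousand PGs to place, the space of valid placements is clearly not the limiting factor here.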

 

-don-

 

From: Kyle Hutson [mailto:kylehutson@xxxxxxx]
Sent: 04 March, 2015 13:15
To: Don Doerner
Cc: ceph-users@xxxxxxxxxxxxxx
Subject: Re: [ceph-users] New EC pool undersized

 

So it sounds like I should figure out at how many nodes I need to increase pg_num to 4096, and again to 8192, and increase it incrementally as I add more hosts, correct?
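
One rough way to find those thresholds, sketched in Python. This assumes the rule of thumb quoted in this thread (about 100 PGs per OSD, divided by k+m, rounded up to a power of 2) and that new nodes carry 18 OSDs each like the current ones; `min_osds_for` is a hypothetical helper, not a Ceph tool.

```python
import math

def min_osds_for(pg_num, k, m, pgs_per_osd=100):
    """Smallest OSD count whose target (OSDs * pgs_per_osd / (k + m))
    exceeds pg_num / 2, so that it rounds up to pg_num."""
    half = pg_num // 2
    return math.floor(half * (k + m) / pgs_per_osd) + 1

# Assumption: every node holds the same number of OSDs as today (144 / 8).
osds_per_node = 144 // 8

for pg_num in (4096, 8192):
    osds = min_osds_for(pg_num, 4, 4)
    nodes = math.ceil(osds / osds_per_node)
    print(pg_num, osds, nodes)
```

Under those assumptions the step to 4096 comes at 164 OSDs (10 nodes) and the step to 8192 at 328 OSDs (19 nodes).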

 

On Wed, Mar 4, 2015 at 3:04 PM, Don Doerner <Don.Doerner@xxxxxxxxxxx> wrote:

Sorry, I missed your other questions, down at the bottom.  See here (look for “number of replicas for replicated pools or the K+M sum for erasure coded pools”) for the formula; 38400/8 probably implies 8192.

 

The thing is, you’ve got to think about how many ways you can form combinations of 8 unique OSDs (without replacement) that match your failure domain rules.  If you’ve only got 8 hosts, and your failure domain is hosts, it severely limits this number.  And I have read that too many PGs isn’t good either; a serialization issue, I believe.

 

-don-

 

From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of Don Doerner
Sent: 04 March, 2015 12:49
To: Kyle Hutson
Cc: ceph-users@xxxxxxxxxxxxxx
Subject: Re: [ceph-users] New EC pool undersized

 

Hmmm, I just struggled through this myself.  How many racks do you have?  If not more than 8, you might want to make your failure domain smaller?  I.e., maybe host?  That, at least, would allow you to debug the situation…

 

-don-

 

From: Kyle Hutson [mailto:kylehutson@xxxxxxx]
Sent: 04 March, 2015 12:43
To: Don Doerner
Cc: Ceph Users
Subject: Re: [ceph-users] New EC pool undersized

 

It wouldn't let me simply change the pg_num, giving

Error EEXIST: specified pg_num 2048 <= current 8192

 

But that's not a big deal, I just deleted the pool and recreated with 'ceph osd pool create ec44pool 2048 2048 erasure ec44profile'

...and the result is quite similar: 'ceph status' is now

ceph status
    cluster 196e5eb8-d6a7-4435-907e-ea028e946923
     health HEALTH_WARN 4 pgs degraded; 4 pgs stuck unclean; 4 pgs undersized
     monmap e1: 4 mons at {hobbit01=10.5.38.1:6789/0,hobbit02=10.5.38.2:6789/0,hobbit13=10.5.38.13:6789/0,hobbit14=10.5.38.14:6789/0}, election epoch 6, quorum 0,1,2,3 hobbit01,hobbit02,hobbit13,hobbit14
     osdmap e412: 144 osds: 144 up, 144 in
      pgmap v6798: 6144 pgs, 2 pools, 0 bytes data, 0 objects
            90590 MB used, 640 TB / 640 TB avail
                   4 active+undersized+degraded
                6140 active+clean

 

'ceph pg dump_stuck' results in

ok
pg_stat  objects  mip  degr  misp  unf  bytes  log  disklog  state  state_stamp  v  reported  up  up_primary  acting  acting_primary  last_scrub  scrub_stamp  last_deep_scrub  deep_scrub_stamp
2.296  0  0  0  0  0  0  0  0  active+undersized+degraded  2015-03-04 14:33:26.672224  0'0  412:9  [5,55,91,2147483647,83,135,53,26]  5  [5,55,91,2147483647,83,135,53,26]  5  0'0  2015-03-04 14:33:15.649911  0'0  2015-03-04 14:33:15.649911
2.69c  0  0  0  0  0  0  0  0  active+undersized+degraded  2015-03-04 14:33:24.984802  0'0  412:9  [93,134,1,74,112,28,2147483647,60]  93  [93,134,1,74,112,28,2147483647,60]  93  0'0  2015-03-04 14:33:15.695747  0'0  2015-03-04 14:33:15.695747
2.36d  0  0  0  0  0  0  0  0  active+undersized+degraded  2015-03-04 14:33:21.937620  0'0  412:9  [12,108,136,104,52,18,63,2147483647]  12  [12,108,136,104,52,18,63,2147483647]  12  0'0  2015-03-04 14:33:15.652480  0'0  2015-03-04 14:33:15.652480
2.5f7  0  0  0  0  0  0  0  0  active+undersized+degraded  2015-03-04 14:33:26.169242  0'0  412:9  [94,128,73,22,4,60,2147483647,113]  94  [94,128,73,22,4,60,2147483647,113]  94  0'0  2015-03-04 14:33:15.687695  0'0  2015-03-04 14:33:15.687695

 

I do have questions for you, even at this point, though.

1) Where did you find the formula (14400/(k+m))?

2) I was really trying to size this for when it goes to production, at which point it may have as many as 384 OSDs. Doesn't that imply I should have even more pgs?

 

On Wed, Mar 4, 2015 at 2:15 PM, Don Doerner <Don.Doerner@xxxxxxxxxxx> wrote:

Oh duh…  OK, then given a 4+4 erasure coding scheme, 14400/8 is 1800, so try 2048.
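
The same rule of thumb as a small Python sketch. The default of 100 PGs per OSD is an assumption inferred from 14400 = 144 × 100, and `suggested_pg_num` is a hypothetical helper name, not part of any Ceph tooling.

```python
def suggested_pg_num(num_osds, k, m, pgs_per_osd=100):
    """Rule of thumb: (OSDs * pgs_per_osd) / (k + m), rounded up to a power of 2."""
    target = (num_osds * pgs_per_osd) / (k + m)
    pg_num = 1
    while pg_num < target:
        pg_num *= 2
    return pg_num

print(suggested_pg_num(144, 4, 4))   # target is 14400 / 8 = 1800
print(suggested_pg_num(384, 4, 4))   # target is 38400 / 8 = 4800
```

For the current 144 OSDs this gives 2048; for the planned 384 OSDs it gives 8192.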

 

-don-

 

From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of Don Doerner
Sent: 04 March, 2015 12:14
To: Kyle Hutson; Ceph Users
Subject: Re: [ceph-users] New EC pool undersized

 

In this case, that number means that there is not an OSD that can be assigned.  What’s your k, m from your erasure-coded pool?  You’ll need approximately (14400/(k+m)) PGs, rounded up to the next power of 2…
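
To spot which shard of a PG has no OSD, something like the following Python sketch could be used. 2147483647 is 0x7fffffff, the largest signed 32-bit integer (I believe the Ceph source calls this placeholder CRUSH_ITEM_NONE); `missing_shards` is a hypothetical helper, and the input is an up set copied from the dump_stuck output in this thread.

```python
# 2147483647 == 0x7fffffff; in the dump it stands in for "no OSD assigned".
NO_OSD = 0x7FFFFFFF

def missing_shards(osd_set):
    """Return the positions in an up/acting set that have no OSD assigned."""
    ids = [int(x) for x in osd_set.strip("[]").split(",")]
    return [pos for pos, osd in enumerate(ids) if osd == NO_OSD]

# Up set of PG 1.d77 from the original report.
print(missing_shards("[15,95,58,73,52,31,116,2147483647]"))
```

Each stuck PG in the report has exactly one such empty position out of its 8 (k+m) shards, which is why it shows up as undersized.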

 

-don-

 

From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of Kyle Hutson
Sent: 04 March, 2015 12:06
To: Ceph Users
Subject: [ceph-users] New EC pool undersized

 

Last night I blew away my previous ceph configuration (this environment is pre-production) and have 0.87.1 installed. I've manually edited the crushmap so it now looks like https://dpaste.de/OLEa

 

I currently have 144 OSDs on 8 nodes.

 

After increasing pg_num and pgp_num to a more suitable 1024 (due to the high number of OSDs), everything looked happy.

So, now I'm trying to play with an erasure-coded pool.

I did:

ceph osd erasure-code-profile set ec44profile k=4 m=4 ruleset-failure-domain=rack

ceph osd pool create ec44pool 8192 8192 erasure ec44profile

 

After settling for a bit 'ceph status' gives

    cluster 196e5eb8-d6a7-4435-907e-ea028e946923
     health HEALTH_WARN 7 pgs degraded; 7 pgs stuck degraded; 7 pgs stuck unclean; 7 pgs stuck undersized; 7 pgs undersized
     monmap e1: 4 mons at {hobbit01=10.5.38.1:6789/0,hobbit02=10.5.38.2:6789/0,hobbit13=10.5.38.13:6789/0,hobbit14=10.5.38.14:6789/0}, election epoch 6, quorum 0,1,2,3 hobbit01,hobbit02,hobbit13,hobbit14
     osdmap e409: 144 osds: 144 up, 144 in
      pgmap v6763: 12288 pgs, 2 pools, 0 bytes data, 0 objects
            90598 MB used, 640 TB / 640 TB avail
                   7 active+undersized+degraded
               12281 active+clean

 

So to troubleshoot the undersized pgs, I issued 'ceph pg dump_stuck'

ok
pg_stat  objects  mip  degr  misp  unf  bytes  log  disklog  state  state_stamp  v  reported  up  up_primary  acting  acting_primary  last_scrub  scrub_stamp  last_deep_scrub  deep_scrub_stamp
1.d77  0  0  0  0  0  0  0  0  active+undersized+degraded  2015-03-04 11:33:57.502849  0'0  408:12  [15,95,58,73,52,31,116,2147483647]  15  [15,95,58,73,52,31,116,2147483647]  15  0'0  2015-03-04 11:33:42.100752  0'0  2015-03-04 11:33:42.100752
1.10fa  0  0  0  0  0  0  0  0  active+undersized+degraded  2015-03-04 11:34:29.362554  0'0  408:12  [23,12,99,114,132,53,56,2147483647]  23  [23,12,99,114,132,53,56,2147483647]  23  0'0  2015-03-04 11:33:42.168571  0'0  2015-03-04 11:33:42.168571
1.1271  0  0  0  0  0  0  0  0  active+undersized+degraded  2015-03-04 11:33:48.795742  0'0  408:12  [135,112,69,4,22,95,2147483647,83]  135  [135,112,69,4,22,95,2147483647,83]  135  0'0  2015-03-04 11:33:42.139555  0'0  2015-03-04 11:33:42.139555
1.2b5  0  0  0  0  0  0  0  0  active+undersized+degraded  2015-03-04 11:34:32.189738  0'0  408:12  [11,115,139,19,76,52,94,2147483647]  11  [11,115,139,19,76,52,94,2147483647]  11  0'0  2015-03-04 11:33:42.079673  0'0  2015-03-04 11:33:42.079673
1.7ae  0  0  0  0  0  0  0  0  active+undersized+degraded  2015-03-04 11:34:26.848344  0'0  408:12  [27,5,132,119,94,56,52,2147483647]  27  [27,5,132,119,94,56,52,2147483647]  27  0'0  2015-03-04 11:33:42.109832  0'0  2015-03-04 11:33:42.109832
1.1a97  0  0  0  0  0  0  0  0  active+undersized+degraded  2015-03-04 11:34:25.457454  0'0  408:12  [20,53,14,54,102,118,2147483647,72]  20  [20,53,14,54,102,118,2147483647,72]  20  0'0  2015-03-04 11:33:42.833850  0'0  2015-03-04 11:33:42.833850
1.10a6  0  0  0  0  0  0  0  0  active+undersized+degraded  2015-03-04 11:34:30.059936  0'0  408:12  [136,22,4,2147483647,72,52,101,55]  136  [136,22,4,2147483647,72,52,101,55]  136  0'0  2015-03-04 11:33:42.125871  0'0  2015-03-04 11:33:42.125871

 

All of these have a number (2147483647) in their up and acting sets that is way out of line from what I would expect.

 

Thoughts?

 



 

 

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
