Hi Kyle,

Thanks for your response. Though I haven't tested it, my gut feeling is the same: changing the PG number may result in re-shuffling of the data.

In terms of the strategy you mentioned for expanding a cluster, I have a few questions:

1. By adding a LITTLE more weight each time, my understanding is that this reduces the load on the OSD being added, is that right? If so, can we use the recovery throttle settings to achieve the same goal?
2. If I would like to expand the cluster every quarter by 30% of its capacity, adding weight this way might take a long time to bring new capacity online, is my understanding correct?
3. Is there an automated tool for this, or will I need to closely monitor the cluster and repeatedly dump the CRUSH map, edit it, and push it back?

I am testing a scenario of adding one OSD at a time (I have 330 OSDs in total), using the default weight. There are a couple of observations: 1) recovery starts quickly (several hundred MB/s) and then slows to around 10MB/s; 2) it impacts the online traffic quite a lot (from my observation, mainly on the recovering PGs).

I tried to search for best practices for expanding a cluster, with no luck. Would anybody like to share their experience? Thanks very much.

Thanks,
Guang

From: Kyle Bader <kyle.bader@xxxxxxxxx>
To: "ceph-users@xxxxxxxxxxxxxx" <ceph-users@xxxxxxxxxxxxxx>
Subject: Re: Expanding ceph cluster by adding more OSDs
Message-ID: <CAFMfnwq+HBGsezMe3vwoM_gqCWiKd1393rxc+xB0xgT4nXqttg@xxxxxxxxxxxxxx>
Content-Type: text/plain; charset="utf-8"

I've contracted and expanded clusters by up to a rack of 216 OSDs - 18 nodes, 12 drives each. New disks are configured with a CRUSH weight of 0 and I slowly add weight (in 0.01 to 0.1 increments), wait for the cluster to become active+clean, and then add more weight. I was expanding after a contraction, so my PG count didn't need to be corrected; I tend to be liberal and opt for more PGs.
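The incremental weighting described above can be sketched as a dry-run shell loop. The OSD id, target weight, and step size are illustrative assumptions, and the sketch only prints the commands; a real run would execute each reweight and poll cluster health between steps:

```shell
# Sketch of the incremental weighting strategy, assuming the new OSD was
# created with a CRUSH weight of 0 (e.g. "ceph osd crush add osd.330 0 ...").
# Weights are tracked in hundredths to avoid floating point in shell.
TARGET=364   # final CRUSH weight 3.64 (e.g. drive size in TB) -- an assumption
STEP=5       # 0.05 per round, within the 0.01-0.1 range mentioned above
CUR=0
while [ "$CUR" -lt "$TARGET" ]; do
  CUR=$((CUR + STEP))
  [ "$CUR" -gt "$TARGET" ] && CUR=$TARGET
  W=$(printf '%d.%02d' $((CUR / 100)) $((CUR % 100)))
  # Dry run: print the command instead of executing it.
  echo "ceph osd crush reweight osd.330 $W"
  # In a real run, poll "ceph health" here and continue only once all PGs
  # are back to active+clean.
done
```

With a 0.05 step this takes 73 rounds to reach 3.64, which illustrates Guang's point: weighting in a full quarter's worth of capacity this way can take a long time.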
If I hadn't contracted the cluster prior to expanding it, I would probably add PGs after all the new OSDs have finished being weighted into the cluster.

On Wed, Oct 9, 2013 at 8:55 PM, Michael Lowe <j.michael.lowe@xxxxxxxxx> wrote:

I had those same questions. I think the answer I got was that it is better to have too few PGs than to have overloaded OSDs. So add OSDs, then add PGs. I don't know the best increments to grow in; it probably depends largely on the hardware behind your OSDs.

Sent from my iPad

On Oct 9, 2013, at 11:34 PM, Guang <yguang11@xxxxxxxxx> wrote:

Thanks Mike. I get your point. There are still a few things confusing me:
1) We expand a Ceph cluster by adding more OSDs, which triggers re-balancing of PGs across the old & new OSDs, and will likely break the optimal PG count for the cluster.
2) We can add more PGs, which triggers re-balancing of objects across the old & new PGs.

So:
1) What is the recommended way to expand the cluster by adding OSDs (and potentially adding PGs)? Should we do both at the same time?
2) What is the recommended way to scale a cluster from, say, 1PB to 2PB? Should we grow it in steps (1.1PB, 1.2PB, ...) or move to 2PB directly?

Thanks,
Guang

On Oct 10, 2013, at 11:10 AM, Michael Lowe wrote:

There used to be, can't find it right now. Something like 'ceph osd pool set <pool> pg_num <num>' then 'ceph osd pool set <pool> pgp_num <num>' to actually move your data into the new PGs. I successfully did it several months ago, when bobtail was current.

Sent from my iPad

On Oct 9, 2013, at 10:30 PM, Guang <yguang11@xxxxxxxxx> wrote:

Thanks Mike. Is there any documentation for that?

Thanks,
Guang

On Oct 9, 2013, at 9:58 PM, Mike Lowe wrote:

You can add PGs; the process is called splitting. I don't think PG merging, the reduction in the number of PGs, is ready yet.
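The splitting steps above, combined with the (100 * OSDs) / Replicas rule of thumb quoted in the thread, can be sketched as a dry run. The pool name is a placeholder, the post-expansion OSD count assumes Guang's 330 OSDs are doubled, and rounding pg_num up to a power of two is a common convention rather than a hard requirement:

```shell
# Hypothetical pg_num sizing after doubling 330 OSDs to 660, using the
# (100 * OSDs) / Replicas rule of thumb.
OSDS=660
REPLICAS=3
RAW=$((OSDS * 100 / REPLICAS))   # rule-of-thumb target: 22000
# Round up to the next power of two.
PG=1
while [ "$PG" -lt "$RAW" ]; do PG=$((PG * 2)); done
# Dry run: pg_num creates the new placement groups; setting pgp_num
# afterwards is what lets CRUSH start placing data into them.
echo "ceph osd pool set <pool> pg_num $PG"
echo "ceph osd pool set <pool> pgp_num $PG"
```

Raising pg_num and pgp_num triggers the object re-balancing Guang describes, which is why the advice in the thread is to finish weighting in the new OSDs first and split afterwards.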
On Oct 8, 2013, at 11:58 PM, Guang <yguang11@xxxxxxxxx> wrote:

Hi ceph-users,

Ceph recommends that the number of PGs for a pool be (100 * OSDs) / Replicas. Per my understanding, the number of PGs for a pool stays fixed even as we scale the cluster out / in by adding / removing OSDs. Does that mean that if we double the number of OSDs, the PG count for a pool is no longer optimal and there is no way to correct it?

Thanks,
Guang

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com