On Wed, Aug 5, 2015 at 1:36 PM, 乔建峰 <scaleqiao@xxxxxxxxx> wrote:
> Add the mailing lists.
>
> 2015-08-05 13:34 GMT+08:00 乔建峰 <scaleqiao@xxxxxxxxx>:
>>
>> Hi Haomai,
>>
>> Thank you for the prompt response and the suggestion.
>>
>> I cannot agree with you more about using multiple pools in one flexible
>> cluster. Per the scenario you described below, we can create more pools
>> when expanding the cluster. But for the issue we are facing right now,
>> creating a new pool with a proper pg_num/pgp_num might only help
>> distribute the data of new images uniformly; it cannot relieve the
>> imbalance within the existing data. Please correct me if I'm wrong.

For the existing pool, you could adjust the CRUSH weights to get a better
data balance.

>>
>> Thanks,
>> Jevon
>>
>> 2015-08-04 22:01 GMT+08:00 Haomai Wang <haomaiwang@xxxxxxxxx>:
>>>
>>> On Mon, Aug 3, 2015 at 4:05 PM, 乔建峰 <scaleqiao@xxxxxxxxx> wrote:
>>> > [Including ceph-users alias]
>>> >
>>> > 2015-08-03 16:01 GMT+08:00 乔建峰 <scaleqiao@xxxxxxxxx>:
>>> >>
>>> >> Hi Cephers,
>>> >>
>>> >> Currently, I'm experiencing an issue that is causing me a lot of
>>> >> trouble, so I'm writing to ask for your comments/help/suggestions.
>>> >> More details are provided below.
>>> >>
>>> >> Issue:
>>> >> I set up a cluster with 24 OSDs and created one pool with 1024
>>> >> placement groups on it for a small startup company. The number 1024
>>> >> was calculated per the equation (OSDs * 100) / pool size (i.e. the
>>> >> replica count). The cluster has been running quite well for a long
>>> >> time, but recently our monitoring system keeps complaining that
>>> >> some disks' usage exceeds 85%. I logged into the system and found
>>> >> that some disks' usage is indeed very high, while others are under
>>> >> 60%. Each time the issue happens, I have to manually re-balance the
>>> >> distribution. That is only a short-term fix, and I'm not willing to
>>> >> do it all the time.
>>> >>
>>> >> Two long-term solutions come to mind:
>>> >> 1) Ask the customers to expand their clusters by adding more OSDs.
>>> >> But I think they will ask me to explain the reason for the
>>> >> imbalanced data distribution. We've already done some analysis on
>>> >> the environment and learned that the most imbalanced part of CRUSH
>>> >> is the mapping between objects and PGs: the biggest PG has 613
>>> >> objects, while the smallest has only 226.
>>> >>
>>> >> 2) Increase the number of placement groups. It can be of great help
>>> >> for a statistically uniform data distribution, but it can also
>>> >> incur significant data movement as PGs are effectively split. I
>>> >> cannot do that in our customers' environments before we 100%
>>> >> understand the consequences. Has anyone done this in a production
>>> >> environment? How much does the operation affect client performance?
>>> >>
>>> >> Any comments/help/suggestions will be highly appreciated.
>>>
>>> Of course not; pg split isn't a recommended operation on a running
>>> cluster. It will block client IO completely. Unlike the recovery
>>> process, which has object-level control, a split is a PG-level process
>>> and the OSD itself can't control it smoothly. In theory, to make pg
>>> split work on a real cluster we would need to do more on the MON side,
>>> and a lot of that logic would cause trouble.
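
[For concreteness, the operation being debated here (raising pg_num and
then pgp_num on the existing pool) looks roughly like the sketch below.
The pool name "rbd" and the target value 2048 are made-up placeholders,
not values taken from this thread:]

#!/usr/bin/env python
# Rough sketch of the pg_num/pgp_num bump under discussion. Do not run
# anything like this on a live cluster without planning for the resulting
# data movement.
import subprocess

POOL = "rbd"        # assumption: the thread never names the pool
NEW_PG_NUM = 2048   # assumption: doubling the original 1024

def ceph(*args):
    """Run a ceph CLI command and return its decoded output."""
    return subprocess.check_output(("ceph",) + args).decode()

if __name__ == "__main__":
    print(ceph("osd", "pool", "get", POOL, "pg_num"))
    # At the time of this thread pg_num could only ever be increased;
    # this step creates the new (split) placement groups.
    ceph("osd", "pool", "set", POOL, "pg_num", str(NEW_PG_NUM))
    # Raising pgp_num afterwards is what actually remaps data onto the
    # new PGs, so this is where the client-visible impact starts.
    ceph("osd", "pool", "set", POOL, "pgp_num", str(NEW_PG_NUM))
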
>>> Although we can't get that flexibility via pg split, we can get the
>>> same result from *pools* with a little user-side management logic.
>>>
>>> "pool" is a good thing which can cover your need. Most users like to
>>> have one pool for the whole cluster; that's fine for an immutable
>>> cluster, but not so good for a flexible cluster, I think. For example,
>>> if you double the OSD nodes, creating a new pool is a better way than
>>> preparing a pool with lots of PGs at the very beginning. If you are
>>> using OpenStack, CloudStack or the like, these cloud projects can
>>> provide the upper-layer control via "volume_type".
>>>
>>> In a word, we can comfortably grow the cluster by a relatively small
>>> number of OSDs at a time, but I don't think we can freely double the
>>> Ceph cluster and expect Ceph to handle it perfectly.
>>>
>>> >>
>>> >> --
>>> >> Best Regards
>>> >> Jevon
>>> >
>>> > --
>>> > Best Regards
>>> > Jevon
>>> >
>>>
>>> --
>>> Best Regards,
>>>
>>> Wheat
>>
>> --
>> Best Regards
>> Jevon
>
> --
> Best Regards
> Jevon

--
Best Regards,

Wheat
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
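
[As a closing illustration of the "one new pool per expansion" approach
described above, here is a rough sketch that sizes a pool for a batch of
newly added OSDs using the same (OSDs * 100) / pool size rule of thumb
from the original post, rounded up to a power of two. The OSD count, the
replica count of 3, and the pool name "volumes-2" are assumptions, not
values taken from this thread:]

#!/usr/bin/env python
# Sketch: size and create a dedicated pool for newly added OSDs instead
# of splitting the PGs of the existing pool.
import subprocess

def suggested_pg_num(osd_count, replica_count, pgs_per_osd=100):
    """(OSDs * pgs_per_osd) / replicas, rounded up to a power of two."""
    raw = osd_count * pgs_per_osd // replica_count
    pg_num = 1
    while pg_num < raw:
        pg_num *= 2
    return pg_num

if __name__ == "__main__":
    new_osds = 24   # assumption: the expansion adds another 24 OSDs
    replicas = 3    # assumption: the thread never states the replica count
    pg_num = suggested_pg_num(new_osds, replicas)  # 800 rounded up to 1024
    # "volumes-2" is a made-up name; pg_num and pgp_num are set together.
    subprocess.check_call(["ceph", "osd", "pool", "create",
                           "volumes-2", str(pg_num), str(pg_num)])

[In an OpenStack deployment the new pool can then be configured as a
separate Cinder backend and exposed through its own volume type, which is
the "volume_type" control mentioned above.]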