On Mon, 4 Mar 2013, Sławomir Skowron wrote:
> Ok, thanks for the response. But if I have a crush map like the one in
> the attachment, all data should be balanced equally, not counting the
> hosts with 0.5 weight.
>
> How do I make the data auto-balance when I know that some pg's have too
> much data? I have 4800 pg's on RGW alone with 78 OSDs, which should be
> quite enough.
>
> pool 3 '.rgw.buckets' rep size 3 crush_ruleset 0 object_hash rjenkins
> pg_num 4800 pgp_num 4800 last_change 908 owner 0
>
> When will it be possible to expand the number of pg's?

Soon. :)

The bigger question for me is why there is one PG that is getting pounded
while the others are not.  Is there a large skew in the workload toward a
small number of very hot objects?  I expect it should be obvious if you
go to the loaded osd and do

 ceph --admin-daemon /var/run/ceph/ceph-osd.NN.asok dump_ops_in_flight

and look at the request queue.

sage
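
A minimal sketch of that check, in case it helps: it tallies the objects
named in the in-flight op descriptions on one osd, busiest first. It
assumes the dump_ops_in_flight output embeds descriptions of the form
"osd_op(client.X.Y:Z <object> [...] <pgid>)"; the exact format varies
between versions, so adjust the grep to whatever your dump actually shows.

  #!/bin/sh
  # List the objects behind the in-flight ops on one OSD, hottest first.
  # The osd id comes from the first argument (defaults to 53, one of the
  # loaded osds mentioned in this thread).
  # NOTE: the op description format is an assumption; tweak the grep if
  # your version prints something different.
  OSD=${1:-53}
  ceph --admin-daemon /var/run/ceph/ceph-osd.$OSD.asok dump_ops_in_flight \
    | grep -o 'osd_op([^ ]* [^ ]*' \
    | awk '{print $2}' \
    | sort | uniq -c | sort -rn | head

If a handful of object names dominate the output, that points to the
workload skew described above.
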
>
> Best Regards
>
> Slawomir Skowron
>
> On Mon, Mar 4, 2013 at 3:16 PM, Yehuda Sadeh <yehuda@xxxxxxxxxxx> wrote:
> > On Mon, Mar 4, 2013 at 3:02 AM, Sławomir Skowron <szibis@xxxxxxxxx> wrote:
> >> Hi,
> >>
> >> We have a big problem with RGW. I don't know what the initial trigger
> >> is, but I have a theory.
> >>
> >> 2-3 osds, out of the 78 in the cluster (6480 PGs on the RGW pool), have
> >> 3x more RAM usage, many more operations in the journal, and much higher
> >> latency.
> >>
> >> When we PUT some objects, in some cases so many operations land on the
> >> triple replication of one osd (one PG) that the triple can't handle the
> >> load and goes down; the drives behind those osds catch fire with high
> >> wait-io and long response times. RGW waits for this PG and eventually
> >> blocks all other operations once 1024 operations are stuck in the
> >> queue. Then the whole cluster has problems, and we have an outage.
> >>
> >> When RGW blocks operations there is only one PG with >1000 operations
> >> in the queue:
> >>
> >> ceph pg map 3.9447554d
> >> osdmap e11404 pg 3.9447554d (3.54d) -> up [53,45,23] acting [53,45,23]
> >>
> >> Now this osd has been migrated, with a 0.5 ratio applied, but before it
> >> was:
> >>
> >> ceph pg map 3.9447554d
> >> osdmap e11404 pg 3.9447554d (3.54d) -> up [71,45,23] acting [71,45,23]
> >>
> >> and those three osds had these problems. Behind these osds there are
> >> only 3 drives, one drive per osd; that's why the impact is so big.
> >>
> >> What I did: I set a 50% smaller ratio in CRUSH for these osds, but the
> >> data moved to other osds, and these osds now have only half of their
> >> possible capacity. I don't think it will help in the long term, and
> >> it's not a solution.
> >>
> >> I have a second cluster, with only replication on it, and it shows the
> >> same behaviour. The attachment explains everything: every parameter on
> >> the bad osd is much higher than on the others, and there are 2-3 osds
> >> with such high counters.
> >>
> >> Is this a bug? Maybe the problem doesn't exist in bobtail? I can't
> >> switch to bobtail quickly, which is why I need some answers about which
> >> way to go.
> >>
> >
> > Not sure if bobtail is going to help much here, although there were a
> > few performance fixes that went in. If your cluster is unbalanced (in
> > terms of performance) then requests are going to accumulate on the
> > weakest link. Reweighting the osd like you did is a valid way to go.
> > You need to make sure that, in the steady state, no single osd ends up
> > holding all the traffic.
> > Also, make sure that your pools have enough pgs so that the placement
> > distribution is uniform.
> >
> > Yehuda
>
> --
> -----
> Best regards
>
> Sławek "sZiBis" Skowron
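
For completeness, a sketch of the reweighting workaround discussed above,
using only stock CLI commands; osd.71 and the 0.5 weight are the values
mentioned in the thread, so substitute whatever fits your own cluster:

  # Inspect the current crush weights and find the overloaded osds.
  ceph osd tree

  # Halve the crush weight of the hot osd so CRUSH maps fewer PGs to it.
  ceph osd crush reweight osd.71 0.5

  # Watch the resulting data migration, then check where the hot PG landed.
  ceph -w
  ceph pg map 3.9447554d

As noted above, this only shifts the load around and leaves the reweighted
osd at half of its usable capacity; the longer-term fix is enough pgs and a
placement that spread the hot pool uniformly.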