Re: RGW Blocking on 1-2 PG's - argonaut

Ok, thanks for the response. But I have a CRUSH map like the one in the
attachment.

All data should be balanced equally, apart from the OSDs reweighted to 0.5.

How can I make the data rebalance automatically when I know that some
PGs hold too much data? I have 4800 PGs on the RGW pool alone, with 78
OSDs, which should be quite enough.

pool 3 '.rgw.buckets' rep size 3 crush_ruleset 0 object_hash rjenkins
pg_num 4800 pgp_num 4800 last_change 908 owner 0
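
One way to spot the PGs that hold too much data is to sort a pg dump by
size. A sketch (field positions in the plain-text dump vary between
releases, so check the header line of "ceph pg dump" and adjust the
sort key):

# ten biggest PGs in pool 3 ('.rgw.buckets') by stored bytes
ceph pg dump | grep '^3\.' | sort -rn -k6 | head -10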

When will it be possible to expand the number of PGs in an existing pool?
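
A common rule of thumb is roughly (OSD count * 100) / replica size PGs
per pool, i.e. (78 * 100) / 3 = 2600 here, so 4800 is already in a sane
range. For when splitting becomes available, a sketch of the expected
commands (the target of 8192 is only illustrative; raising pgp_num
afterwards is what actually moves data):

ceph osd pool set .rgw.buckets pg_num 8192
ceph osd pool set .rgw.buckets pgp_num 8192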

Best Regards

Slawomir Skowron

On Mon, Mar 4, 2013 at 3:16 PM, Yehuda Sadeh <yehuda@xxxxxxxxxxx> wrote:
> On Mon, Mar 4, 2013 at 3:02 AM, Sławomir Skowron <szibis@xxxxxxxxx> wrote:
>> Hi,
>>
>> We have a big problem with RGW. I don't know what the initial
>> trigger is, but I have a theory.
>>
>> 2-3 OSDs, out of 78 in the cluster (6480 PGs on the RGW pool), have
>> 3x the RAM usage, far more operations in their journals, and much
>> higher latency.
>>
>> When we PUT some objects, in some cases so many operations land,
>> with triple replication, on these OSDs (one PG) that the triple
>> can't handle the load and goes down; the drives backing these OSDs
>> catch fire with huge wait-io and long response times. RGW waits for
>> this PG and eventually blocks all other operations once 1024
>> operations are queued in it.
>> Then the whole cluster has problems and we have an outage.
>>
>> When RGW blocks operations, there is only one PG with >1000
>> operations in its queue:
>> ceph pg map 3.9447554d
>> osdmap e11404 pg 3.9447554d (3.54d) -> up [53,45,23] acting [53,45,23]
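>>
>> (A sketch of inspecting that queue on the primary, assuming the
>> dump_ops_in_flight admin-socket command is available in this release
>> and the socket sits at the default path:
>> ceph --admin-daemon /var/run/ceph/ceph-osd.53.asok dump_ops_in_flight
>> )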
>>
>> That OSD has since been migrated off (reweighted with ratio 0.5),
>> but before that it was:
>>
>> ceph pg map 3.9447554d
>> osdmap e11404 pg 3.9447554d (3.54d) -> up [71,45,23] acting [71,45,23]
>>
>> and these three OSDs have such problems. Each of these OSDs is
>> backed by a single drive, which is why the impact is so big.
>>
>> What I did: I set a 50% smaller weight in CRUSH for these OSDs, so
>> the data moved to other OSDs, but those OSDs now have half their
>> possible capacity. I don't think it will help in the long term, and
>> it's not a real solution.
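>>
>> (A sketch of that reweight step, assuming the standard CRUSH
>> reweight command; osd.71 is the one that moved out of the up set
>> above:
>> ceph osd crush reweight osd.71 0.5
>> )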
>>
>> I have a second cluster, used only for replication, and it shows the
>> same behaviour. The attachment explains everything: every counter on
>> the bad OSD is much higher than on the others. There are 2-3 OSDs
>> with such high counters.
>>
>> Is this a bug? Maybe the problem doesn't exist in bobtail? I can't
>> switch to bobtail quickly, which is why I need some answers about
>> which way to go.
>>
>
> Not sure if bobtail is going to help much here, although a few
> performance fixes did go in. If your cluster is unbalanced (in terms
> of performance), then requests are going to accumulate on the weakest
> link. Reweighting the OSD the way you did is a valid way to go. You
> need to make sure that in the steady state there is no single OSD
> that ends up holding all the traffic.
> Also, make sure that your pools have enough pgs so that the placement
> distribution is uniform.
>
> Yehuda
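
To check how uniform the placement actually is, one sketch is to count
how many pool-3 PGs each OSD appears in (the column holding the "up"
set varies between releases, so check the header line of "ceph pg dump"
and adjust the awk field):

ceph pg dump | grep '^3\.' | awk '{print $14}' | tr -d '[]' | tr ',' '\n' | sort -n | uniq -c | sort -rn | head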



--
-----
Regards,

Sławek "sZiBis" Skowron
# begin crush map

# devices
device 0 osd.0
device 1 osd.1
device 2 osd.2
device 3 osd.3
device 4 osd.4
device 5 osd.5
device 6 osd.6
device 7 osd.7
device 8 osd.8
device 9 osd.9
device 10 osd.10
device 11 osd.11
device 12 osd.12
device 13 osd.13
device 14 osd.14
device 15 osd.15
device 16 osd.16
device 17 osd.17
device 18 osd.18
device 19 osd.19
device 20 osd.20
device 21 osd.21
device 22 osd.22
device 23 osd.23
device 24 osd.24
device 25 osd.25
device 26 osd.26
device 27 osd.27
device 28 osd.28
device 29 osd.29
device 30 osd.30
device 31 osd.31
device 32 osd.32
device 33 osd.33
device 34 osd.34
device 35 osd.35
device 36 osd.36
device 37 osd.37
device 38 osd.38
device 39 osd.39
device 40 osd.40
device 41 osd.41
device 42 osd.42
device 43 osd.43
device 44 osd.44
device 45 osd.45
device 46 osd.46
device 47 osd.47
device 48 osd.48
device 49 osd.49
device 50 osd.50
device 51 osd.51
device 52 osd.52
device 53 osd.53
device 54 osd.54
device 55 osd.55
device 56 osd.56
device 57 osd.57
device 58 osd.58
device 59 osd.59
device 60 osd.60
device 61 osd.61
device 62 osd.62
device 63 osd.63
device 64 osd.64
device 65 osd.65
device 66 osd.66
device 67 osd.67
device 68 osd.68
device 69 osd.69
device 70 osd.70
device 71 osd.71
device 72 osd.72
device 73 osd.73
device 74 osd.74
device 75 osd.75
device 76 osd.76
device 77 osd.77

# types
type 0 osd
type 1 host
type 2 rack
type 3 row
type 4 room
type 5 datacenter
type 6 pool

# buckets
host s3-10-177-64-4 {
	id -2		# do not change unnecessarily
	# weight 25.500
	alg straw
	hash 0	# rjenkins1
	item osd.1 weight 1.000
	item osd.2 weight 1.000
	item osd.3 weight 1.000
	item osd.4 weight 1.000
	item osd.5 weight 1.000
	item osd.6 weight 1.000
	item osd.7 weight 1.000
	item osd.8 weight 1.000
	item osd.9 weight 1.000
	item osd.10 weight 1.000
	item osd.11 weight 1.000
	item osd.13 weight 1.000
	item osd.14 weight 1.000
	item osd.15 weight 1.000
	item osd.16 weight 1.000
	item osd.17 weight 1.000
	item osd.18 weight 1.000
	item osd.19 weight 1.000
	item osd.20 weight 1.000
	item osd.21 weight 1.000
	item osd.22 weight 1.000
	item osd.23 weight 0.500
	item osd.24 weight 1.000
	item osd.25 weight 1.000
	item osd.12 weight 1.000
	item osd.0 weight 1.000
}
rack rack1 {
	id -3		# do not change unnecessarily
	# weight 25.500
	alg straw
	hash 0	# rjenkins1
	item s3-10-177-64-4 weight 25.500
}
host s3-10-177-64-6 {
	id -4		# do not change unnecessarily
	# weight 25.500
	alg straw
	hash 0	# rjenkins1
	item osd.26 weight 1.000
	item osd.27 weight 1.000
	item osd.28 weight 1.000
	item osd.31 weight 1.000
	item osd.32 weight 1.000
	item osd.33 weight 1.000
	item osd.34 weight 1.000
	item osd.35 weight 1.000
	item osd.37 weight 1.000
	item osd.38 weight 1.000
	item osd.39 weight 1.000
	item osd.40 weight 1.000
	item osd.41 weight 1.000
	item osd.42 weight 1.000
	item osd.43 weight 1.000
	item osd.45 weight 0.500
	item osd.46 weight 1.000
	item osd.47 weight 1.000
	item osd.48 weight 1.000
	item osd.49 weight 1.000
	item osd.50 weight 1.000
	item osd.51 weight 1.000
	item osd.44 weight 1.000
	item osd.29 weight 1.000
	item osd.36 weight 1.000
	item osd.30 weight 1.000
}
rack rack2 {
	id -5		# do not change unnecessarily
	# weight 25.500
	alg straw
	hash 0	# rjenkins1
	item s3-10-177-64-6 weight 25.500
}
host s3-10-177-64-8 {
	id -6		# do not change unnecessarily
	# weight 25.500
	alg straw
	hash 0	# rjenkins1
	item osd.52 weight 1.000
	item osd.53 weight 1.000
	item osd.54 weight 1.000
	item osd.55 weight 1.000
	item osd.56 weight 1.000
	item osd.57 weight 1.000
	item osd.58 weight 1.000
	item osd.59 weight 1.000
	item osd.60 weight 1.000
	item osd.61 weight 1.000
	item osd.62 weight 1.000
	item osd.63 weight 1.000
	item osd.64 weight 1.000
	item osd.65 weight 1.000
	item osd.66 weight 1.000
	item osd.67 weight 1.000
	item osd.68 weight 1.000
	item osd.69 weight 1.000
	item osd.70 weight 1.000
	item osd.71 weight 0.500
	item osd.72 weight 1.000
	item osd.73 weight 1.000
	item osd.74 weight 1.000
	item osd.75 weight 1.000
	item osd.76 weight 1.000
	item osd.77 weight 1.000
}
rack rack3 {
	id -7		# do not change unnecessarily
	# weight 25.500
	alg straw
	hash 0	# rjenkins1
	item s3-10-177-64-8 weight 25.500
}
pool default {
	id -1		# do not change unnecessarily
	# weight 76.500
	alg straw
	hash 0	# rjenkins1
	item rack1 weight 25.500
	item rack2 weight 25.500
	item rack3 weight 25.500
}

# rules
rule data {
	ruleset 0
	type replicated
	min_size 1
	max_size 10
	step take default
	step chooseleaf firstn 0 type rack
	step emit
}
rule metadata {
	ruleset 1
	type replicated
	min_size 1
	max_size 10
	step take default
	step chooseleaf firstn 0 type rack
	step emit
}
rule rbd {
	ruleset 2
	type replicated
	min_size 1
	max_size 10
	step take default
	step chooseleaf firstn 0 type rack
	step emit
}

# end crush map
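
For completeness, the usual cycle for editing a map like the one above
(standard crushtool workflow; the file names are only examples):

ceph osd getcrushmap -o crushmap.bin
crushtool -d crushmap.bin -o crushmap.txt
# edit weights/buckets in crushmap.txt, then recompile and inject:
crushtool -c crushmap.txt -o crushmap.new
ceph osd setcrushmap -i crushmap.new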
