Alone (one of the slow OSDs in the mentioned triple):

2013-03-04 18:39:27.683035 osd.23 [INF] bench: wrote 1024 MB in blocks of 4096 KB in 15.241943 sec at 68795 KB/sec

In a for loop (some slow requests appear):

for x in `seq 0 25`; do ceph osd tell $x bench; done

2013-03-04 18:41:08.259454 osd.12 [INF] bench: wrote 1024 MB in blocks of 4096 KB in 37.658448 sec at 27844 KB/sec
2013-03-04 18:41:07.850213 osd.5 [INF] bench: wrote 1024 MB in blocks of 4096 KB in 37.402402 sec at 28034 KB/sec
2013-03-04 18:41:07.850231 osd.11 [INF] bench: wrote 1024 MB in blocks of 4096 KB in 37.201831 sec at 28186 KB/sec
2013-03-04 18:41:08.100186 osd.10 [INF] bench: wrote 1024 MB in blocks of 4096 KB in 37.540605 sec at 27931 KB/sec
2013-03-04 18:41:08.319766 osd.21 [INF] bench: wrote 1024 MB in blocks of 4096 KB in 37.532806 sec at 27937 KB/sec
2013-03-04 18:41:08.415835 osd.14 [INF] bench: wrote 1024 MB in blocks of 4096 KB in 37.772730 sec at 27760 KB/sec
2013-03-04 18:41:08.775264 osd.9 [INF] bench: wrote 1024 MB in blocks of 4096 KB in 38.195523 sec at 27452 KB/sec
2013-03-04 18:41:08.808824 osd.6 [INF] bench: wrote 1024 MB in blocks of 4096 KB in 38.338387 sec at 27350 KB/sec
2013-03-04 18:41:08.923809 osd.19 [INF] bench: wrote 1024 MB in blocks of 4096 KB in 38.177933 sec at 27465 KB/sec
2013-03-04 18:41:08.925848 osd.18 [INF] bench: wrote 1024 MB in blocks of 4096 KB in 38.201476 sec at 27448 KB/sec
2013-03-04 18:41:08.936961 osd.15 [INF] bench: wrote 1024 MB in blocks of 4096 KB in 38.273058 sec at 27397 KB/sec
2013-03-04 18:41:08.619022 osd.20 [INF] bench: wrote 1024 MB in blocks of 4096 KB in 37.713017 sec at 27804 KB/sec
2013-03-04 18:41:08.764705 osd.22 [INF] bench: wrote 1024 MB in blocks of 4096 KB in 37.954886 sec at 27626 KB/sec
2013-03-04 18:41:08.499156 osd.0 [INF] bench: wrote 1024 MB in blocks of 4096 KB in 38.035553 sec at 27568 KB/sec
2013-03-04 18:41:07.873457 osd.2 [INF] bench: wrote 1024 MB in blocks of 4096 KB in 37.489969 sec at 27969 KB/sec
2013-03-04 18:41:08.134530 osd.13 [INF] bench: wrote 1024 MB in blocks of 4096 KB in 37.513056 sec at 27952 KB/sec
2013-03-04 18:41:08.219142 osd.1 [INF] bench: wrote 1024 MB in blocks of 4096 KB in 37.856368 sec at 27698 KB/sec
2013-03-04 18:41:08.485806 osd.4 [INF] bench: wrote 1024 MB in blocks of 4096 KB in 38.060621 sec at 27550 KB/sec
2013-03-04 18:41:08.612236 osd.7 [INF] bench: wrote 1024 MB in blocks of 4096 KB in 38.122105 sec at 27505 KB/sec
2013-03-04 18:41:08.647494 osd.8 [INF] bench: wrote 1024 MB in blocks of 4096 KB in 38.134885 sec at 27496 KB/sec
2013-03-04 18:41:08.649267 osd.3 [INF] bench: wrote 1024 MB in blocks of 4096 KB in 37.961966 sec at 27621 KB/sec
2013-03-04 18:41:08.943610 osd.24 [INF] bench: wrote 1024 MB in blocks of 4096 KB in 38.091272 sec at 27527 KB/sec
2013-03-04 18:41:08.975838 osd.17 [INF] bench: wrote 1024 MB in blocks of 4096 KB in 38.270884 sec at 27398 KB/sec
2013-03-04 18:41:09.544561 osd.23 [INF] bench: wrote 1024 MB in blocks of 4096 KB in 38.715030 sec at 27084 KB/sec
2013-03-04 18:41:08.969981 osd.16 [INF] bench: wrote 1024 MB in blocks of 4096 KB in 38.287596 sec at 27386 KB/sec
2013-03-04 18:41:09.533789 osd.25 [INF] bench: wrote 1024 MB in blocks of 4096 KB in 37.954333 sec at 27627 KB/sec

My XFS is a little fragmented, but performance is still good.
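To spot outliers in output like the above, the bench lines can be sorted by throughput. This is only a minimal sketch, assuming the cluster log is captured to a file while the loop runs (bench.log is a hypothetical name) and that the awk field positions match the log format shown above:

  # capture the cluster log while the bench loop runs
  ceph -w > bench.log &
  for x in `seq 0 25`; do ceph osd tell $x bench; done
  # once all "bench: wrote" lines have appeared, stop the ceph -w job (e.g. kill %1),
  # then list OSDs from slowest to fastest by reported KB/sec
  grep 'bench: wrote' bench.log | awk '{print $(NF-1), $3}' | sort -n | head

The head of that listing shows the slowest OSDs from the run; in the paste above, all 26 OSDs land around 27-28 MB/s when writing concurrently.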
On Mon, Mar 4, 2013 at 6:25 PM, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
> On Mon, Mar 4, 2013 at 9:23 AM, Sławomir Skowron <szibis@xxxxxxxxx> wrote:
>> On Mon, Mar 4, 2013 at 6:02 PM, Sage Weil <sage@xxxxxxxxxxx> wrote:
>>> On Mon, 4 Mar 2013, Sławomir Skowron wrote:
>>>> Ok, thanks for the response. But if I have a crush map like the one in
>>>> the attachment, all data should be balanced equally, not including the
>>>> hosts with 0.5 weight.
>>>>
>>>> How can I make the data auto-balance when I know that some PGs have too
>>>> much data? I have 4800 PGs for RGW alone, with 78 OSDs, which should be
>>>> quite enough.
>>>>
>>>> pool 3 '.rgw.buckets' rep size 3 crush_ruleset 0 object_hash rjenkins
>>>> pg_num 4800 pgp_num 4800 last_change 908 owner 0
>>>>
>>>> When will it be possible to expand the number of PGs?
>>>
>>> Soon. :)
>>>
>>> The bigger question for me is why there is one PG that is getting pounded
>>> while the others are not. Is there a large skew in the workload toward a
>>> small number of very hot objects?
>>
>> Yes, there are constantly about 100-200 operations per second, all
>> going into the RGW backend. But when problems appear there are more
>> requests, more GETs and PUTs, because applications with short timeouts
>> reconnect. Statistically, though, new PUTs are normally spread over many
>> PGs, so this should not overload a single primary OSD. Maybe balancing
>> reads across all replicas could help a little?
>>
>>> I expect it should be obvious if you go to the loaded osd and do
>>>
>>> ceph --admin-daemon /var/run/ceph/ceph-osd.NN.asok dump_ops_in_flight
>>>
>>
>> Yes, I did that, but such long operations only show up when the cluster
>> becomes unstable. Normally there are no ops in the queue; they appear
>> only when the cluster is rebalancing, remapping, or doing similar work.
>
> Have you checked the baseline disk performance of the OSDs? Perhaps
> it's not that the PG is bad but that the OSDs are slow.

--
-----
Regards

Sławek "sZiBis" Skowron
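As a rough complement to `ceph osd tell N bench` for Greg's baseline-disk question, the raw write speed of each OSD's backing filesystem can be checked with dd. This is only a sketch: the data path /var/lib/ceph/osd/ceph-$x and the OSD ids are assumptions to adjust to the actual layout, it must be run on each OSD host, and it writes and then deletes a 1 GB temporary file per OSD.

  # per-OSD raw write test with direct I/O; run on each OSD host
  for x in 0 1 2; do   # OSD ids hosted on this machine (assumed)
    d=/var/lib/ceph/osd/ceph-$x
    echo -n "osd.$x: "
    dd if=/dev/zero of=$d/ddtest bs=4M count=256 oflag=direct 2>&1 | tail -1
    rm -f $d/ddtest
  done

Comparing these numbers against the in-cluster bench results should show whether the slowness comes from the disks themselves or from something above them.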