Alone (one of the slow OSDs in the mentioned triple):

2013-03-04 18:39:27.683035 osd.23 [INF] bench: wrote 1024 MB in blocks of 4096 KB in 15.241943 sec at 68795 KB/sec

In a for loop (some slow requests appear):

for x in `seq 0 25`; do ceph osd tell $x bench; done

2013-03-04 18:41:08.259454 osd.12 [INF] bench: wrote 1024 MB in blocks of 4096 KB in 37.658448 sec at 27844 KB/sec
2013-03-04 18:41:07.850213 osd.5 [INF] bench: wrote 1024 MB in blocks of 4096 KB in 37.402402 sec at 28034 KB/sec
2013-03-04 18:41:07.850231 osd.11 [INF] bench: wrote 1024 MB in blocks of 4096 KB in 37.201831 sec at 28186 KB/sec
2013-03-04 18:41:08.100186 osd.10 [INF] bench: wrote 1024 MB in blocks of 4096 KB in 37.540605 sec at 27931 KB/sec
2013-03-04 18:41:08.319766 osd.21 [INF] bench: wrote 1024 MB in blocks of 4096 KB in 37.532806 sec at 27937 KB/sec
2013-03-04 18:41:08.415835 osd.14 [INF] bench: wrote 1024 MB in blocks of 4096 KB in 37.772730 sec at 27760 KB/sec
2013-03-04 18:41:08.775264 osd.9 [INF] bench: wrote 1024 MB in blocks of 4096 KB in 38.195523 sec at 27452 KB/sec
2013-03-04 18:41:08.808824 osd.6 [INF] bench: wrote 1024 MB in blocks of 4096 KB in 38.338387 sec at 27350 KB/sec
2013-03-04 18:41:08.923809 osd.19 [INF] bench: wrote 1024 MB in blocks of 4096 KB in 38.177933 sec at 27465 KB/sec
2013-03-04 18:41:08.925848 osd.18 [INF] bench: wrote 1024 MB in blocks of 4096 KB in 38.201476 sec at 27448 KB/sec
2013-03-04 18:41:08.936961 osd.15 [INF] bench: wrote 1024 MB in blocks of 4096 KB in 38.273058 sec at 27397 KB/sec
2013-03-04 18:41:08.619022 osd.20 [INF] bench: wrote 1024 MB in blocks of 4096 KB in 37.713017 sec at 27804 KB/sec
2013-03-04 18:41:08.764705 osd.22 [INF] bench: wrote 1024 MB in blocks of 4096 KB in 37.954886 sec at 27626 KB/sec
2013-03-04 18:41:08.499156 osd.0 [INF] bench: wrote 1024 MB in blocks of 4096 KB in 38.035553 sec at 27568 KB/sec
2013-03-04 18:41:07.873457 osd.2 [INF] bench: wrote 1024 MB in blocks of 4096 KB in 37.489969 sec at 27969 KB/sec
2013-03-04 18:41:08.134530 osd.13 [INF] bench: wrote 1024 MB in blocks of 4096 KB in 37.513056 sec at 27952 KB/sec
2013-03-04 18:41:08.219142 osd.1 [INF] bench: wrote 1024 MB in blocks of 4096 KB in 37.856368 sec at 27698 KB/sec
2013-03-04 18:41:08.485806 osd.4 [INF] bench: wrote 1024 MB in blocks of 4096 KB in 38.060621 sec at 27550 KB/sec
2013-03-04 18:41:08.612236 osd.7 [INF] bench: wrote 1024 MB in blocks of 4096 KB in 38.122105 sec at 27505 KB/sec
2013-03-04 18:41:08.647494 osd.8 [INF] bench: wrote 1024 MB in blocks of 4096 KB in 38.134885 sec at 27496 KB/sec
2013-03-04 18:41:08.649267 osd.3 [INF] bench: wrote 1024 MB in blocks of 4096 KB in 37.961966 sec at 27621 KB/sec
2013-03-04 18:41:08.943610 osd.24 [INF] bench: wrote 1024 MB in blocks of 4096 KB in 38.091272 sec at 27527 KB/sec
2013-03-04 18:41:08.975838 osd.17 [INF] bench: wrote 1024 MB in blocks of 4096 KB in 38.270884 sec at 27398 KB/sec
2013-03-04 18:41:09.544561 osd.23 [INF] bench: wrote 1024 MB in blocks of 4096 KB in 38.715030 sec at 27084 KB/sec
2013-03-04 18:41:08.969981 osd.16 [INF] bench: wrote 1024 MB in blocks of 4096 KB in 38.287596 sec at 27386 KB/sec
2013-03-04 18:41:09.533789 osd.25 [INF] bench: wrote 1024 MB in blocks of 4096 KB in 37.954333 sec at 27627 KB/sec

My XFS is a little fragmented, but performance is still good.
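To spot outliers in output like the above, the bench lines can be sorted by throughput. This is only a minimal sketch, assuming the cluster log is captured to a file while the loop runs (bench.log is a hypothetical name) and that the awk field positions match the log format shown above:

  # capture the cluster log while the bench loop runs
  ceph -w > bench.log &
  for x in `seq 0 25`; do ceph osd tell $x bench; done
  # once all "bench: wrote" lines have appeared, stop the ceph -w job (e.g. kill %1),
  # then list OSDs from slowest to fastest by reported KB/sec
  grep 'bench: wrote' bench.log | awk '{print $(NF-1), $3}' | sort -n | head

The head of that listing shows the slowest OSDs from the run; in the paste above, all 26 OSDs land around 27-28 MB/s when writing concurrently.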
On Mon, Mar 4, 2013 at 6:25 PM, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
> On Mon, Mar 4, 2013 at 9:23 AM, Sławomir Skowron <szibis@xxxxxxxxx> wrote:
>> On Mon, Mar 4, 2013 at 6:02 PM, Sage Weil <sage@xxxxxxxxxxx> wrote:
>>> On Mon, 4 Mar 2013, Sławomir Skowron wrote:
>>>> Ok, thanks for the response. But if I have a crush map like the one in
>>>> the attachment, all data should be balanced equally, not including the
>>>> hosts with 0.5 weight.
>>>>
>>>> How can I make the data auto-balance when I know that some PGs have too
>>>> much data? I have 4800 PGs for RGW alone, with 78 OSDs, which should be
>>>> quite enough.
>>>>
>>>> pool 3 '.rgw.buckets' rep size 3 crush_ruleset 0 object_hash rjenkins
>>>> pg_num 4800 pgp_num 4800 last_change 908 owner 0
>>>>
>>>> When will it be possible to expand the number of PGs?
>>>
>>> Soon. :)
>>>
>>> The bigger question for me is why there is one PG that is getting pounded
>>> while the others are not. Is there a large skew in the workload toward a
>>> small number of very hot objects?
>>
>> Yes, there are constantly about 100-200 operations per second, all
>> going into the RGW backend. But when problems appear there are more
>> requests, more GETs and PUTs, because applications with short timeouts
>> reconnect. Statistically, though, new PUTs are normally spread over many
>> PGs, so this should not overload a single primary OSD. Maybe balancing
>> reads across all replicas could help a little?
>>
>>> I expect it should be obvious if you go to the loaded osd and do
>>>
>>> ceph --admin-daemon /var/run/ceph/ceph-osd.NN.asok dump_ops_in_flight
>>>
>>
>> Yes, I did that, but such long operations only show up when the cluster
>> becomes unstable. Normally there are no ops in the queue; they appear
>> only when the cluster is rebalancing, remapping, or doing similar work.
>
> Have you checked the baseline disk performance of the OSDs? Perhaps
> it's not that the PG is bad but that the OSDs are slow.

--
-----
Regards

Sławek "sZiBis" Skowron
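As a rough complement to `ceph osd tell N bench` for Greg's baseline-disk question, the raw write speed of each OSD's backing filesystem can be checked with dd. This is only a sketch: the data path /var/lib/ceph/osd/ceph-$x and the OSD ids are assumptions to adjust to the actual layout, it must be run on each OSD host, and it writes and then deletes a 1 GB temporary file per OSD.

  # per-OSD raw write test with direct I/O; run on each OSD host
  for x in 0 1 2; do   # OSD ids hosted on this machine (assumed)
    d=/var/lib/ceph/osd/ceph-$x
    echo -n "osd.$x: "
    dd if=/dev/zero of=$d/ddtest bs=4M count=256 oflag=direct 2>&1 | tail -1
    rm -f $d/ddtest
  done

Comparing these numbers against the in-cluster bench results should show whether the slowness comes from the disks themselves or from something above them.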