Hi,
We have a problem with drastic performance degradation on a cluster. We use radosgw with the S3 protocol. Our configuration:
- 153 OSDs on 1.2 TB SAS disks, with journals on SSDs (ratio 4:1)
- no networking problems, no hardware issues, etc.
Output from "ceph df":
GLOBAL:
    SIZE     AVAIL    RAW USED    %RAW USED
    166T     129T     38347G      22.44
POOLS:
    NAME                 ID   USED     %USED  MAX AVAIL  OBJECTS
    .rgw                 9    70330k   0      39879G     393178
    .rgw.root            10   848      0      39879G     3
    .rgw.control         11   0        0      39879G     8
    .rgw.gc              12   0        0      39879G     32
    .rgw.buckets         13   10007G   5.86   39879G     331079052
    .rgw.buckets.index   14   0        0      39879G     2994652
    .rgw.buckets.extra   15   0        0      39879G     2
    .log                 16   475M     0      39879G     408
    .intent-log          17   0        0      39879G     0
    .users               19   729      0      39879G     49
    .users.email         20   414      0      39879G     26
    .users.swift         21   0        0      39879G     0
    .users.uid           22   17170    0      39879G     89
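As a rough sanity check on the pool figures, dividing USED by OBJECTS for .rgw.buckets gives the average size of one stored RADOS object. This is a sketch under stated assumptions: the "G" suffix is taken to mean GiB, USED is assumed to exclude replication, and RADOS objects need not map one-to-one to S3 objects (radosgw can split large or multipart uploads into several RADOS objects). The helper name is hypothetical.

```python
# Figures copied from the "ceph df" output above.
# Assumption: the "G" suffix means GiB (2**30 bytes).
GIB = 1024 ** 3


def avg_object_size_bytes(used_gib: float, objects: int) -> float:
    """Average logical size of one RADOS object in a pool (replication assumed excluded from USED)."""
    return used_gib * GIB / objects


# .rgw.buckets: USED = 10007G, OBJECTS = 331079052
avg = avg_object_size_bytes(10007, 331_079_052)
print(f".rgw.buckets: ~{avg / 1024:.1f} KiB per RADOS object")
```

Under these assumptions the script prints roughly 31.7 KiB per RADOS object.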
The problems began last Saturday.
Throughput was 400k requests per hour, mostly PUTs and HEADs on objects of about 100 KB.
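For scale, the hourly figure converts to a per-second rate as below. The ~100 KB payload size comes from the post and is assumed to mean decimal kilobytes; because HEAD requests carry no body, the bandwidth number is only an upper bound. The helper name is hypothetical.

```python
SECONDS_PER_HOUR = 3600


def per_second(per_hour: float) -> float:
    """Convert an hourly request count into an average per-second rate."""
    return per_hour / SECONDS_PER_HOUR


rate = per_second(400_000)
# Upper bound: pretend every request is a ~100 KB PUT (HEADs actually carry no body).
max_payload_mb_s = rate * 100 / 1000
print(f"{rate:.1f} req/s, at most {max_payload_mb_s:.1f} MB/s of PUT payload")
```

This prints `111.1 req/s, at most 11.1 MB/s of PUT payload`.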
The Ceph version is Hammer.
We have two clusters with a similar configuration, and both experienced the same problems at the same time.
Any hints?
Latest output from "ceph -w":
2016-07-14 14:43:16.197131 osd.26 [WRN] 17 slow requests, 16 included below; oldest blocked for > 34.766976 secs
2016-07-14 14:43:16.197138 osd.26 [WRN] slow request 32.555599 seconds old, received at 2016-07-14 14:42:43.641440: osd_op(client.75866283.0:20130084 .dir.default.75866283.65796.3 [delete] 14.122252f4 ondisk+write+known_if_redirected e18788) currently commit_sent
2016-07-14 14:43:16.197145 osd.26 [WRN] slow request 32.536551 seconds old, received at 2016-07-14 14:42:43.660487: osd_op(client.75866283.0:20130121 .dir.default.75866283.65799.6 [delete] 14.d2dc1672 ondisk+write+known_if_redirected e18788) currently commit_sent
2016-07-14 14:43:16.197153 osd.26 [WRN] slow request 30.971549 seconds old, received at 2016-07-14 14:42:45.225490: osd_op(client.75866283.0:20132345 gc.12 [call rgw.gc_set_entry] 12.a45046b8 ack+ondisk+write+known_if_redirected e18788) currently waiting for rw locks
2016-07-14 14:43:16.197158 osd.26 [WRN] slow request 30.967568 seconds old, received at 2016-07-14 14:42:45.229471: osd_op(client.76495939.0:20147494 gc.12 [call rgw.gc_set_entry] 12.a45046b8 ack+ondisk+write+known_if_redirected e18788) currently waiting for rw locks
2016-07-14 14:43:16.197162 osd.26 [WRN] slow request 32.253169 seconds old, received at 2016-07-14 14:42:43.943870: osd_op(client.75866283.0:20130663 .dir.default.75866283.65805.7 [delete] 14.2b5a1672 ondisk+write+known_if_redirected e18788) currently commit_sent
2016-07-14 14:43:17.197429 osd.26 [WRN] 3 slow requests, 2 included below; oldest blocked for > 31.967882 secs
2016-07-14 14:43:17.197434 osd.26 [WRN] slow request 31.579897 seconds old, received at 2016-07-14 14:42:45.617456: osd_op(client.76495939.0:20147877 gc.12 [call rgw.gc_set_entry] 12.a45046b8 ack+ondisk+write+known_if_redirected e18788) currently waiting for rw locks
2016-07-14 14:43:17.197439 osd.26 [WRN] slow request 30.897873 seconds old, received at 2016-07-14 14:42:46.299480: osd_op(client.76495939.0:20148668 gc.12 [call rgw.gc_set_entry] 12.a45046b8 ack+ondisk+write+known_if_redirected e18788) currently waiting for rw locks
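The slow requests above concentrate on a few objects: every rgw.gc_set_entry call targets gc.12 in pool 12 (.rgw.gc) and is "waiting for rw locks", while the deletes hit .dir.default.* objects in pool 14 (.rgw.buckets.index). A small script can tally this over a longer capture of `ceph -w`. This is a minimal sketch that assumes the Hammer-era line format shown above; the regex, `POOLS` mapping and `summarize` helper are hypothetical and not part of any Ceph tool.

```python
import re
from collections import Counter

# Pool IDs -> names, taken from the "ceph df" output earlier in this post.
POOLS = {12: ".rgw.gc", 14: ".rgw.buckets.index"}

# Assumed shape of a Hammer-era "slow request" line as printed by `ceph -w`.
SLOW_RE = re.compile(
    r"slow request (?P<age>[\d.]+) seconds old, .*?"
    r"osd_op\(\S+ (?P<obj>\S+) \[(?P<op>[^\]]+)\] "
    r"(?P<pool>\d+)\.[0-9a-f]+ .*?\) currently (?P<state>.+)$"
)


def summarize(lines):
    """Count slow requests per (pool, object, op, state) and track the oldest age seen."""
    counts = Counter()
    oldest = {}
    for line in lines:
        m = SLOW_RE.search(line.strip())
        if not m:
            continue  # skips the "N slow requests, M included below" summary lines
        pool = POOLS.get(int(m["pool"]), m["pool"])
        key = (pool, m["obj"], m["op"], m["state"])
        counts[key] += 1
        oldest[key] = max(oldest.get(key, 0.0), float(m["age"]))
    return counts, oldest


if __name__ == "__main__":
    # Two lines copied verbatim from the ceph -w output above.
    sample = [
        "2016-07-14 14:43:16.197131 osd.26 [WRN] 17 slow requests, 16 included below; oldest blocked for > 34.766976 secs",
        "2016-07-14 14:43:16.197153 osd.26 [WRN] slow request 30.971549 seconds old, received at 2016-07-14 14:42:45.225490: osd_op(client.75866283.0:20132345 gc.12 [call rgw.gc_set_entry] 12.a45046b8 ack+ondisk+write+known_if_redirected e18788) currently waiting for rw locks",
    ]
    counts, oldest = summarize(sample)
    for key, n in counts.most_common():
        print(n, f"{oldest[key]:.1f}s", *key)
```

On the two sample lines this prints `1 31.0s .rgw.gc gc.12 call rgw.gc_set_entry waiting for rw locks`; fed a saved copy of a longer `ceph -w` session, the most frequent keys point at the hottest objects.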
Regards
--
Jarosław Owsiewski
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com