Re: Deadly slow Ceph cluster revisited

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



I would say use admin socket to find out which part is causing most of the latencies, don't rule out disk anomalies.

Thanks & Regards
Somnath


-----Original Message-----
From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of J David
Sent: Friday, July 17, 2015 8:07 AM
To: Quentin Hartman
Cc: ceph-users@xxxxxxxxxxxxxx
Subject: Re:  Deadly slow Ceph cluster revisited

On Fri, Jul 17, 2015 at 10:47 AM, Quentin Hartman <qhartman@xxxxxxxxxxxxxxxxxxx> wrote:
> What does "ceph status" say?

Usually it says everything is cool.  However just now it gave this:

    cluster e9c32e63-f3eb-4c25-b172-4815ed566ec7
     health HEALTH_WARN 2 requests are blocked > 32 sec
     monmap e3: 3 mons at
{f16=192.168.19.216:6789/0,f17=192.168.19.217:6789/0,f18=192.168.19.218:6789/0},
election epoch 176, quorum 0,1,2 f16,f17,f18
     osdmap e10705: 28 osds: 28 up, 28 in
      pgmap v26213984: 4224 pgs, 3 pools, 21637 GB data, 5409 kobjects
            43344 GB used, 86870 GB / 127 TB avail
                4224 active+clean
  client io 368 kB/s rd, 444 kB/s wr, 10 op/s

How do I find out what requests are blocked and why?  Resolving whatever that is seems like a very good next step to troubleshooting this issue.

> I had a problem with similar symptoms some months ago that was
> accompanied by OSDs getting marked out for no apparent reason and the
> cluster going into a HEALTH_WARN state intermittently.

There's no record of any OSD's being marked out at the time of problems.  (E.g. as shown above, it's currently angry about something, but all OSD's are up & in.)

Here's the slow requests related info from ceph.log from the last 90 minutes:

2015-07-17 06:25:14.809606 osd.16 192.168.19.218:6823/29360 4137 :
[WRN] 1 slow requests, 1 included below; oldest blocked for >
30.051626 secs
2015-07-17 06:25:14.809618 osd.16 192.168.19.218:6823/29360 4138 :
[WRN] slow request 30.051626 seconds old, received at 2015-07-17
06:24:44.757909: osd_op(client.32913524.0:3016955 rbd_data.15322ae8944a.000000000011f34f [set-alloc-hint object_size
4194304 write_size 4194304,write 1041920~3152384] 2.20c2afbe
ack+ondisk+write e10705) v4 currently waiting for subops from 13
2015-07-17 06:25:20.280232 osd.24 192.168.19.218:6826/31177 4095 :
[WRN] 1 slow requests, 1 included below; oldest blocked for >
30.105786 secs
2015-07-17 06:25:20.280239 osd.24 192.168.19.218:6826/31177 4096 :
[WRN] slow request 30.105786 seconds old, received at 2015-07-17
06:24:50.174399: osd_op(client.32913524.0:3016970
rbd_data.15322ae8944a.000000000011f355 [set-alloc-hint object_size
4194304 write_size 4194304,write 0~4194304] 2.66b14219
ack+ondisk+write e10705) v4 currently waiting for subops from 4
2015-07-17 06:25:28.476583 osd.2 192.168.19.216:6808/29552 12827 :
[WRN] 1 slow requests, 1 included below; oldest blocked for >
33.389677 secs
2015-07-17 06:25:28.476594 osd.2 192.168.19.216:6808/29552 12828 :
[WRN] slow request 33.389677 seconds old, received at 2015-07-17
06:24:55.086843: osd_sub_op(client.32913524.0:3016982 2.e55
53500e55/rbd_data.15322ae8944a.000000000011f359/head//2 [] v
10705'851608 snapset=0=[]:[] snapc=0=[]) v11 currently commit sent
2015-07-17 06:25:25.223395 osd.20 192.168.19.218:6810/30518 4133 :
[WRN] 1 slow requests, 1 included below; oldest blocked for >
30.255906 secs
2015-07-17 06:25:25.223406 osd.20 192.168.19.218:6810/30518 4134 :
[WRN] slow request 30.255906 seconds old, received at 2015-07-17
06:24:54.967416: osd_op(client.32913524.0:3016982
rbd_data.15322ae8944a.000000000011f359 [set-alloc-hint object_size
4194304 write_size 4194304,write 0~3683840] 2.53500e55
ack+ondisk+write e10705) v4 currently waiting for subops from 2
2015-07-17 06:25:30.224304 osd.20 192.168.19.218:6810/30518 4135 :
[WRN] 1 slow requests, 1 included below; oldest blocked for >
30.574590 secs
2015-07-17 06:25:30.224312 osd.20 192.168.19.218:6810/30518 4136 :
[WRN] slow request 30.574590 seconds old, received at 2015-07-17
06:24:59.649652: osd_op(client.32913524.0:3017153
rbd_data.15322ae8944a.000000000011f359 [set-alloc-hint object_size
4194304 write_size 4194304,write 3683840~510464] 2.53500e55
ack+ondisk+write e10705) v4 currently waiting for subops from 2
2015-07-17 06:25:34.624203 osd.23 192.168.19.218:6801/26435 4154 :
[WRN] 1 slow requests, 1 included below; oldest blocked for >
30.596777 secs
2015-07-17 06:25:34.624215 osd.23 192.168.19.218:6801/26435 4155 :
[WRN] slow request 30.596777 seconds old, received at 2015-07-17
06:25:04.027362: osd_op(client.32913524.0:3017169 rbd_data.15322ae8944a.000000000011f35e [set-alloc-hint object_size
4194304 write_size 4194304,write 2963456~1230848] 2.dad14b4
ack+ondisk+write e10705) v4 currently waiting for subops from 8
2015-07-17 06:26:00.932314 osd.15 192.168.19.217:6800/25977 3491 :
[WRN] 1 slow requests, 1 included below; oldest blocked for >
30.549624 secs
2015-07-17 06:26:00.932328 osd.15 192.168.19.217:6800/25977 3492 :
[WRN] slow request 30.549624 seconds old, received at 2015-07-17
06:25:30.382612: osd_op(client.32913524.0:3017343
rbd_data.15322ae8944a.000000000011f381 [set-alloc-hint object_size
4194304 write_size 4194304,write 0~758784] 2.85df8e7d ack+ondisk+write
e10705) v4 currently waiting for subops from 16
2015-07-17 06:26:00.817039 osd.16 192.168.19.218:6823/29360 4139 :
[WRN] 1 slow requests, 1 included below; oldest blocked for >
30.433143 secs
2015-07-17 06:26:00.817050 osd.16 192.168.19.218:6823/29360 4140 :
[WRN] slow request 30.433143 seconds old, received at 2015-07-17
06:25:30.383828: osd_op(client.32913524.0:3017342
rbd_data.15322ae8944a.000000000011f380 [set-alloc-hint object_size
4194304 write_size 4194304,write 0~4194304] 2.221d2914
ack+ondisk+write e10705) v4 currently waiting for subops from 15
2015-07-17 06:26:03.932886 osd.15 192.168.19.217:6800/25977 3493 :
[WRN] 2 slow requests, 1 included below; oldest blocked for >
33.550207 secs
2015-07-17 06:26:03.932897 osd.15 192.168.19.217:6800/25977 3494 :
[WRN] slow request 30.646373 seconds old, received at 2015-07-17
06:25:33.286446: osd_op(client.32913524.0:3017346
rbd_data.15322ae8944a.000000000011f381 [set-alloc-hint object_size
4194304 write_size 4194304,write 758784~1604096] 2.85df8e7d
ack+ondisk+write e10705) v4 currently waiting for subops from 16
2015-07-17 06:26:40.635147 osd.23 192.168.19.218:6801/26435 4156 :
[WRN] 1 slow requests, 1 included below; oldest blocked for >
30.795528 secs
2015-07-17 06:26:40.635159 osd.23 192.168.19.218:6801/26435 4157 :
[WRN] slow request 30.795528 seconds old, received at 2015-07-17
06:26:09.839555: osd_op(client.32913524.0:3017581
rbd_data.15322ae8944a.000000000011f392 [set-alloc-hint object_size
4194304 write_size 4194304,write 0~4194304] 2.aa972bd9
ack+ondisk+write e10705) v4 currently waiting for subops from 2
2015-07-17 06:26:41.635500 osd.23 192.168.19.218:6801/26435 4158 :
[WRN] 2 slow requests, 1 included below; oldest blocked for >
31.795852 secs
2015-07-17 06:26:41.635514 osd.23 192.168.19.218:6801/26435 4159 :
[WRN] slow request 30.079795 seconds old, received at 2015-07-17
06:26:11.555612: osd_op(client.32913524.0:3017585
rbd_data.15322ae8944a.000000000011f394 [set-alloc-hint object_size
4194304 write_size 4194304,write 626176~817664] 2.a6e61a96
ack+ondisk+write e10705) v4 currently waiting for subops from 2
2015-07-17 06:26:39.176889 osd.17 192.168.19.218:6802/28653 3969 :
[WRN] 1 slow requests, 1 included below; oldest blocked for >
30.692314 secs
2015-07-17 06:26:39.176902 osd.17 192.168.19.218:6802/28653 3970 :
[WRN] slow request 30.692314 seconds old, received at 2015-07-17
06:26:08.484502: osd_op(client.32913524.0:3017578
rbd_data.15322ae8944a.000000000011f390 [set-alloc-hint object_size
4194304 write_size 4194304,write 0~4194304] 2.bc6f5a48
ack+ondisk+write e10705) v4 currently waiting for subops from 0
2015-07-17 06:26:40.714858 osd.0 192.168.19.216:6802/30465 12598 :
[WRN] 1 slow requests, 1 included below; oldest blocked for >
32.151835 secs
2015-07-17 06:26:40.714870 osd.0 192.168.19.216:6802/30465 12599 :
[WRN] slow request 32.151835 seconds old, received at 2015-07-17
06:26:08.562963: osd_sub_op(client.32913524.0:3017578 2.a48
bc6f5a48/rbd_data.15322ae8944a.000000000011f390/head//2 [] v
10705'1062490 snapset=0=[]:[] snapc=0=[]) v11 currently commit sent
2015-07-17 06:26:45.636230 osd.23 192.168.19.218:6801/26435 4160 :
[WRN] 3 slow requests, 1 included below; oldest blocked for >
35.796603 secs
2015-07-17 06:26:45.636241 osd.23 192.168.19.218:6801/26435 4161 :
[WRN] slow request 30.160384 seconds old, received at 2015-07-17
06:26:15.475774: osd_op(client.32913524.0:3017588
rbd_data.15322ae8944a.000000000011f394 [set-alloc-hint object_size
4194304 write_size 4194304,write 1443840~2750464] 2.a6e61a96
ack+ondisk+write e10705) v4 currently waiting for subops from 2
2015-07-17 07:01:19.574865 osd.20 192.168.19.218:6810/30518 4137 :
[WRN] 1 slow requests, 1 included below; oldest blocked for >
30.757438 secs
2015-07-17 07:01:19.574875 osd.20 192.168.19.218:6810/30518 4138 :
[WRN] slow request 30.757438 seconds old, received at 2015-07-17
07:00:48.817365: osd_op(client.32913524.0:3043827
rbd_data.15322ae8944a.000000000011f612 [set-alloc-hint object_size
4194304 write_size 4194304,write 0~4194304] 2.43ddced8
ack+ondisk+write e10705) v4 currently waiting for subops from 13
2015-07-17 07:19:05.032481 osd.27 192.168.19.218:6835/616 4195 : [WRN]
1 slow requests, 1 included below; oldest blocked for > 30.468540 secs
2015-07-17 07:19:05.032493 osd.27 192.168.19.218:6835/616 4196 : [WRN] slow request 30.468540 seconds old, received at 2015-07-17
07:18:34.563882: osd_op(client.32913524.0:3067506
rbd_data.15322ae8944a.000000000011fac3 [set-alloc-hint object_size
4194304 write_size 4194304,write 1571840~2622464] 2.8f6c67f9
ack+ondisk+write e10705) v4 currently waiting for subops from 10
2015-07-17 07:29:34.581318 osd.25 192.168.19.218:6808/27310 4209 :
[WRN] 1 slow requests, 1 included below; oldest blocked for >
30.722213 secs
2015-07-17 07:29:34.581328 osd.25 192.168.19.218:6808/27310 4210 :
[WRN] slow request 30.722213 seconds old, received at 2015-07-17
07:29:03.859038: osd_op(client.32913524.0:3079735
rbd_data.15322ae8944a.000000000011fc05 [set-alloc-hint object_size
4194304 write_size 4194304,write 562688~3631616] 2.77c7a7ed
ack+ondisk+write e10705) v4 currently waiting for subops from 2
2015-07-17 07:37:19.848982 osd.19 192.168.19.218:6817/30107 4173 :
[WRN] 1 slow requests, 1 included below; oldest blocked for >
30.341166 secs
2015-07-17 07:37:19.848995 osd.19 192.168.19.218:6817/30107 4174 :
[WRN] slow request 30.341166 seconds old, received at 2015-07-17
07:36:49.507750: osd_op(client.32913524.0:3086134 rbd_data.15322ae8944a.0000000000117a8f [set-alloc-hint object_size
4194304 write_size 4194304,write 2400256~1794048] 2.f63d2256
ack+ondisk+write e10705) v4 currently waiting for subops from 11
2015-07-17 07:39:16.795902 osd.22 192.168.19.218:6803/31969 4213 :
[WRN] 1 slow requests, 1 included below; oldest blocked for >
30.806946 secs
2015-07-17 07:39:16.795913 osd.22 192.168.19.218:6803/31969 4214 :
[WRN] slow request 30.806946 seconds old, received at 2015-07-17
07:38:45.988886: osd_op(client.33043239.0:24
benchmark_data_v10_773558_object23 [write 0~4194304] 2.df0a06b1
ack+ondisk+write e10705) v4 currently waiting for subops from 8
2015-07-17 07:39:16.952224 osd.20 192.168.19.218:6810/30518 4140 :
[WRN] 1 slow requests, 1 included below; oldest blocked for >
30.730329 secs
2015-07-17 07:39:16.952239 osd.20 192.168.19.218:6810/30518 4141 :
[WRN] slow request 30.730329 seconds old, received at 2015-07-17
07:38:46.221830: osd_op(client.33043239.0:32
benchmark_data_v10_773558_object31 [write 0~4194304] 2.5cb39991
ack+ondisk+write e10705) v4 currently waiting for subops from 7
2015-07-17 07:39:34.473372 osd.18 192.168.19.218:6811/27959 4134 :
[WRN] 1 slow requests, 1 included below; oldest blocked for >
30.948511 secs
2015-07-17 07:39:34.473378 osd.18 192.168.19.218:6811/27959 4135 :
[WRN] slow request 30.948511 seconds old, received at 2015-07-17
07:39:03.524801: osd_op(client.32913524.0:3087431
rbd_data.15322ae8944a.000000000011fd52 [set-alloc-hint object_size
4194304 write_size 4194304,write 0~4194304] 2.97cb1221
ack+ondisk+write e10705) v4 currently waiting for subops from 8
2015-07-17 07:42:09.982125 osd.20 192.168.19.218:6810/30518 4142 :
[WRN] 1 slow requests, 1 included below; oldest blocked for >
30.816780 secs
2015-07-17 07:42:09.982135 osd.20 192.168.19.218:6810/30518 4143 :
[WRN] slow request 30.816780 seconds old, received at 2015-07-17
07:41:39.165270: osd_op(client.32913524.0:3090871 rbd_data.15322ae8944a.000000000011fd8d [set-alloc-hint object_size
4194304 write_size 4194304,write 1296384~2897920] 2.c7b8f895
ack+ondisk+write e10705) v4 currently waiting for subops from 3
2015-07-17 07:43:28.839202 osd.22 192.168.19.218:6803/31969 4215 :
[WRN] 1 slow requests, 1 included below; oldest blocked for >
30.910041 secs
2015-07-17 07:43:28.839212 osd.22 192.168.19.218:6803/31969 4216 :
[WRN] slow request 30.910041 seconds old, received at 2015-07-17
07:42:57.929091: osd_op(client.32913524.0:3092331
rbd_data.15322ae8944a.000000000011fdb9 [set-alloc-hint object_size
4194304 write_size 4194304,write 0~4194304] 2.d330159a
ack+ondisk+write e10705) v4 currently waiting for subops from 12
2015-07-17 07:45:32.505307 osd.27 192.168.19.218:6835/616 4198 : [WRN]
1 slow requests, 1 included below; oldest blocked for > 30.539282 secs
2015-07-17 07:45:32.505314 osd.27 192.168.19.218:6835/616 4199 : [WRN] slow request 30.539282 seconds old, received at 2015-07-17
07:45:01.965959: osd_op(client.33043608.0:11
benchmark_data_v10_776494_object10 [write 0~4194304] 2.1c8acd02
ack+ondisk+write e10705) v4 currently waiting for subops from 8
2015-07-17 07:45:31.951468 osd.23 192.168.19.218:6801/26435 4163 :
[WRN] 1 slow requests, 1 included below; oldest blocked for >
30.096108 secs
2015-07-17 07:45:31.951479 osd.23 192.168.19.218:6801/26435 4164 :
[WRN] slow request 30.096108 seconds old, received at 2015-07-17
07:45:01.855291: osd_op(client.33043608.0:12
benchmark_data_v10_776494_object11 [write 0~4194304] 2.c0939f2d
ack+ondisk+write e10705) v4 currently waiting for subops from 15
2015-07-17 07:47:44.528714 osd.27 192.168.19.218:6835/616 4200 : [WRN]
1 slow requests, 1 included below; oldest blocked for > 30.597559 secs
2015-07-17 07:47:44.528725 osd.27 192.168.19.218:6835/616 4201 : [WRN] slow request 30.597559 seconds old, received at 2015-07-17
07:47:13.931089: osd_op(client.32913524.0:3095452
rbd_data.15322ae8944a.000000000011fdf5 [set-alloc-hint object_size
4194304 write_size 4194304,write 112640~4081664] 2.ee56fce9
ack+ondisk+write e10705) v4 currently waiting for subops from 2
2015-07-17 07:56:54.132850 osd.16 192.168.19.218:6823/29360 4143 :
[WRN] 1 slow requests, 1 included below; oldest blocked for >
30.609297 secs
2015-07-17 07:56:54.132860 osd.16 192.168.19.218:6823/29360 4144 :
[WRN] slow request 30.609297 seconds old, received at 2015-07-17
07:56:23.523489: osd_op(client.32913524.0:3103115 rbd_data.15322ae8944a.000000000011fecc [set-alloc-hint object_size
4194304 write_size 4194304,write 0~4194304] 2.b89c4ef9
ack+ondisk+write e10705) v4 currently waiting for subops from 4
2015-07-17 07:57:46.978549 osd.26 192.168.19.218:6814/32600 4186 :
[WRN] 1 slow requests, 1 included below; oldest blocked for >
30.392645 secs
2015-07-17 07:57:46.978560 osd.26 192.168.19.218:6814/32600 4187 :
[WRN] slow request 30.392645 seconds old, received at 2015-07-17
07:57:16.585839: osd_op(client.32913524.0:3103561 rbd_data.15322ae8944a.000000000011ff2a [set-alloc-hint object_size
4194304 write_size 4194304,write 804352~3389952] 2.d785c7be
ack+ondisk+write e10705) v4 currently waiting for subops from 15
2015-07-17 07:58:32.881812 osd.25 192.168.19.218:6808/27310 4212 :
[WRN] 1 slow requests, 1 included below; oldest blocked for >
30.288058 secs
2015-07-17 07:58:32.881822 osd.25 192.168.19.218:6808/27310 4213 :
[WRN] slow request 30.288058 seconds old, received at 2015-07-17
07:58:02.593690: osd_op(client.32913524.0:3103784
rbd_data.15322ae8944a.000000000011ff49 [set-alloc-hint object_size
4194304 write_size 4194304,write 0~4194304] 2.15910138
ack+ondisk+write e10705) v4 currently waiting for subops from 6
2015-07-17 07:58:33.588315 osd.6 192.168.19.217:6820/6346 4064 : [WRN]
1 slow requests, 1 included below; oldest blocked for > 30.878035 secs
2015-07-17 07:58:33.588326 osd.6 192.168.19.217:6820/6346 4065 : [WRN] slow request 30.878035 seconds old, received at 2015-07-17
07:58:02.710227: osd_sub_op(client.32913524.0:3103784 2.138
15910138/rbd_data.15322ae8944a.000000000011ff49/head//2 [] v
10705'3708350 snapset=0=[]:[] snapc=0=[]) v11 currently commit sent
2015-07-17 07:59:26.151882 osd.20 192.168.19.218:6810/30518 4144 :
[WRN] 1 slow requests, 1 included below; oldest blocked for >
30.282649 secs
2015-07-17 07:59:26.151892 osd.20 192.168.19.218:6810/30518 4145 :
[WRN] slow request 30.282649 seconds old, received at 2015-07-17
07:58:55.869167: osd_op(client.32913524.0:3104686 rbd_data.15322ae8944a.000000000011ff5a [set-alloc-hint object_size
4194304 write_size 4194304,write 0~4194304] 2.f3ee1f19
ack+ondisk+write e10705) v4 currently waiting for subops from 6
2015-07-17 07:59:33.153127 osd.20 192.168.19.218:6810/30518 4146 :
[WRN] 2 slow requests, 1 included below; oldest blocked for >
37.283897 secs
2015-07-17 07:59:33.153138 osd.20 192.168.19.218:6810/30518 4147 :
[WRN] slow request 30.197115 seconds old, received at 2015-07-17
07:59:02.955949: osd_op(client.32913524.0:3104740
rbd_data.15322ae8944a.000000000011ff63 [set-alloc-hint object_size
4194304 write_size 4194304,write 0~4194304] 2.5af954ae
ack+ondisk+write e10705) v4 currently waiting for subops from 6
2015-07-17 08:01:22.013001 osd.26 192.168.19.218:6814/32600 4188 :
[WRN] 1 slow requests, 1 included below; oldest blocked for >
30.301797 secs
2015-07-17 08:01:22.013013 osd.26 192.168.19.218:6814/32600 4189 :
[WRN] slow request 30.301797 seconds old, received at 2015-07-17
08:00:51.711130: osd_op(client.32913524.0:3105804
rbd_data.15322ae8944a.000000000011ffc5 [set-alloc-hint object_size
4194304 write_size 4194304,write 0~4194304] 2.ce78fdf7
ack+ondisk+write e10705) v4 currently waiting for subops from 6

Does this shed any light on things?

Thanks!
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

________________________________

PLEASE NOTE: The information contained in this electronic mail message is intended only for the use of the designated recipient(s) named above. If the reader of this message is not the intended recipient, you are hereby notified that you have received this message in error and that any review, dissemination, distribution, or copying of this message is strictly prohibited. If you have received this communication in error, please notify the sender by telephone or e-mail (as shown above) immediately and destroy any and all copies of this message in your possession (whether hard copies or electronically stored copies).

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



[Index of Archives]     [Information on CEPH]     [Linux Filesystem Development]     [Ceph Development]     [Ceph Large]     [Linux USB Development]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [xfs]


  Powered by Linux