On Fri, Jul 17, 2015 at 10:47 AM, Quentin Hartman <qhartman@xxxxxxxxxxxxxxxxxxx> wrote: > What does "ceph status" say? Usually it says everything is cool. However just now it gave this: cluster e9c32e63-f3eb-4c25-b172-4815ed566ec7 health HEALTH_WARN 2 requests are blocked > 32 sec monmap e3: 3 mons at {f16=192.168.19.216:6789/0,f17=192.168.19.217:6789/0,f18=192.168.19.218:6789/0}, election epoch 176, quorum 0,1,2 f16,f17,f18 osdmap e10705: 28 osds: 28 up, 28 in pgmap v26213984: 4224 pgs, 3 pools, 21637 GB data, 5409 kobjects 43344 GB used, 86870 GB / 127 TB avail 4224 active+clean client io 368 kB/s rd, 444 kB/s wr, 10 op/s How do I find out what requests are blocked and why? Resolving whatever that is seems like a very good next step to troubleshooting this issue. > I had a problem with similar symptoms some > months ago that was accompanied by OSDs getting marked out for no apparent > reason and the cluster going into a HEALTH_WARN state intermittently. There's no record of any OSD's being marked out at the time of problems. (E.g. as shown above, it's currently angry about something, but all OSD's are up & in.) Here's the slow requests related info from ceph.log from the last 90 minutes: 2015-07-17 06:25:14.809606 osd.16 192.168.19.218:6823/29360 4137 : [WRN] 1 slow requests, 1 included below; oldest blocked for > 30.051626 secs 2015-07-17 06:25:14.809618 osd.16 192.168.19.218:6823/29360 4138 : [WRN] slow request 30.051626 seconds old, received at 2015-07-17 06:24:44.757909: osd_op(client.32913524.0:3016955 rbd_data.15322ae8944a.000000000011f34f [set-alloc-hint object_size 4194304 write_size 4194304,write 1041920~3152384] 2.20c2afbe ack+ondisk+write e10705) v4 currently waiting for subops from 13 2015-07-17 06:25:20.280232 osd.24 192.168.19.218:6826/31177 4095 : [WRN] 1 slow requests, 1 included below; oldest blocked for > 30.105786 secs 2015-07-17 06:25:20.280239 osd.24 192.168.19.218:6826/31177 4096 : [WRN] slow request 30.105786 seconds old, received at 2015-07-17 06:24:50.174399: osd_op(client.32913524.0:3016970 rbd_data.15322ae8944a.000000000011f355 [set-alloc-hint object_size 4194304 write_size 4194304,write 0~4194304] 2.66b14219 ack+ondisk+write e10705) v4 currently waiting for subops from 4 2015-07-17 06:25:28.476583 osd.2 192.168.19.216:6808/29552 12827 : [WRN] 1 slow requests, 1 included below; oldest blocked for > 33.389677 secs 2015-07-17 06:25:28.476594 osd.2 192.168.19.216:6808/29552 12828 : [WRN] slow request 33.389677 seconds old, received at 2015-07-17 06:24:55.086843: osd_sub_op(client.32913524.0:3016982 2.e55 53500e55/rbd_data.15322ae8944a.000000000011f359/head//2 [] v 10705'851608 snapset=0=[]:[] snapc=0=[]) v11 currently commit sent 2015-07-17 06:25:25.223395 osd.20 192.168.19.218:6810/30518 4133 : [WRN] 1 slow requests, 1 included below; oldest blocked for > 30.255906 secs 2015-07-17 06:25:25.223406 osd.20 192.168.19.218:6810/30518 4134 : [WRN] slow request 30.255906 seconds old, received at 2015-07-17 06:24:54.967416: osd_op(client.32913524.0:3016982 rbd_data.15322ae8944a.000000000011f359 [set-alloc-hint object_size 4194304 write_size 4194304,write 0~3683840] 2.53500e55 ack+ondisk+write e10705) v4 currently waiting for subops from 2 2015-07-17 06:25:30.224304 osd.20 192.168.19.218:6810/30518 4135 : [WRN] 1 slow requests, 1 included below; oldest blocked for > 30.574590 secs 2015-07-17 06:25:30.224312 osd.20 192.168.19.218:6810/30518 4136 : [WRN] slow request 30.574590 seconds old, received at 2015-07-17 06:24:59.649652: osd_op(client.32913524.0:3017153 rbd_data.15322ae8944a.000000000011f359 [set-alloc-hint object_size 4194304 write_size 4194304,write 3683840~510464] 2.53500e55 ack+ondisk+write e10705) v4 currently waiting for subops from 2 2015-07-17 06:25:34.624203 osd.23 192.168.19.218:6801/26435 4154 : [WRN] 1 slow requests, 1 included below; oldest blocked for > 30.596777 secs 2015-07-17 06:25:34.624215 osd.23 192.168.19.218:6801/26435 4155 : [WRN] slow request 30.596777 seconds old, received at 2015-07-17 06:25:04.027362: osd_op(client.32913524.0:3017169 rbd_data.15322ae8944a.000000000011f35e [set-alloc-hint object_size 4194304 write_size 4194304,write 2963456~1230848] 2.dad14b4 ack+ondisk+write e10705) v4 currently waiting for subops from 8 2015-07-17 06:26:00.932314 osd.15 192.168.19.217:6800/25977 3491 : [WRN] 1 slow requests, 1 included below; oldest blocked for > 30.549624 secs 2015-07-17 06:26:00.932328 osd.15 192.168.19.217:6800/25977 3492 : [WRN] slow request 30.549624 seconds old, received at 2015-07-17 06:25:30.382612: osd_op(client.32913524.0:3017343 rbd_data.15322ae8944a.000000000011f381 [set-alloc-hint object_size 4194304 write_size 4194304,write 0~758784] 2.85df8e7d ack+ondisk+write e10705) v4 currently waiting for subops from 16 2015-07-17 06:26:00.817039 osd.16 192.168.19.218:6823/29360 4139 : [WRN] 1 slow requests, 1 included below; oldest blocked for > 30.433143 secs 2015-07-17 06:26:00.817050 osd.16 192.168.19.218:6823/29360 4140 : [WRN] slow request 30.433143 seconds old, received at 2015-07-17 06:25:30.383828: osd_op(client.32913524.0:3017342 rbd_data.15322ae8944a.000000000011f380 [set-alloc-hint object_size 4194304 write_size 4194304,write 0~4194304] 2.221d2914 ack+ondisk+write e10705) v4 currently waiting for subops from 15 2015-07-17 06:26:03.932886 osd.15 192.168.19.217:6800/25977 3493 : [WRN] 2 slow requests, 1 included below; oldest blocked for > 33.550207 secs 2015-07-17 06:26:03.932897 osd.15 192.168.19.217:6800/25977 3494 : [WRN] slow request 30.646373 seconds old, received at 2015-07-17 06:25:33.286446: osd_op(client.32913524.0:3017346 rbd_data.15322ae8944a.000000000011f381 [set-alloc-hint object_size 4194304 write_size 4194304,write 758784~1604096] 2.85df8e7d ack+ondisk+write e10705) v4 currently waiting for subops from 16 2015-07-17 06:26:40.635147 osd.23 192.168.19.218:6801/26435 4156 : [WRN] 1 slow requests, 1 included below; oldest blocked for > 30.795528 secs 2015-07-17 06:26:40.635159 osd.23 192.168.19.218:6801/26435 4157 : [WRN] slow request 30.795528 seconds old, received at 2015-07-17 06:26:09.839555: osd_op(client.32913524.0:3017581 rbd_data.15322ae8944a.000000000011f392 [set-alloc-hint object_size 4194304 write_size 4194304,write 0~4194304] 2.aa972bd9 ack+ondisk+write e10705) v4 currently waiting for subops from 2 2015-07-17 06:26:41.635500 osd.23 192.168.19.218:6801/26435 4158 : [WRN] 2 slow requests, 1 included below; oldest blocked for > 31.795852 secs 2015-07-17 06:26:41.635514 osd.23 192.168.19.218:6801/26435 4159 : [WRN] slow request 30.079795 seconds old, received at 2015-07-17 06:26:11.555612: osd_op(client.32913524.0:3017585 rbd_data.15322ae8944a.000000000011f394 [set-alloc-hint object_size 4194304 write_size 4194304,write 626176~817664] 2.a6e61a96 ack+ondisk+write e10705) v4 currently waiting for subops from 2 2015-07-17 06:26:39.176889 osd.17 192.168.19.218:6802/28653 3969 : [WRN] 1 slow requests, 1 included below; oldest blocked for > 30.692314 secs 2015-07-17 06:26:39.176902 osd.17 192.168.19.218:6802/28653 3970 : [WRN] slow request 30.692314 seconds old, received at 2015-07-17 06:26:08.484502: osd_op(client.32913524.0:3017578 rbd_data.15322ae8944a.000000000011f390 [set-alloc-hint object_size 4194304 write_size 4194304,write 0~4194304] 2.bc6f5a48 ack+ondisk+write e10705) v4 currently waiting for subops from 0 2015-07-17 06:26:40.714858 osd.0 192.168.19.216:6802/30465 12598 : [WRN] 1 slow requests, 1 included below; oldest blocked for > 32.151835 secs 2015-07-17 06:26:40.714870 osd.0 192.168.19.216:6802/30465 12599 : [WRN] slow request 32.151835 seconds old, received at 2015-07-17 06:26:08.562963: osd_sub_op(client.32913524.0:3017578 2.a48 bc6f5a48/rbd_data.15322ae8944a.000000000011f390/head//2 [] v 10705'1062490 snapset=0=[]:[] snapc=0=[]) v11 currently commit sent 2015-07-17 06:26:45.636230 osd.23 192.168.19.218:6801/26435 4160 : [WRN] 3 slow requests, 1 included below; oldest blocked for > 35.796603 secs 2015-07-17 06:26:45.636241 osd.23 192.168.19.218:6801/26435 4161 : [WRN] slow request 30.160384 seconds old, received at 2015-07-17 06:26:15.475774: osd_op(client.32913524.0:3017588 rbd_data.15322ae8944a.000000000011f394 [set-alloc-hint object_size 4194304 write_size 4194304,write 1443840~2750464] 2.a6e61a96 ack+ondisk+write e10705) v4 currently waiting for subops from 2 2015-07-17 07:01:19.574865 osd.20 192.168.19.218:6810/30518 4137 : [WRN] 1 slow requests, 1 included below; oldest blocked for > 30.757438 secs 2015-07-17 07:01:19.574875 osd.20 192.168.19.218:6810/30518 4138 : [WRN] slow request 30.757438 seconds old, received at 2015-07-17 07:00:48.817365: osd_op(client.32913524.0:3043827 rbd_data.15322ae8944a.000000000011f612 [set-alloc-hint object_size 4194304 write_size 4194304,write 0~4194304] 2.43ddced8 ack+ondisk+write e10705) v4 currently waiting for subops from 13 2015-07-17 07:19:05.032481 osd.27 192.168.19.218:6835/616 4195 : [WRN] 1 slow requests, 1 included below; oldest blocked for > 30.468540 secs 2015-07-17 07:19:05.032493 osd.27 192.168.19.218:6835/616 4196 : [WRN] slow request 30.468540 seconds old, received at 2015-07-17 07:18:34.563882: osd_op(client.32913524.0:3067506 rbd_data.15322ae8944a.000000000011fac3 [set-alloc-hint object_size 4194304 write_size 4194304,write 1571840~2622464] 2.8f6c67f9 ack+ondisk+write e10705) v4 currently waiting for subops from 10 2015-07-17 07:29:34.581318 osd.25 192.168.19.218:6808/27310 4209 : [WRN] 1 slow requests, 1 included below; oldest blocked for > 30.722213 secs 2015-07-17 07:29:34.581328 osd.25 192.168.19.218:6808/27310 4210 : [WRN] slow request 30.722213 seconds old, received at 2015-07-17 07:29:03.859038: osd_op(client.32913524.0:3079735 rbd_data.15322ae8944a.000000000011fc05 [set-alloc-hint object_size 4194304 write_size 4194304,write 562688~3631616] 2.77c7a7ed ack+ondisk+write e10705) v4 currently waiting for subops from 2 2015-07-17 07:37:19.848982 osd.19 192.168.19.218:6817/30107 4173 : [WRN] 1 slow requests, 1 included below; oldest blocked for > 30.341166 secs 2015-07-17 07:37:19.848995 osd.19 192.168.19.218:6817/30107 4174 : [WRN] slow request 30.341166 seconds old, received at 2015-07-17 07:36:49.507750: osd_op(client.32913524.0:3086134 rbd_data.15322ae8944a.0000000000117a8f [set-alloc-hint object_size 4194304 write_size 4194304,write 2400256~1794048] 2.f63d2256 ack+ondisk+write e10705) v4 currently waiting for subops from 11 2015-07-17 07:39:16.795902 osd.22 192.168.19.218:6803/31969 4213 : [WRN] 1 slow requests, 1 included below; oldest blocked for > 30.806946 secs 2015-07-17 07:39:16.795913 osd.22 192.168.19.218:6803/31969 4214 : [WRN] slow request 30.806946 seconds old, received at 2015-07-17 07:38:45.988886: osd_op(client.33043239.0:24 benchmark_data_v10_773558_object23 [write 0~4194304] 2.df0a06b1 ack+ondisk+write e10705) v4 currently waiting for subops from 8 2015-07-17 07:39:16.952224 osd.20 192.168.19.218:6810/30518 4140 : [WRN] 1 slow requests, 1 included below; oldest blocked for > 30.730329 secs 2015-07-17 07:39:16.952239 osd.20 192.168.19.218:6810/30518 4141 : [WRN] slow request 30.730329 seconds old, received at 2015-07-17 07:38:46.221830: osd_op(client.33043239.0:32 benchmark_data_v10_773558_object31 [write 0~4194304] 2.5cb39991 ack+ondisk+write e10705) v4 currently waiting for subops from 7 2015-07-17 07:39:34.473372 osd.18 192.168.19.218:6811/27959 4134 : [WRN] 1 slow requests, 1 included below; oldest blocked for > 30.948511 secs 2015-07-17 07:39:34.473378 osd.18 192.168.19.218:6811/27959 4135 : [WRN] slow request 30.948511 seconds old, received at 2015-07-17 07:39:03.524801: osd_op(client.32913524.0:3087431 rbd_data.15322ae8944a.000000000011fd52 [set-alloc-hint object_size 4194304 write_size 4194304,write 0~4194304] 2.97cb1221 ack+ondisk+write e10705) v4 currently waiting for subops from 8 2015-07-17 07:42:09.982125 osd.20 192.168.19.218:6810/30518 4142 : [WRN] 1 slow requests, 1 included below; oldest blocked for > 30.816780 secs 2015-07-17 07:42:09.982135 osd.20 192.168.19.218:6810/30518 4143 : [WRN] slow request 30.816780 seconds old, received at 2015-07-17 07:41:39.165270: osd_op(client.32913524.0:3090871 rbd_data.15322ae8944a.000000000011fd8d [set-alloc-hint object_size 4194304 write_size 4194304,write 1296384~2897920] 2.c7b8f895 ack+ondisk+write e10705) v4 currently waiting for subops from 3 2015-07-17 07:43:28.839202 osd.22 192.168.19.218:6803/31969 4215 : [WRN] 1 slow requests, 1 included below; oldest blocked for > 30.910041 secs 2015-07-17 07:43:28.839212 osd.22 192.168.19.218:6803/31969 4216 : [WRN] slow request 30.910041 seconds old, received at 2015-07-17 07:42:57.929091: osd_op(client.32913524.0:3092331 rbd_data.15322ae8944a.000000000011fdb9 [set-alloc-hint object_size 4194304 write_size 4194304,write 0~4194304] 2.d330159a ack+ondisk+write e10705) v4 currently waiting for subops from 12 2015-07-17 07:45:32.505307 osd.27 192.168.19.218:6835/616 4198 : [WRN] 1 slow requests, 1 included below; oldest blocked for > 30.539282 secs 2015-07-17 07:45:32.505314 osd.27 192.168.19.218:6835/616 4199 : [WRN] slow request 30.539282 seconds old, received at 2015-07-17 07:45:01.965959: osd_op(client.33043608.0:11 benchmark_data_v10_776494_object10 [write 0~4194304] 2.1c8acd02 ack+ondisk+write e10705) v4 currently waiting for subops from 8 2015-07-17 07:45:31.951468 osd.23 192.168.19.218:6801/26435 4163 : [WRN] 1 slow requests, 1 included below; oldest blocked for > 30.096108 secs 2015-07-17 07:45:31.951479 osd.23 192.168.19.218:6801/26435 4164 : [WRN] slow request 30.096108 seconds old, received at 2015-07-17 07:45:01.855291: osd_op(client.33043608.0:12 benchmark_data_v10_776494_object11 [write 0~4194304] 2.c0939f2d ack+ondisk+write e10705) v4 currently waiting for subops from 15 2015-07-17 07:47:44.528714 osd.27 192.168.19.218:6835/616 4200 : [WRN] 1 slow requests, 1 included below; oldest blocked for > 30.597559 secs 2015-07-17 07:47:44.528725 osd.27 192.168.19.218:6835/616 4201 : [WRN] slow request 30.597559 seconds old, received at 2015-07-17 07:47:13.931089: osd_op(client.32913524.0:3095452 rbd_data.15322ae8944a.000000000011fdf5 [set-alloc-hint object_size 4194304 write_size 4194304,write 112640~4081664] 2.ee56fce9 ack+ondisk+write e10705) v4 currently waiting for subops from 2 2015-07-17 07:56:54.132850 osd.16 192.168.19.218:6823/29360 4143 : [WRN] 1 slow requests, 1 included below; oldest blocked for > 30.609297 secs 2015-07-17 07:56:54.132860 osd.16 192.168.19.218:6823/29360 4144 : [WRN] slow request 30.609297 seconds old, received at 2015-07-17 07:56:23.523489: osd_op(client.32913524.0:3103115 rbd_data.15322ae8944a.000000000011fecc [set-alloc-hint object_size 4194304 write_size 4194304,write 0~4194304] 2.b89c4ef9 ack+ondisk+write e10705) v4 currently waiting for subops from 4 2015-07-17 07:57:46.978549 osd.26 192.168.19.218:6814/32600 4186 : [WRN] 1 slow requests, 1 included below; oldest blocked for > 30.392645 secs 2015-07-17 07:57:46.978560 osd.26 192.168.19.218:6814/32600 4187 : [WRN] slow request 30.392645 seconds old, received at 2015-07-17 07:57:16.585839: osd_op(client.32913524.0:3103561 rbd_data.15322ae8944a.000000000011ff2a [set-alloc-hint object_size 4194304 write_size 4194304,write 804352~3389952] 2.d785c7be ack+ondisk+write e10705) v4 currently waiting for subops from 15 2015-07-17 07:58:32.881812 osd.25 192.168.19.218:6808/27310 4212 : [WRN] 1 slow requests, 1 included below; oldest blocked for > 30.288058 secs 2015-07-17 07:58:32.881822 osd.25 192.168.19.218:6808/27310 4213 : [WRN] slow request 30.288058 seconds old, received at 2015-07-17 07:58:02.593690: osd_op(client.32913524.0:3103784 rbd_data.15322ae8944a.000000000011ff49 [set-alloc-hint object_size 4194304 write_size 4194304,write 0~4194304] 2.15910138 ack+ondisk+write e10705) v4 currently waiting for subops from 6 2015-07-17 07:58:33.588315 osd.6 192.168.19.217:6820/6346 4064 : [WRN] 1 slow requests, 1 included below; oldest blocked for > 30.878035 secs 2015-07-17 07:58:33.588326 osd.6 192.168.19.217:6820/6346 4065 : [WRN] slow request 30.878035 seconds old, received at 2015-07-17 07:58:02.710227: osd_sub_op(client.32913524.0:3103784 2.138 15910138/rbd_data.15322ae8944a.000000000011ff49/head//2 [] v 10705'3708350 snapset=0=[]:[] snapc=0=[]) v11 currently commit sent 2015-07-17 07:59:26.151882 osd.20 192.168.19.218:6810/30518 4144 : [WRN] 1 slow requests, 1 included below; oldest blocked for > 30.282649 secs 2015-07-17 07:59:26.151892 osd.20 192.168.19.218:6810/30518 4145 : [WRN] slow request 30.282649 seconds old, received at 2015-07-17 07:58:55.869167: osd_op(client.32913524.0:3104686 rbd_data.15322ae8944a.000000000011ff5a [set-alloc-hint object_size 4194304 write_size 4194304,write 0~4194304] 2.f3ee1f19 ack+ondisk+write e10705) v4 currently waiting for subops from 6 2015-07-17 07:59:33.153127 osd.20 192.168.19.218:6810/30518 4146 : [WRN] 2 slow requests, 1 included below; oldest blocked for > 37.283897 secs 2015-07-17 07:59:33.153138 osd.20 192.168.19.218:6810/30518 4147 : [WRN] slow request 30.197115 seconds old, received at 2015-07-17 07:59:02.955949: osd_op(client.32913524.0:3104740 rbd_data.15322ae8944a.000000000011ff63 [set-alloc-hint object_size 4194304 write_size 4194304,write 0~4194304] 2.5af954ae ack+ondisk+write e10705) v4 currently waiting for subops from 6 2015-07-17 08:01:22.013001 osd.26 192.168.19.218:6814/32600 4188 : [WRN] 1 slow requests, 1 included below; oldest blocked for > 30.301797 secs 2015-07-17 08:01:22.013013 osd.26 192.168.19.218:6814/32600 4189 : [WRN] slow request 30.301797 seconds old, received at 2015-07-17 08:00:51.711130: osd_op(client.32913524.0:3105804 rbd_data.15322ae8944a.000000000011ffc5 [set-alloc-hint object_size 4194304 write_size 4194304,write 0~4194304] 2.ce78fdf7 ack+ondisk+write e10705) v4 currently waiting for subops from 6 Does this shed any light on things? Thanks! _______________________________________________ ceph-users mailing list ceph-users@xxxxxxxxxxxxxx http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com