Huge issues with slow requests

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Just echoing what Christian said.

Also, iirc the "currently waiting for subobs on [" could also mean a
problem on those as it waits for ack from them (I might remember wrong).

If that is the case you might want to check in on osd 13 & 37 as well.

With the cluster load and size you should not have this problem; I'm pretty
sure you're dealing with a rogue/faulty osd/node somewhere.

Cheers,
Martin


On Fri, Sep 5, 2014 at 2:28 AM, Christian Balzer <chibi at gol.com> wrote:

> On Thu, 4 Sep 2014 12:02:13 +0200 David wrote:
>
> > Hi,
> >
> > We?re running a ceph cluster with version:
> >
> > 0.67.7-1~bpo70+1
> >
> > All of a sudden we?re having issues with the cluster (running RBD images
> > for kvm) with slow requests on all of the OSD servers. Any idea why and
> > how to fix it?
> >
> You give us a Ceph version at least, but for anybody to make guesses we
> need much more information than a log spew.
>
> How many nodes/OSDs, OS, hardware, OSD details (FS, journals on SSDs), etc.
>
> Run atop (in a sufficiently large terminal) on all your nodes, see if you
> can spot a bottleneck, like a disk being at 100% all the time with a
> much higher avio than the others.
> Looking at your logs, I'd pay particular attention to the disk holding
> osd.22.
> A single slow disk can bring a whole large cluster to a crawl.
> If you're using a hardware controller with a battery backed up cache,
> check if that is fine, loss of the battery would switch from writeback to
> writethrough and massively slow down IOPS.
>
> Regards,
>
> Christian
> >
> > 2014-09-04 11:56:35.868521 mon.0 [INF] pgmap v12504451: 6860 pgs: 6860
> > active+clean; 12163 GB data, 36308 GB used, 142 TB / 178 TB avail;
> > 634KB/s rd, 487KB/s wr, 90op/s 2014-09-04 11:56:29.510270 osd.22 [WRN]
> > 15 slow requests, 1 included below; oldest blocked for > 44.745754 secs
> > 2014-09-04 11:56:29.510276 osd.22 [WRN] slow request 30.999821 seconds
> > old, received at 2014-09-04 11:55:58.510424:
> > osd_op(client.10731617.0:81868956
> > rbd_data.967e022eb141f2.0000000000000e72 [write 0~4194304] 3.c585cebe
> > e13901) v4 currently waiting for subops from [37,13] 2014-09-04
> > 11:56:30.510528 osd.22 [WRN] 21 slow requests, 6 included below; oldest
> > blocked for > 45.745989 secs 2014-09-04 11:56:30.510534 osd.22 [WRN]
> > slow request 30.122555 seconds old, received at 2014-09-04
> > 11:56:00.387925: osd_op(client.13425082.0:11962345
> > rbd_data.54f24c3d1b58ba.0000000000003753 [stat,write 1114112~8192]
> > 3.c9e49140 e13901) v4 currently waiting for subops from [13,42]
> > 2014-09-04 11:56:30.510537 osd.22 [WRN] slow request 30.122362 seconds
> > old, received at 2014-09-04 11:56:00.388118:
> > osd_op(client.13425082.0:11962352
> > rbd_data.54f24c3d1b58ba.0000000000003753 [stat,write 1134592~4096]
> > 3.c9e49140 e13901) v4 currently waiting for subops from [13,42]
> > 2014-09-04 11:56:30.510541 osd.22 [WRN] slow request 30.122298 seconds
> > old, received at 2014-09-04 11:56:00.388182:
> > osd_op(client.13425082.0:11962353
> > rbd_data.54f24c3d1b58ba.0000000000003753 [stat,write 4046848~8192]
> > 3.c9e49140 e13901) v4 currently waiting for subops from [13,42]
> > 2014-09-04 11:56:30.510544 osd.22 [WRN] slow request 30.121577 seconds
> > old, received at 2014-09-04 11:56:00.388903:
> > osd_op(client.13425082.0:11962374
> > rbd_data.54f24c3d1b58ba.00000000000047f2 [stat,write 2527232~4096]
> > 3.cd9a9015 e13901) v4 currently waiting for subops from [45,1]
> > 2014-09-04 11:56:30.510548 osd.22 [WRN] slow request 30.121518 seconds
> > old, received at 2014-09-04 11:56:00.388962:
> > osd_op(client.13425082.0:11962375
> > rbd_data.54f24c3d1b58ba.00000000000047f2 [stat,write 3133440~4096]
> > 3.cd9a9015 e13901) v4 currently waiting for subops from [45,1]
> > 2014-09-04 11:56:31.510706 osd.22 [WRN] 26 slow requests, 6 included
> > below; oldest blocked for > 46.746163 secs 2014-09-04 11:56:31.510711
> > osd.22 [WRN] slow request 31.035418 seconds old, received at 2014-09-04
> > 11:56:00.475236: osd_op(client.9266625.0:135900595
> > rbd_data.42d6792eb141f2.000000000000bc00 [stat,write 2097152~4096]
> > 3.a2894ebe e13901) v4 currently waiting for subops from [37,13]
> > 2014-09-04 11:56:31.510715 osd.22 [WRN] slow request 31.035335 seconds
> > old, received at 2014-09-04 11:56:00.475319:
> > osd_op(client.9266625.0:135900596
> > rbd_data.42d6792eb141f2.000000000000bc00 [stat,write 2162688~4096]
> > 3.a2894ebe e13901) v4 currently waiting for subops from [37,13]
> > 2014-09-04 11:56:31.510718 osd.22 [WRN] slow request 31.035270 seconds
> > old, received at 2014-09-04 11:56:00.475384:
> > osd_op(client.9266625.0:135900597
> > rbd_data.42d6792eb141f2.000000000000bc00 [stat,write 2400256~16384]
> > 3.a2894ebe e13901) v4 currently waiting for subops from [37,13]
> > 2014-09-04 11:56:31.510721 osd.22 [WRN] slow request 31.035093 seconds
> > old, received at 2014-09-04 11:56:00.475561:
> > osd_op(client.9266625.0:135900598
> > rbd_data.42d6792eb141f2.000000000000bc00 [stat,write 2420736~4096]
> > 3.a2894ebe e13901) v4 currently waiting for subops from [37,13]
> > 2014-09-04 11:56:31.510724 osd.22 [WRN] slow request 31.034990 seconds
> > old, received at 2014-09-04 11:56:00.475664:
> > osd_op(client.9266625.0:135900599
> > rbd_data.42d6792eb141f2.000000000000bc00 [stat,write 2428928~12288]
> > 3.a2894ebe e13901) v4 currently waiting for subops from [37,13]
> > 2014-09-04 11:56:32.510864 osd.22 [WRN] 28 slow requests, 6 included
> > below; oldest blocked for > 47.746324 secs 2014-09-04 11:56:32.510871
> > osd.22 [WRN] slow request 32.035026 seconds old, received at 2014-09-04
> > 11:56:00.475789: osd_op(client.9266625.0:135900600
> > rbd_data.42d6792eb141f2.000000000000bc00 [stat,write 2453504~4096]
> > 3.a2894ebe e13901) v4 currently waiting for subops from [37,13]
> > 2014-09-04 11:56:32.510876 osd.22 [WRN] slow request 32.034982 seconds
> > old, received at 2014-09-04 11:56:00.475833:
> > osd_op(client.9266625.0:135900601
> > rbd_data.42d6792eb141f2.000000000000bc00 [stat,write 2502656~16384]
> > 3.a2894ebe e13901) v4 currently waiting for subops from [37,13]
> > 2014-09-04 11:56:32.510881 osd.22 [WRN] slow request 32.034917 seconds
> > old, received at 2014-09-04 11:56:00.475898:
> > osd_op(client.9266625.0:135900602
> > rbd_data.42d6792eb141f2.000000000000bc00 [stat,write 2600960~4096]
> > 3.a2894ebe e13901) v4 currently waiting for subops from [37,13]
> > 2014-09-04 11:56:32.510885 osd.22 [WRN] slow request 32.034807 seconds
> > old, received at 2014-09-04 11:56:00.476008:
> > osd_op(client.9266625.0:135900603
> > rbd_data.42d6792eb141f2.000000000000bc00 [stat,write 2617344~4096]
> > 3.a2894ebe e13901) v4 currently waiting for subops from [37,13]
> > 2014-09-04 11:56:32.510890 osd.22 [WRN] slow request 32.034756 seconds
> > old, received at 2014-09-04 11:56:00.476059:
> > osd_op(client.9266625.0:135900604
> > rbd_data.42d6792eb141f2.000000000000bc00 [stat,write 2637824~69632]
> > 3.a2894ebe e13901) v4 currently waiting for subops from [37,13]
> > 2014-09-04 11:56:32.723329 osd.17 [WRN] 4 slow requests, 4 included
> > below; oldest blocked for > 30.605794 secs 2014-09-04 11:56:32.723335
> > osd.17 [WRN] slow request 30.605794 seconds old, received at 2014-09-04
> > 11:56:02.117495: osd_op(client.12646411.0:31118135
> > rbd_data.81aaf52eb141f2.0000000000000e4b [stat,write 0~4194304]
> > 3.716b373b e13901) v4 currently waiting for subops from [31,1]
> > 2014-09-04 11:56:32.723338 osd.17 [WRN] slow request 30.362042 seconds
> > old, received at 2014-09-04 11:56:02.361247:
> > osd_op(client.12485793.0:121786489
> > rbd_data.4ae87f2eb141f2.000000000000e200 [stat,write 2834432~4096]
> > 3.6b43de72 e13901) v4 currently waiting for subops from [5,31]
> > 2014-09-04 11:56:32.723341 osd.17 [WRN] slow request 30.179681 seconds
> > old, received at 2014-09-04 11:56:02.543608:
> > osd_op(client.13425079.0:7014393
> > rbd_data.54f2522eb141f2.0000000000014d01 [stat,write 2502656~4096]
> > 3.d8ceea89 e13901) v4 currently waiting for subops from [31,18]
> > 2014-09-04 11:56:32.723344 osd.17 [WRN] slow request 30.000568 seconds
> > old, received at 2014-09-04 11:56:02.722721:
> > osd_op(client.9281279.0:144230325
> > rbd_data.2d99fc2eb141f2.000000000000e80a [stat,write 1589248~4096]
> > 3.16dc0e89 e13901) v4 currently waiting for subops from [31,18]
> > 2014-09-04 11:56:33.511043 osd.22 [WRN] 33 slow requests, 6 included
> > below; oldest blocked for > 48.746486 secs 2014-09-04 11:56:33.511049
> > osd.22 [WRN] slow request 33.034756 seconds old, received at 2014-09-04
> > 11:56:00.476221: osd_op(client.9266625.0:135900605
> > rbd_data.42d6792eb141f2.000000000000bc00 [stat,write 2715648~8192]
> > 3.a2894ebe e13901) v4 currently waiting for subops from [37,13]
> > 2014-09-04 11:56:33.511053 osd.22 [WRN] slow request 33.034620 seconds
> > old, received at 2014-09-04 11:56:00.476357:
> > osd_op(client.9266625.0:135900606
> > rbd_data.42d6792eb141f2.000000000000bc00 [stat,write 2756608~4096]
> > 3.a2894ebe e13901) v4 currently waiting for subops from [37,13]
> > 2014-09-04 11:56:33.511056 osd.22 [WRN] slow request 33.034446 seconds
> > old, received at 2014-09-04 11:56:00.476531:
> > osd_op(client.9266625.0:135900607
> > rbd_data.42d6792eb141f2.000000000000bc00 [stat,write 2809856~4096]
> > 3.a2894ebe e13901) v4 currently waiting for subops from [37,13]
> > 2014-09-04 11:56:33.511060 osd.22 [WRN] slow request 33.034387 seconds
> > old, received at 2014-09-04 11:56:00.476590:
> > osd_op(client.9266625.0:135900608
> > rbd_data.42d6792eb141f2.000000000000bc00 [stat,write 2883584~4096]
> > 3.a2894ebe e13901) v4 currently waiting for subops from [37,13]
> > 2014-09-04 11:56:33.511063 osd.22 [WRN] slow request 33.034341 seconds
> > old, received at 2014-09-04 11:56:00.476636:
> > osd_op(client.9266625.0:135900609
> > rbd_data.42d6792eb141f2.000000000000bc00 [stat,write 2891776~8192]
> > 3.a2894ebe e13901) v4 currently waiting for subops from [37,13]
> > 2014-09-04 11:56:34.511206 osd.22 [WRN] 38 slow requests, 6 included
> > below; oldest blocked for > 49.746664 secs 2014-09-04 11:56:34.511215
> > osd.22 [WRN] slow request 34.034426 seconds old, received at 2014-09-04
> > 11:56:00.476729: osd_op(client.9266625.0:135900610
> > rbd_data.42d6792eb141f2.000000000000bc00 [stat,write 2908160~4096]
> > 3.a2894ebe e13901) v4 currently waiting for subops from [37,13]
> > 2014-09-04 11:56:34.511219 osd.22 [WRN] slow request 34.034298 seconds
> > old, received at 2014-09-04 11:56:00.476857:
> > osd_op(client.9266625.0:135900611
> > rbd_data.42d6792eb141f2.000000000000bc00 [stat,write 2932736~28672]
> > 3.a2894ebe e13901) v4 currently waiting for subops from [37,13]
> > 2014-09-04 11:56:34.511222 osd.22 [WRN] slow request 34.034234 seconds
> > old, received at 2014-09-04 11:56:00.476921:
> > osd_op(client.9266625.0:135900612
> > rbd_data.42d6792eb141f2.000000000000bc00 [stat,write 2969600~8192]
> > 3.a2894ebe e13901) v4 currently waiting for subops from [37,13]
> > 2014-09-04 11:56:34.511225 osd.22 [WRN] slow request 34.034147 seconds
> > old, received at 2014-09-04 11:56:00.477008:
> > osd_op(client.9266625.0:135900613
> > rbd_data.42d6792eb141f2.000000000000bc00 [stat,write 2981888~4096]
> > 3.a2894ebe e13901) v4 currently waiting for subops from [37,13]
> > 2014-09-04 11:56:34.511228 osd.22 [WRN] slow request 34.034055 seconds
> > old, received at 2014-09-04 11:56:00.477100:
> > osd_op(client.9266625.0:135900614
> > rbd_data.42d6792eb141f2.000000000000bc00 [stat,write 2998272~12288]
> > 3.a2894ebe e13901) v4 currently waiting for subops from [37,13]
> > 2014-09-04 11:56:35.153813 osd.37 [WRN] 1 slow requests, 1 included
> > below; oldest blocked for > 30.307864 secs 2014-09-04 11:56:35.153820
> > osd.37 [WRN] slow request 30.307864 seconds old, received at 2014-09-04
> > 11:56:04.845917: osd_op(client.10785413.0:129409910
> > rbd_data.96803f2eb141f2.00000000000042b0 [stat,write 1712128~4096]
> > 3.6c5ac950 e13901) v4 currently waiting for subops from [22,41]
> > 2014-09-04 11:56:35.511450 osd.22 [WRN] 42 slow requests, 6 included
> > below; oldest blocked for > 50.746638 secs 2014-09-04 11:56:35.511455
> > osd.22 [WRN] slow request 35.034219 seconds old, received at 2014-09-04
> > 11:56:00.477178: osd_op(client.9266625.0:135900615
> > rbd_data.42d6792eb141f2.000000000000bc00 [stat,write 3022848~8192]
> > 3.a2894ebe e13901) v4 currently waiting for subops from [37,13]
> > 2014-09-04 11:56:35.511459 osd.22 [WRN] slow request 35.034143 seconds
> > old, received at 2014-09-04 11:56:00.477254:
> > osd_op(client.9266625.0:135900616
> > rbd_data.42d6792eb141f2.000000000000bc00 [stat,write 3035136~12288]
> > 3.a2894ebe e13901) v4 currently waiting for subops from [37,13]
> > 2014-09-04 11:56:35.511462 osd.22 [WRN] slow request 35.034082 seconds
> > old, received at 2014-09-04 11:56:00.477315:
> > osd_op(client.9266625.0:135900617
> > rbd_data.42d6792eb141f2.000000000000bc00 [stat,write 3055616~4096]
> > 3.a2894ebe e13901) v4 currently waiting for subops from [37,13]
> > 2014-09-04 11:56:35.511465 osd.22 [WRN] slow request 35.033965 seconds
> > old, received at 2014-09-04 11:56:00.477432:
> > osd_op(client.9266625.0:135900618
> > rbd_data.42d6792eb141f2.000000000000bc00 [stat,write 3063808~4096]
> > 3.a2894ebe e13901) v4 currently waiting for subops from [37,13]
> > 2014-09-04 11:56:35.511469 osd.22 [WRN] slow request 34.162275 seconds
> > old, received at 2014-09-04 11:56:01.349122:
> > osd_op(client.12485646.0:110385143
> > rbd_data.4ccab12eb141f2.0000000000015b66 [stat,write 909312~4096]
> > 3.9c416abe e13901) v4 currently waiting for subops from [37,13]
> > 2014-09-04 11:56:36.511603 osd.22 [WRN] 34 slow requests, 6 included
> > below; oldest blocked for > 51.746797 secs 2014-09-04 11:56:36.511609
> > osd.22 [WRN] slow request 34.401476 seconds old, received at 2014-09-04
> > 11:56:02.110080: osd_op(client.12485643.0:117955489
> > rbd_data.4aad332eb141f2.0000000000002c00 [stat,write 2531328~4096]
> > 3.b985ec92 e13901) v4 currently waiting for subops from [33,1]
> > 2014-09-04 11:56:36.511612 osd.22 [WRN] slow request 34.149511 seconds
> > old, received at 2014-09-04 11:56:02.362045:
> > osd_op(client.12485793.0:121786505
> > rbd_data.4ae87f2eb141f2.0000000000018a0a [stat,write 3342336~4096]
> > 3.26907492 e13901) v4 currently waiting for subops from [33,1]
> > 2014-09-04 11:56:36.511615 osd.22 [WRN] slow request 34.329528 seconds
> > old, received at 2014-09-04 11:56:02.182028:
> > osd_op(client.12646411.0:31118266
> > rbd_data.81aaf52eb141f2.0000000000000e4f [stat,write 0~4194304]
> > 3.168c978d e13901) v4 currently waiting for subops from [49,29]
> > 2014-09-04 11:56:36.511618 osd.22 [WRN] slow request 31.871976 seconds
> > old, received at 2014-09-04 11:56:04.639580:
> > osd_op(client.12485793.0:121786506
> > rbd_data.4ae87f2eb141f2.0000000000018a0a [stat,write 0~4096] 3.26907492
> > e13901) v4 currently waiting for subops from [33,1] 2014-09-04
> > 11:56:36.511621 osd.22 [WRN] slow request 31.008867 seconds old,
> > received at 2014-09-04 11:56:05.502689:
> > osd_op(client.9266625.0:135901516
> > rbd_data.42d6792eb141f2.0000000000017436 [stat,write 2682880~4096]
> > 3.32067b97 e13901) v4 currently waiting for subops from [12,29]
> > 2014-09-04 11:56:36.723882 osd.17 [WRN] 5 slow requests, 1 included
> > below; oldest blocked for > 34.606349 secs 2014-09-04 11:56:36.723887
> > osd.17 [WRN] slow request 30.603057 seconds old, received at 2014-09-04
> > 11:56:06.120787: osd_op(client.12483520.0:11473870
> > rbd_data.4b8e9b3d1b58ba.0000000000004600 [stat,write 2293760~4096]
> > 3.49398689 e13901) v4 currently waiting for subops from [31,18]
> > 2014-09-04 11:56:37.511787 osd.22 [WRN] 24 slow requests, 6 included
> > below; oldest blocked for > 52.746984 secs 2014-09-04 11:56:37.511793
> > osd.22 [WRN] slow request 32.008986 seconds old, received at 2014-09-04
> > 11:56:05.502757: osd_op(client.9266625.0:135901517
> > rbd_data.42d6792eb141f2.0000000000017436 [stat,write 2695168~4096]
> > 3.32067b97 e13901) v4 currently waiting for subops from [12,29]
> > 2014-09-04 11:56:37.511798 osd.22 [WRN] slow request 31.749483 seconds
> > old, received at 2014-09-04 11:56:05.762260:
> > osd_op(client.9281282.0:55728036
> > rbd_data.2d99f62eb141f2.0000000000003753 [stat,write 1630208~8192]
> > 3.6625df97 e13901) v4 currently waiting for subops from [12,29]
> > 2014-09-04 11:56:37.511801 osd.22 [WRN] slow request 31.745558 seconds
> > old, received at 2014-09-04 11:56:05.766185:
> > osd_op(client.9281282.0:55728038
> > rbd_data.2d99f62eb141f2.0000000000003753 [stat,write 1941504~4096]
> > 3.6625df97 e13901) v4 currently waiting for subops from [12,29]
> > 2014-09-04 11:56:37.511803 osd.22 [WRN] slow request 31.390945 seconds
> > old, received at 2014-09-04 11:56:06.120798:
> > osd_op(client.12483520.0:11473871
> > rbd_data.4b8e9b3d1b58ba.0000000000005c00 [stat,write 2424832~8192]
> > 3.47dbbb97 e13901) v4 currently waiting for subops from [12,29]
> > 2014-09-04 11:56:37.511806 osd.22 [WRN] slow request 31.390357 seconds
> > old, received at 2014-09-04 11:56:06.121386:
> > osd_op(client.12483520.0:11473884
> > rbd_data.4b8e9b3d1b58ba.0000000000005c00 [stat,write 2457600~16384]
> > 3.47dbbb97 e13901) v4 currently waiting for subops from [12,29]
> >
> > Kind Regards,
> >
> > David
> >
> >
> > _______________________________________________
> > ceph-users mailing list
> > ceph-users at lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
>
>
> --
> Christian Balzer        Network/Systems Engineer
> chibi at gol.com           Global OnLine Japan/Fusion Communications
> http://www.gol.com/
> _______________________________________________
> ceph-users mailing list
> ceph-users at lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.ceph.com/pipermail/ceph-users-ceph.com/attachments/20140905/36ce38cd/attachment.htm>


[Index of Archives]     [Information on CEPH]     [Linux Filesystem Development]     [Ceph Development]     [Ceph Large]     [Ceph Dev]     [Linux USB Development]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [xfs]


  Powered by Linux