Just echoing what Christian said. Also, iirc the "currently waiting for subobs on [" could also mean a problem on those as it waits for ack from them (I might remember wrong). If that is the case you might want to check in on osd 13 & 37 as well. With the cluster load and size you should not have this problem; I'm pretty sure you're dealing with a rogue/faulty osd/node somewhere. Cheers, Martin On Fri, Sep 5, 2014 at 2:28 AM, Christian Balzer <chibi at gol.com> wrote: > On Thu, 4 Sep 2014 12:02:13 +0200 David wrote: > > > Hi, > > > > We?re running a ceph cluster with version: > > > > 0.67.7-1~bpo70+1 > > > > All of a sudden we?re having issues with the cluster (running RBD images > > for kvm) with slow requests on all of the OSD servers. Any idea why and > > how to fix it? > > > You give us a Ceph version at least, but for anybody to make guesses we > need much more information than a log spew. > > How many nodes/OSDs, OS, hardware, OSD details (FS, journals on SSDs), etc. > > Run atop (in a sufficiently large terminal) on all your nodes, see if you > can spot a bottleneck, like a disk being at 100% all the time with a > much higher avio than the others. > Looking at your logs, I'd pay particular attention to the disk holding > osd.22. > A single slow disk can bring a whole large cluster to a crawl. > If you're using a hardware controller with a battery backed up cache, > check if that is fine, loss of the battery would switch from writeback to > writethrough and massively slow down IOPS. > > Regards, > > Christian > > > > 2014-09-04 11:56:35.868521 mon.0 [INF] pgmap v12504451: 6860 pgs: 6860 > > active+clean; 12163 GB data, 36308 GB used, 142 TB / 178 TB avail; > > 634KB/s rd, 487KB/s wr, 90op/s 2014-09-04 11:56:29.510270 osd.22 [WRN] > > 15 slow requests, 1 included below; oldest blocked for > 44.745754 secs > > 2014-09-04 11:56:29.510276 osd.22 [WRN] slow request 30.999821 seconds > > old, received at 2014-09-04 11:55:58.510424: > > osd_op(client.10731617.0:81868956 > > rbd_data.967e022eb141f2.0000000000000e72 [write 0~4194304] 3.c585cebe > > e13901) v4 currently waiting for subops from [37,13] 2014-09-04 > > 11:56:30.510528 osd.22 [WRN] 21 slow requests, 6 included below; oldest > > blocked for > 45.745989 secs 2014-09-04 11:56:30.510534 osd.22 [WRN] > > slow request 30.122555 seconds old, received at 2014-09-04 > > 11:56:00.387925: osd_op(client.13425082.0:11962345 > > rbd_data.54f24c3d1b58ba.0000000000003753 [stat,write 1114112~8192] > > 3.c9e49140 e13901) v4 currently waiting for subops from [13,42] > > 2014-09-04 11:56:30.510537 osd.22 [WRN] slow request 30.122362 seconds > > old, received at 2014-09-04 11:56:00.388118: > > osd_op(client.13425082.0:11962352 > > rbd_data.54f24c3d1b58ba.0000000000003753 [stat,write 1134592~4096] > > 3.c9e49140 e13901) v4 currently waiting for subops from [13,42] > > 2014-09-04 11:56:30.510541 osd.22 [WRN] slow request 30.122298 seconds > > old, received at 2014-09-04 11:56:00.388182: > > osd_op(client.13425082.0:11962353 > > rbd_data.54f24c3d1b58ba.0000000000003753 [stat,write 4046848~8192] > > 3.c9e49140 e13901) v4 currently waiting for subops from [13,42] > > 2014-09-04 11:56:30.510544 osd.22 [WRN] slow request 30.121577 seconds > > old, received at 2014-09-04 11:56:00.388903: > > osd_op(client.13425082.0:11962374 > > rbd_data.54f24c3d1b58ba.00000000000047f2 [stat,write 2527232~4096] > > 3.cd9a9015 e13901) v4 currently waiting for subops from [45,1] > > 2014-09-04 11:56:30.510548 osd.22 [WRN] slow request 30.121518 seconds > > old, received at 2014-09-04 11:56:00.388962: > > osd_op(client.13425082.0:11962375 > > rbd_data.54f24c3d1b58ba.00000000000047f2 [stat,write 3133440~4096] > > 3.cd9a9015 e13901) v4 currently waiting for subops from [45,1] > > 2014-09-04 11:56:31.510706 osd.22 [WRN] 26 slow requests, 6 included > > below; oldest blocked for > 46.746163 secs 2014-09-04 11:56:31.510711 > > osd.22 [WRN] slow request 31.035418 seconds old, received at 2014-09-04 > > 11:56:00.475236: osd_op(client.9266625.0:135900595 > > rbd_data.42d6792eb141f2.000000000000bc00 [stat,write 2097152~4096] > > 3.a2894ebe e13901) v4 currently waiting for subops from [37,13] > > 2014-09-04 11:56:31.510715 osd.22 [WRN] slow request 31.035335 seconds > > old, received at 2014-09-04 11:56:00.475319: > > osd_op(client.9266625.0:135900596 > > rbd_data.42d6792eb141f2.000000000000bc00 [stat,write 2162688~4096] > > 3.a2894ebe e13901) v4 currently waiting for subops from [37,13] > > 2014-09-04 11:56:31.510718 osd.22 [WRN] slow request 31.035270 seconds > > old, received at 2014-09-04 11:56:00.475384: > > osd_op(client.9266625.0:135900597 > > rbd_data.42d6792eb141f2.000000000000bc00 [stat,write 2400256~16384] > > 3.a2894ebe e13901) v4 currently waiting for subops from [37,13] > > 2014-09-04 11:56:31.510721 osd.22 [WRN] slow request 31.035093 seconds > > old, received at 2014-09-04 11:56:00.475561: > > osd_op(client.9266625.0:135900598 > > rbd_data.42d6792eb141f2.000000000000bc00 [stat,write 2420736~4096] > > 3.a2894ebe e13901) v4 currently waiting for subops from [37,13] > > 2014-09-04 11:56:31.510724 osd.22 [WRN] slow request 31.034990 seconds > > old, received at 2014-09-04 11:56:00.475664: > > osd_op(client.9266625.0:135900599 > > rbd_data.42d6792eb141f2.000000000000bc00 [stat,write 2428928~12288] > > 3.a2894ebe e13901) v4 currently waiting for subops from [37,13] > > 2014-09-04 11:56:32.510864 osd.22 [WRN] 28 slow requests, 6 included > > below; oldest blocked for > 47.746324 secs 2014-09-04 11:56:32.510871 > > osd.22 [WRN] slow request 32.035026 seconds old, received at 2014-09-04 > > 11:56:00.475789: osd_op(client.9266625.0:135900600 > > rbd_data.42d6792eb141f2.000000000000bc00 [stat,write 2453504~4096] > > 3.a2894ebe e13901) v4 currently waiting for subops from [37,13] > > 2014-09-04 11:56:32.510876 osd.22 [WRN] slow request 32.034982 seconds > > old, received at 2014-09-04 11:56:00.475833: > > osd_op(client.9266625.0:135900601 > > rbd_data.42d6792eb141f2.000000000000bc00 [stat,write 2502656~16384] > > 3.a2894ebe e13901) v4 currently waiting for subops from [37,13] > > 2014-09-04 11:56:32.510881 osd.22 [WRN] slow request 32.034917 seconds > > old, received at 2014-09-04 11:56:00.475898: > > osd_op(client.9266625.0:135900602 > > rbd_data.42d6792eb141f2.000000000000bc00 [stat,write 2600960~4096] > > 3.a2894ebe e13901) v4 currently waiting for subops from [37,13] > > 2014-09-04 11:56:32.510885 osd.22 [WRN] slow request 32.034807 seconds > > old, received at 2014-09-04 11:56:00.476008: > > osd_op(client.9266625.0:135900603 > > rbd_data.42d6792eb141f2.000000000000bc00 [stat,write 2617344~4096] > > 3.a2894ebe e13901) v4 currently waiting for subops from [37,13] > > 2014-09-04 11:56:32.510890 osd.22 [WRN] slow request 32.034756 seconds > > old, received at 2014-09-04 11:56:00.476059: > > osd_op(client.9266625.0:135900604 > > rbd_data.42d6792eb141f2.000000000000bc00 [stat,write 2637824~69632] > > 3.a2894ebe e13901) v4 currently waiting for subops from [37,13] > > 2014-09-04 11:56:32.723329 osd.17 [WRN] 4 slow requests, 4 included > > below; oldest blocked for > 30.605794 secs 2014-09-04 11:56:32.723335 > > osd.17 [WRN] slow request 30.605794 seconds old, received at 2014-09-04 > > 11:56:02.117495: osd_op(client.12646411.0:31118135 > > rbd_data.81aaf52eb141f2.0000000000000e4b [stat,write 0~4194304] > > 3.716b373b e13901) v4 currently waiting for subops from [31,1] > > 2014-09-04 11:56:32.723338 osd.17 [WRN] slow request 30.362042 seconds > > old, received at 2014-09-04 11:56:02.361247: > > osd_op(client.12485793.0:121786489 > > rbd_data.4ae87f2eb141f2.000000000000e200 [stat,write 2834432~4096] > > 3.6b43de72 e13901) v4 currently waiting for subops from [5,31] > > 2014-09-04 11:56:32.723341 osd.17 [WRN] slow request 30.179681 seconds > > old, received at 2014-09-04 11:56:02.543608: > > osd_op(client.13425079.0:7014393 > > rbd_data.54f2522eb141f2.0000000000014d01 [stat,write 2502656~4096] > > 3.d8ceea89 e13901) v4 currently waiting for subops from [31,18] > > 2014-09-04 11:56:32.723344 osd.17 [WRN] slow request 30.000568 seconds > > old, received at 2014-09-04 11:56:02.722721: > > osd_op(client.9281279.0:144230325 > > rbd_data.2d99fc2eb141f2.000000000000e80a [stat,write 1589248~4096] > > 3.16dc0e89 e13901) v4 currently waiting for subops from [31,18] > > 2014-09-04 11:56:33.511043 osd.22 [WRN] 33 slow requests, 6 included > > below; oldest blocked for > 48.746486 secs 2014-09-04 11:56:33.511049 > > osd.22 [WRN] slow request 33.034756 seconds old, received at 2014-09-04 > > 11:56:00.476221: osd_op(client.9266625.0:135900605 > > rbd_data.42d6792eb141f2.000000000000bc00 [stat,write 2715648~8192] > > 3.a2894ebe e13901) v4 currently waiting for subops from [37,13] > > 2014-09-04 11:56:33.511053 osd.22 [WRN] slow request 33.034620 seconds > > old, received at 2014-09-04 11:56:00.476357: > > osd_op(client.9266625.0:135900606 > > rbd_data.42d6792eb141f2.000000000000bc00 [stat,write 2756608~4096] > > 3.a2894ebe e13901) v4 currently waiting for subops from [37,13] > > 2014-09-04 11:56:33.511056 osd.22 [WRN] slow request 33.034446 seconds > > old, received at 2014-09-04 11:56:00.476531: > > osd_op(client.9266625.0:135900607 > > rbd_data.42d6792eb141f2.000000000000bc00 [stat,write 2809856~4096] > > 3.a2894ebe e13901) v4 currently waiting for subops from [37,13] > > 2014-09-04 11:56:33.511060 osd.22 [WRN] slow request 33.034387 seconds > > old, received at 2014-09-04 11:56:00.476590: > > osd_op(client.9266625.0:135900608 > > rbd_data.42d6792eb141f2.000000000000bc00 [stat,write 2883584~4096] > > 3.a2894ebe e13901) v4 currently waiting for subops from [37,13] > > 2014-09-04 11:56:33.511063 osd.22 [WRN] slow request 33.034341 seconds > > old, received at 2014-09-04 11:56:00.476636: > > osd_op(client.9266625.0:135900609 > > rbd_data.42d6792eb141f2.000000000000bc00 [stat,write 2891776~8192] > > 3.a2894ebe e13901) v4 currently waiting for subops from [37,13] > > 2014-09-04 11:56:34.511206 osd.22 [WRN] 38 slow requests, 6 included > > below; oldest blocked for > 49.746664 secs 2014-09-04 11:56:34.511215 > > osd.22 [WRN] slow request 34.034426 seconds old, received at 2014-09-04 > > 11:56:00.476729: osd_op(client.9266625.0:135900610 > > rbd_data.42d6792eb141f2.000000000000bc00 [stat,write 2908160~4096] > > 3.a2894ebe e13901) v4 currently waiting for subops from [37,13] > > 2014-09-04 11:56:34.511219 osd.22 [WRN] slow request 34.034298 seconds > > old, received at 2014-09-04 11:56:00.476857: > > osd_op(client.9266625.0:135900611 > > rbd_data.42d6792eb141f2.000000000000bc00 [stat,write 2932736~28672] > > 3.a2894ebe e13901) v4 currently waiting for subops from [37,13] > > 2014-09-04 11:56:34.511222 osd.22 [WRN] slow request 34.034234 seconds > > old, received at 2014-09-04 11:56:00.476921: > > osd_op(client.9266625.0:135900612 > > rbd_data.42d6792eb141f2.000000000000bc00 [stat,write 2969600~8192] > > 3.a2894ebe e13901) v4 currently waiting for subops from [37,13] > > 2014-09-04 11:56:34.511225 osd.22 [WRN] slow request 34.034147 seconds > > old, received at 2014-09-04 11:56:00.477008: > > osd_op(client.9266625.0:135900613 > > rbd_data.42d6792eb141f2.000000000000bc00 [stat,write 2981888~4096] > > 3.a2894ebe e13901) v4 currently waiting for subops from [37,13] > > 2014-09-04 11:56:34.511228 osd.22 [WRN] slow request 34.034055 seconds > > old, received at 2014-09-04 11:56:00.477100: > > osd_op(client.9266625.0:135900614 > > rbd_data.42d6792eb141f2.000000000000bc00 [stat,write 2998272~12288] > > 3.a2894ebe e13901) v4 currently waiting for subops from [37,13] > > 2014-09-04 11:56:35.153813 osd.37 [WRN] 1 slow requests, 1 included > > below; oldest blocked for > 30.307864 secs 2014-09-04 11:56:35.153820 > > osd.37 [WRN] slow request 30.307864 seconds old, received at 2014-09-04 > > 11:56:04.845917: osd_op(client.10785413.0:129409910 > > rbd_data.96803f2eb141f2.00000000000042b0 [stat,write 1712128~4096] > > 3.6c5ac950 e13901) v4 currently waiting for subops from [22,41] > > 2014-09-04 11:56:35.511450 osd.22 [WRN] 42 slow requests, 6 included > > below; oldest blocked for > 50.746638 secs 2014-09-04 11:56:35.511455 > > osd.22 [WRN] slow request 35.034219 seconds old, received at 2014-09-04 > > 11:56:00.477178: osd_op(client.9266625.0:135900615 > > rbd_data.42d6792eb141f2.000000000000bc00 [stat,write 3022848~8192] > > 3.a2894ebe e13901) v4 currently waiting for subops from [37,13] > > 2014-09-04 11:56:35.511459 osd.22 [WRN] slow request 35.034143 seconds > > old, received at 2014-09-04 11:56:00.477254: > > osd_op(client.9266625.0:135900616 > > rbd_data.42d6792eb141f2.000000000000bc00 [stat,write 3035136~12288] > > 3.a2894ebe e13901) v4 currently waiting for subops from [37,13] > > 2014-09-04 11:56:35.511462 osd.22 [WRN] slow request 35.034082 seconds > > old, received at 2014-09-04 11:56:00.477315: > > osd_op(client.9266625.0:135900617 > > rbd_data.42d6792eb141f2.000000000000bc00 [stat,write 3055616~4096] > > 3.a2894ebe e13901) v4 currently waiting for subops from [37,13] > > 2014-09-04 11:56:35.511465 osd.22 [WRN] slow request 35.033965 seconds > > old, received at 2014-09-04 11:56:00.477432: > > osd_op(client.9266625.0:135900618 > > rbd_data.42d6792eb141f2.000000000000bc00 [stat,write 3063808~4096] > > 3.a2894ebe e13901) v4 currently waiting for subops from [37,13] > > 2014-09-04 11:56:35.511469 osd.22 [WRN] slow request 34.162275 seconds > > old, received at 2014-09-04 11:56:01.349122: > > osd_op(client.12485646.0:110385143 > > rbd_data.4ccab12eb141f2.0000000000015b66 [stat,write 909312~4096] > > 3.9c416abe e13901) v4 currently waiting for subops from [37,13] > > 2014-09-04 11:56:36.511603 osd.22 [WRN] 34 slow requests, 6 included > > below; oldest blocked for > 51.746797 secs 2014-09-04 11:56:36.511609 > > osd.22 [WRN] slow request 34.401476 seconds old, received at 2014-09-04 > > 11:56:02.110080: osd_op(client.12485643.0:117955489 > > rbd_data.4aad332eb141f2.0000000000002c00 [stat,write 2531328~4096] > > 3.b985ec92 e13901) v4 currently waiting for subops from [33,1] > > 2014-09-04 11:56:36.511612 osd.22 [WRN] slow request 34.149511 seconds > > old, received at 2014-09-04 11:56:02.362045: > > osd_op(client.12485793.0:121786505 > > rbd_data.4ae87f2eb141f2.0000000000018a0a [stat,write 3342336~4096] > > 3.26907492 e13901) v4 currently waiting for subops from [33,1] > > 2014-09-04 11:56:36.511615 osd.22 [WRN] slow request 34.329528 seconds > > old, received at 2014-09-04 11:56:02.182028: > > osd_op(client.12646411.0:31118266 > > rbd_data.81aaf52eb141f2.0000000000000e4f [stat,write 0~4194304] > > 3.168c978d e13901) v4 currently waiting for subops from [49,29] > > 2014-09-04 11:56:36.511618 osd.22 [WRN] slow request 31.871976 seconds > > old, received at 2014-09-04 11:56:04.639580: > > osd_op(client.12485793.0:121786506 > > rbd_data.4ae87f2eb141f2.0000000000018a0a [stat,write 0~4096] 3.26907492 > > e13901) v4 currently waiting for subops from [33,1] 2014-09-04 > > 11:56:36.511621 osd.22 [WRN] slow request 31.008867 seconds old, > > received at 2014-09-04 11:56:05.502689: > > osd_op(client.9266625.0:135901516 > > rbd_data.42d6792eb141f2.0000000000017436 [stat,write 2682880~4096] > > 3.32067b97 e13901) v4 currently waiting for subops from [12,29] > > 2014-09-04 11:56:36.723882 osd.17 [WRN] 5 slow requests, 1 included > > below; oldest blocked for > 34.606349 secs 2014-09-04 11:56:36.723887 > > osd.17 [WRN] slow request 30.603057 seconds old, received at 2014-09-04 > > 11:56:06.120787: osd_op(client.12483520.0:11473870 > > rbd_data.4b8e9b3d1b58ba.0000000000004600 [stat,write 2293760~4096] > > 3.49398689 e13901) v4 currently waiting for subops from [31,18] > > 2014-09-04 11:56:37.511787 osd.22 [WRN] 24 slow requests, 6 included > > below; oldest blocked for > 52.746984 secs 2014-09-04 11:56:37.511793 > > osd.22 [WRN] slow request 32.008986 seconds old, received at 2014-09-04 > > 11:56:05.502757: osd_op(client.9266625.0:135901517 > > rbd_data.42d6792eb141f2.0000000000017436 [stat,write 2695168~4096] > > 3.32067b97 e13901) v4 currently waiting for subops from [12,29] > > 2014-09-04 11:56:37.511798 osd.22 [WRN] slow request 31.749483 seconds > > old, received at 2014-09-04 11:56:05.762260: > > osd_op(client.9281282.0:55728036 > > rbd_data.2d99f62eb141f2.0000000000003753 [stat,write 1630208~8192] > > 3.6625df97 e13901) v4 currently waiting for subops from [12,29] > > 2014-09-04 11:56:37.511801 osd.22 [WRN] slow request 31.745558 seconds > > old, received at 2014-09-04 11:56:05.766185: > > osd_op(client.9281282.0:55728038 > > rbd_data.2d99f62eb141f2.0000000000003753 [stat,write 1941504~4096] > > 3.6625df97 e13901) v4 currently waiting for subops from [12,29] > > 2014-09-04 11:56:37.511803 osd.22 [WRN] slow request 31.390945 seconds > > old, received at 2014-09-04 11:56:06.120798: > > osd_op(client.12483520.0:11473871 > > rbd_data.4b8e9b3d1b58ba.0000000000005c00 [stat,write 2424832~8192] > > 3.47dbbb97 e13901) v4 currently waiting for subops from [12,29] > > 2014-09-04 11:56:37.511806 osd.22 [WRN] slow request 31.390357 seconds > > old, received at 2014-09-04 11:56:06.121386: > > osd_op(client.12483520.0:11473884 > > rbd_data.4b8e9b3d1b58ba.0000000000005c00 [stat,write 2457600~16384] > > 3.47dbbb97 e13901) v4 currently waiting for subops from [12,29] > > > > Kind Regards, > > > > David > > > > > > _______________________________________________ > > ceph-users mailing list > > ceph-users at lists.ceph.com > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > > > > > -- > Christian Balzer Network/Systems Engineer > chibi at gol.com Global OnLine Japan/Fusion Communications > http://www.gol.com/ > _______________________________________________ > ceph-users mailing list > ceph-users at lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.ceph.com/pipermail/ceph-users-ceph.com/attachments/20140905/36ce38cd/attachment.htm>