On Thu, 4 Sep 2014 12:02:13 +0200 David wrote: > Hi, > > We?re running a ceph cluster with version: > > 0.67.7-1~bpo70+1 > > All of a sudden we?re having issues with the cluster (running RBD images > for kvm) with slow requests on all of the OSD servers. Any idea why and > how to fix it? > You give us a Ceph version at least, but for anybody to make guesses we need much more information than a log spew. How many nodes/OSDs, OS, hardware, OSD details (FS, journals on SSDs), etc. Run atop (in a sufficiently large terminal) on all your nodes, see if you can spot a bottleneck, like a disk being at 100% all the time with a much higher avio than the others. Looking at your logs, I'd pay particular attention to the disk holding osd.22. A single slow disk can bring a whole large cluster to a crawl. If you're using a hardware controller with a battery backed up cache, check if that is fine, loss of the battery would switch from writeback to writethrough and massively slow down IOPS. Regards, Christian > > 2014-09-04 11:56:35.868521 mon.0 [INF] pgmap v12504451: 6860 pgs: 6860 > active+clean; 12163 GB data, 36308 GB used, 142 TB / 178 TB avail; > 634KB/s rd, 487KB/s wr, 90op/s 2014-09-04 11:56:29.510270 osd.22 [WRN] > 15 slow requests, 1 included below; oldest blocked for > 44.745754 secs > 2014-09-04 11:56:29.510276 osd.22 [WRN] slow request 30.999821 seconds > old, received at 2014-09-04 11:55:58.510424: > osd_op(client.10731617.0:81868956 > rbd_data.967e022eb141f2.0000000000000e72 [write 0~4194304] 3.c585cebe > e13901) v4 currently waiting for subops from [37,13] 2014-09-04 > 11:56:30.510528 osd.22 [WRN] 21 slow requests, 6 included below; oldest > blocked for > 45.745989 secs 2014-09-04 11:56:30.510534 osd.22 [WRN] > slow request 30.122555 seconds old, received at 2014-09-04 > 11:56:00.387925: osd_op(client.13425082.0:11962345 > rbd_data.54f24c3d1b58ba.0000000000003753 [stat,write 1114112~8192] > 3.c9e49140 e13901) v4 currently waiting for subops from [13,42] > 2014-09-04 11:56:30.510537 osd.22 [WRN] slow request 30.122362 seconds > old, received at 2014-09-04 11:56:00.388118: > osd_op(client.13425082.0:11962352 > rbd_data.54f24c3d1b58ba.0000000000003753 [stat,write 1134592~4096] > 3.c9e49140 e13901) v4 currently waiting for subops from [13,42] > 2014-09-04 11:56:30.510541 osd.22 [WRN] slow request 30.122298 seconds > old, received at 2014-09-04 11:56:00.388182: > osd_op(client.13425082.0:11962353 > rbd_data.54f24c3d1b58ba.0000000000003753 [stat,write 4046848~8192] > 3.c9e49140 e13901) v4 currently waiting for subops from [13,42] > 2014-09-04 11:56:30.510544 osd.22 [WRN] slow request 30.121577 seconds > old, received at 2014-09-04 11:56:00.388903: > osd_op(client.13425082.0:11962374 > rbd_data.54f24c3d1b58ba.00000000000047f2 [stat,write 2527232~4096] > 3.cd9a9015 e13901) v4 currently waiting for subops from [45,1] > 2014-09-04 11:56:30.510548 osd.22 [WRN] slow request 30.121518 seconds > old, received at 2014-09-04 11:56:00.388962: > osd_op(client.13425082.0:11962375 > rbd_data.54f24c3d1b58ba.00000000000047f2 [stat,write 3133440~4096] > 3.cd9a9015 e13901) v4 currently waiting for subops from [45,1] > 2014-09-04 11:56:31.510706 osd.22 [WRN] 26 slow requests, 6 included > below; oldest blocked for > 46.746163 secs 2014-09-04 11:56:31.510711 > osd.22 [WRN] slow request 31.035418 seconds old, received at 2014-09-04 > 11:56:00.475236: osd_op(client.9266625.0:135900595 > rbd_data.42d6792eb141f2.000000000000bc00 [stat,write 2097152~4096] > 3.a2894ebe e13901) v4 currently waiting for subops from [37,13] > 2014-09-04 11:56:31.510715 osd.22 [WRN] slow request 31.035335 seconds > old, received at 2014-09-04 11:56:00.475319: > osd_op(client.9266625.0:135900596 > rbd_data.42d6792eb141f2.000000000000bc00 [stat,write 2162688~4096] > 3.a2894ebe e13901) v4 currently waiting for subops from [37,13] > 2014-09-04 11:56:31.510718 osd.22 [WRN] slow request 31.035270 seconds > old, received at 2014-09-04 11:56:00.475384: > osd_op(client.9266625.0:135900597 > rbd_data.42d6792eb141f2.000000000000bc00 [stat,write 2400256~16384] > 3.a2894ebe e13901) v4 currently waiting for subops from [37,13] > 2014-09-04 11:56:31.510721 osd.22 [WRN] slow request 31.035093 seconds > old, received at 2014-09-04 11:56:00.475561: > osd_op(client.9266625.0:135900598 > rbd_data.42d6792eb141f2.000000000000bc00 [stat,write 2420736~4096] > 3.a2894ebe e13901) v4 currently waiting for subops from [37,13] > 2014-09-04 11:56:31.510724 osd.22 [WRN] slow request 31.034990 seconds > old, received at 2014-09-04 11:56:00.475664: > osd_op(client.9266625.0:135900599 > rbd_data.42d6792eb141f2.000000000000bc00 [stat,write 2428928~12288] > 3.a2894ebe e13901) v4 currently waiting for subops from [37,13] > 2014-09-04 11:56:32.510864 osd.22 [WRN] 28 slow requests, 6 included > below; oldest blocked for > 47.746324 secs 2014-09-04 11:56:32.510871 > osd.22 [WRN] slow request 32.035026 seconds old, received at 2014-09-04 > 11:56:00.475789: osd_op(client.9266625.0:135900600 > rbd_data.42d6792eb141f2.000000000000bc00 [stat,write 2453504~4096] > 3.a2894ebe e13901) v4 currently waiting for subops from [37,13] > 2014-09-04 11:56:32.510876 osd.22 [WRN] slow request 32.034982 seconds > old, received at 2014-09-04 11:56:00.475833: > osd_op(client.9266625.0:135900601 > rbd_data.42d6792eb141f2.000000000000bc00 [stat,write 2502656~16384] > 3.a2894ebe e13901) v4 currently waiting for subops from [37,13] > 2014-09-04 11:56:32.510881 osd.22 [WRN] slow request 32.034917 seconds > old, received at 2014-09-04 11:56:00.475898: > osd_op(client.9266625.0:135900602 > rbd_data.42d6792eb141f2.000000000000bc00 [stat,write 2600960~4096] > 3.a2894ebe e13901) v4 currently waiting for subops from [37,13] > 2014-09-04 11:56:32.510885 osd.22 [WRN] slow request 32.034807 seconds > old, received at 2014-09-04 11:56:00.476008: > osd_op(client.9266625.0:135900603 > rbd_data.42d6792eb141f2.000000000000bc00 [stat,write 2617344~4096] > 3.a2894ebe e13901) v4 currently waiting for subops from [37,13] > 2014-09-04 11:56:32.510890 osd.22 [WRN] slow request 32.034756 seconds > old, received at 2014-09-04 11:56:00.476059: > osd_op(client.9266625.0:135900604 > rbd_data.42d6792eb141f2.000000000000bc00 [stat,write 2637824~69632] > 3.a2894ebe e13901) v4 currently waiting for subops from [37,13] > 2014-09-04 11:56:32.723329 osd.17 [WRN] 4 slow requests, 4 included > below; oldest blocked for > 30.605794 secs 2014-09-04 11:56:32.723335 > osd.17 [WRN] slow request 30.605794 seconds old, received at 2014-09-04 > 11:56:02.117495: osd_op(client.12646411.0:31118135 > rbd_data.81aaf52eb141f2.0000000000000e4b [stat,write 0~4194304] > 3.716b373b e13901) v4 currently waiting for subops from [31,1] > 2014-09-04 11:56:32.723338 osd.17 [WRN] slow request 30.362042 seconds > old, received at 2014-09-04 11:56:02.361247: > osd_op(client.12485793.0:121786489 > rbd_data.4ae87f2eb141f2.000000000000e200 [stat,write 2834432~4096] > 3.6b43de72 e13901) v4 currently waiting for subops from [5,31] > 2014-09-04 11:56:32.723341 osd.17 [WRN] slow request 30.179681 seconds > old, received at 2014-09-04 11:56:02.543608: > osd_op(client.13425079.0:7014393 > rbd_data.54f2522eb141f2.0000000000014d01 [stat,write 2502656~4096] > 3.d8ceea89 e13901) v4 currently waiting for subops from [31,18] > 2014-09-04 11:56:32.723344 osd.17 [WRN] slow request 30.000568 seconds > old, received at 2014-09-04 11:56:02.722721: > osd_op(client.9281279.0:144230325 > rbd_data.2d99fc2eb141f2.000000000000e80a [stat,write 1589248~4096] > 3.16dc0e89 e13901) v4 currently waiting for subops from [31,18] > 2014-09-04 11:56:33.511043 osd.22 [WRN] 33 slow requests, 6 included > below; oldest blocked for > 48.746486 secs 2014-09-04 11:56:33.511049 > osd.22 [WRN] slow request 33.034756 seconds old, received at 2014-09-04 > 11:56:00.476221: osd_op(client.9266625.0:135900605 > rbd_data.42d6792eb141f2.000000000000bc00 [stat,write 2715648~8192] > 3.a2894ebe e13901) v4 currently waiting for subops from [37,13] > 2014-09-04 11:56:33.511053 osd.22 [WRN] slow request 33.034620 seconds > old, received at 2014-09-04 11:56:00.476357: > osd_op(client.9266625.0:135900606 > rbd_data.42d6792eb141f2.000000000000bc00 [stat,write 2756608~4096] > 3.a2894ebe e13901) v4 currently waiting for subops from [37,13] > 2014-09-04 11:56:33.511056 osd.22 [WRN] slow request 33.034446 seconds > old, received at 2014-09-04 11:56:00.476531: > osd_op(client.9266625.0:135900607 > rbd_data.42d6792eb141f2.000000000000bc00 [stat,write 2809856~4096] > 3.a2894ebe e13901) v4 currently waiting for subops from [37,13] > 2014-09-04 11:56:33.511060 osd.22 [WRN] slow request 33.034387 seconds > old, received at 2014-09-04 11:56:00.476590: > osd_op(client.9266625.0:135900608 > rbd_data.42d6792eb141f2.000000000000bc00 [stat,write 2883584~4096] > 3.a2894ebe e13901) v4 currently waiting for subops from [37,13] > 2014-09-04 11:56:33.511063 osd.22 [WRN] slow request 33.034341 seconds > old, received at 2014-09-04 11:56:00.476636: > osd_op(client.9266625.0:135900609 > rbd_data.42d6792eb141f2.000000000000bc00 [stat,write 2891776~8192] > 3.a2894ebe e13901) v4 currently waiting for subops from [37,13] > 2014-09-04 11:56:34.511206 osd.22 [WRN] 38 slow requests, 6 included > below; oldest blocked for > 49.746664 secs 2014-09-04 11:56:34.511215 > osd.22 [WRN] slow request 34.034426 seconds old, received at 2014-09-04 > 11:56:00.476729: osd_op(client.9266625.0:135900610 > rbd_data.42d6792eb141f2.000000000000bc00 [stat,write 2908160~4096] > 3.a2894ebe e13901) v4 currently waiting for subops from [37,13] > 2014-09-04 11:56:34.511219 osd.22 [WRN] slow request 34.034298 seconds > old, received at 2014-09-04 11:56:00.476857: > osd_op(client.9266625.0:135900611 > rbd_data.42d6792eb141f2.000000000000bc00 [stat,write 2932736~28672] > 3.a2894ebe e13901) v4 currently waiting for subops from [37,13] > 2014-09-04 11:56:34.511222 osd.22 [WRN] slow request 34.034234 seconds > old, received at 2014-09-04 11:56:00.476921: > osd_op(client.9266625.0:135900612 > rbd_data.42d6792eb141f2.000000000000bc00 [stat,write 2969600~8192] > 3.a2894ebe e13901) v4 currently waiting for subops from [37,13] > 2014-09-04 11:56:34.511225 osd.22 [WRN] slow request 34.034147 seconds > old, received at 2014-09-04 11:56:00.477008: > osd_op(client.9266625.0:135900613 > rbd_data.42d6792eb141f2.000000000000bc00 [stat,write 2981888~4096] > 3.a2894ebe e13901) v4 currently waiting for subops from [37,13] > 2014-09-04 11:56:34.511228 osd.22 [WRN] slow request 34.034055 seconds > old, received at 2014-09-04 11:56:00.477100: > osd_op(client.9266625.0:135900614 > rbd_data.42d6792eb141f2.000000000000bc00 [stat,write 2998272~12288] > 3.a2894ebe e13901) v4 currently waiting for subops from [37,13] > 2014-09-04 11:56:35.153813 osd.37 [WRN] 1 slow requests, 1 included > below; oldest blocked for > 30.307864 secs 2014-09-04 11:56:35.153820 > osd.37 [WRN] slow request 30.307864 seconds old, received at 2014-09-04 > 11:56:04.845917: osd_op(client.10785413.0:129409910 > rbd_data.96803f2eb141f2.00000000000042b0 [stat,write 1712128~4096] > 3.6c5ac950 e13901) v4 currently waiting for subops from [22,41] > 2014-09-04 11:56:35.511450 osd.22 [WRN] 42 slow requests, 6 included > below; oldest blocked for > 50.746638 secs 2014-09-04 11:56:35.511455 > osd.22 [WRN] slow request 35.034219 seconds old, received at 2014-09-04 > 11:56:00.477178: osd_op(client.9266625.0:135900615 > rbd_data.42d6792eb141f2.000000000000bc00 [stat,write 3022848~8192] > 3.a2894ebe e13901) v4 currently waiting for subops from [37,13] > 2014-09-04 11:56:35.511459 osd.22 [WRN] slow request 35.034143 seconds > old, received at 2014-09-04 11:56:00.477254: > osd_op(client.9266625.0:135900616 > rbd_data.42d6792eb141f2.000000000000bc00 [stat,write 3035136~12288] > 3.a2894ebe e13901) v4 currently waiting for subops from [37,13] > 2014-09-04 11:56:35.511462 osd.22 [WRN] slow request 35.034082 seconds > old, received at 2014-09-04 11:56:00.477315: > osd_op(client.9266625.0:135900617 > rbd_data.42d6792eb141f2.000000000000bc00 [stat,write 3055616~4096] > 3.a2894ebe e13901) v4 currently waiting for subops from [37,13] > 2014-09-04 11:56:35.511465 osd.22 [WRN] slow request 35.033965 seconds > old, received at 2014-09-04 11:56:00.477432: > osd_op(client.9266625.0:135900618 > rbd_data.42d6792eb141f2.000000000000bc00 [stat,write 3063808~4096] > 3.a2894ebe e13901) v4 currently waiting for subops from [37,13] > 2014-09-04 11:56:35.511469 osd.22 [WRN] slow request 34.162275 seconds > old, received at 2014-09-04 11:56:01.349122: > osd_op(client.12485646.0:110385143 > rbd_data.4ccab12eb141f2.0000000000015b66 [stat,write 909312~4096] > 3.9c416abe e13901) v4 currently waiting for subops from [37,13] > 2014-09-04 11:56:36.511603 osd.22 [WRN] 34 slow requests, 6 included > below; oldest blocked for > 51.746797 secs 2014-09-04 11:56:36.511609 > osd.22 [WRN] slow request 34.401476 seconds old, received at 2014-09-04 > 11:56:02.110080: osd_op(client.12485643.0:117955489 > rbd_data.4aad332eb141f2.0000000000002c00 [stat,write 2531328~4096] > 3.b985ec92 e13901) v4 currently waiting for subops from [33,1] > 2014-09-04 11:56:36.511612 osd.22 [WRN] slow request 34.149511 seconds > old, received at 2014-09-04 11:56:02.362045: > osd_op(client.12485793.0:121786505 > rbd_data.4ae87f2eb141f2.0000000000018a0a [stat,write 3342336~4096] > 3.26907492 e13901) v4 currently waiting for subops from [33,1] > 2014-09-04 11:56:36.511615 osd.22 [WRN] slow request 34.329528 seconds > old, received at 2014-09-04 11:56:02.182028: > osd_op(client.12646411.0:31118266 > rbd_data.81aaf52eb141f2.0000000000000e4f [stat,write 0~4194304] > 3.168c978d e13901) v4 currently waiting for subops from [49,29] > 2014-09-04 11:56:36.511618 osd.22 [WRN] slow request 31.871976 seconds > old, received at 2014-09-04 11:56:04.639580: > osd_op(client.12485793.0:121786506 > rbd_data.4ae87f2eb141f2.0000000000018a0a [stat,write 0~4096] 3.26907492 > e13901) v4 currently waiting for subops from [33,1] 2014-09-04 > 11:56:36.511621 osd.22 [WRN] slow request 31.008867 seconds old, > received at 2014-09-04 11:56:05.502689: > osd_op(client.9266625.0:135901516 > rbd_data.42d6792eb141f2.0000000000017436 [stat,write 2682880~4096] > 3.32067b97 e13901) v4 currently waiting for subops from [12,29] > 2014-09-04 11:56:36.723882 osd.17 [WRN] 5 slow requests, 1 included > below; oldest blocked for > 34.606349 secs 2014-09-04 11:56:36.723887 > osd.17 [WRN] slow request 30.603057 seconds old, received at 2014-09-04 > 11:56:06.120787: osd_op(client.12483520.0:11473870 > rbd_data.4b8e9b3d1b58ba.0000000000004600 [stat,write 2293760~4096] > 3.49398689 e13901) v4 currently waiting for subops from [31,18] > 2014-09-04 11:56:37.511787 osd.22 [WRN] 24 slow requests, 6 included > below; oldest blocked for > 52.746984 secs 2014-09-04 11:56:37.511793 > osd.22 [WRN] slow request 32.008986 seconds old, received at 2014-09-04 > 11:56:05.502757: osd_op(client.9266625.0:135901517 > rbd_data.42d6792eb141f2.0000000000017436 [stat,write 2695168~4096] > 3.32067b97 e13901) v4 currently waiting for subops from [12,29] > 2014-09-04 11:56:37.511798 osd.22 [WRN] slow request 31.749483 seconds > old, received at 2014-09-04 11:56:05.762260: > osd_op(client.9281282.0:55728036 > rbd_data.2d99f62eb141f2.0000000000003753 [stat,write 1630208~8192] > 3.6625df97 e13901) v4 currently waiting for subops from [12,29] > 2014-09-04 11:56:37.511801 osd.22 [WRN] slow request 31.745558 seconds > old, received at 2014-09-04 11:56:05.766185: > osd_op(client.9281282.0:55728038 > rbd_data.2d99f62eb141f2.0000000000003753 [stat,write 1941504~4096] > 3.6625df97 e13901) v4 currently waiting for subops from [12,29] > 2014-09-04 11:56:37.511803 osd.22 [WRN] slow request 31.390945 seconds > old, received at 2014-09-04 11:56:06.120798: > osd_op(client.12483520.0:11473871 > rbd_data.4b8e9b3d1b58ba.0000000000005c00 [stat,write 2424832~8192] > 3.47dbbb97 e13901) v4 currently waiting for subops from [12,29] > 2014-09-04 11:56:37.511806 osd.22 [WRN] slow request 31.390357 seconds > old, received at 2014-09-04 11:56:06.121386: > osd_op(client.12483520.0:11473884 > rbd_data.4b8e9b3d1b58ba.0000000000005c00 [stat,write 2457600~16384] > 3.47dbbb97 e13901) v4 currently waiting for subops from [12,29] > > Kind Regards, > > David > > > _______________________________________________ > ceph-users mailing list > ceph-users at lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > -- Christian Balzer Network/Systems Engineer chibi at gol.com Global OnLine Japan/Fusion Communications http://www.gol.com/