Huge issues with slow requests

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, 4 Sep 2014 12:02:13 +0200 David wrote:

> Hi,
> 
> We?re running a ceph cluster with version:
> 
> 0.67.7-1~bpo70+1
> 
> All of a sudden we?re having issues with the cluster (running RBD images
> for kvm) with slow requests on all of the OSD servers. Any idea why and
> how to fix it?
> 
You give us a Ceph version at least, but for anybody to make guesses we
need much more information than a log spew.

How many nodes/OSDs, OS, hardware, OSD details (FS, journals on SSDs), etc.

Run atop (in a sufficiently large terminal) on all your nodes, see if you
can spot a bottleneck, like a disk being at 100% all the time with a
much higher avio than the others.
Looking at your logs, I'd pay particular attention to the disk holding
osd.22. 
A single slow disk can bring a whole large cluster to a crawl.
If you're using a hardware controller with a battery backed up cache,
check if that is fine, loss of the battery would switch from writeback to
writethrough and massively slow down IOPS.

Regards,

Christian
> 
> 2014-09-04 11:56:35.868521 mon.0 [INF] pgmap v12504451: 6860 pgs: 6860
> active+clean; 12163 GB data, 36308 GB used, 142 TB / 178 TB avail;
> 634KB/s rd, 487KB/s wr, 90op/s 2014-09-04 11:56:29.510270 osd.22 [WRN]
> 15 slow requests, 1 included below; oldest blocked for > 44.745754 secs
> 2014-09-04 11:56:29.510276 osd.22 [WRN] slow request 30.999821 seconds
> old, received at 2014-09-04 11:55:58.510424:
> osd_op(client.10731617.0:81868956
> rbd_data.967e022eb141f2.0000000000000e72 [write 0~4194304] 3.c585cebe
> e13901) v4 currently waiting for subops from [37,13] 2014-09-04
> 11:56:30.510528 osd.22 [WRN] 21 slow requests, 6 included below; oldest
> blocked for > 45.745989 secs 2014-09-04 11:56:30.510534 osd.22 [WRN]
> slow request 30.122555 seconds old, received at 2014-09-04
> 11:56:00.387925: osd_op(client.13425082.0:11962345
> rbd_data.54f24c3d1b58ba.0000000000003753 [stat,write 1114112~8192]
> 3.c9e49140 e13901) v4 currently waiting for subops from [13,42]
> 2014-09-04 11:56:30.510537 osd.22 [WRN] slow request 30.122362 seconds
> old, received at 2014-09-04 11:56:00.388118:
> osd_op(client.13425082.0:11962352
> rbd_data.54f24c3d1b58ba.0000000000003753 [stat,write 1134592~4096]
> 3.c9e49140 e13901) v4 currently waiting for subops from [13,42]
> 2014-09-04 11:56:30.510541 osd.22 [WRN] slow request 30.122298 seconds
> old, received at 2014-09-04 11:56:00.388182:
> osd_op(client.13425082.0:11962353
> rbd_data.54f24c3d1b58ba.0000000000003753 [stat,write 4046848~8192]
> 3.c9e49140 e13901) v4 currently waiting for subops from [13,42]
> 2014-09-04 11:56:30.510544 osd.22 [WRN] slow request 30.121577 seconds
> old, received at 2014-09-04 11:56:00.388903:
> osd_op(client.13425082.0:11962374
> rbd_data.54f24c3d1b58ba.00000000000047f2 [stat,write 2527232~4096]
> 3.cd9a9015 e13901) v4 currently waiting for subops from [45,1]
> 2014-09-04 11:56:30.510548 osd.22 [WRN] slow request 30.121518 seconds
> old, received at 2014-09-04 11:56:00.388962:
> osd_op(client.13425082.0:11962375
> rbd_data.54f24c3d1b58ba.00000000000047f2 [stat,write 3133440~4096]
> 3.cd9a9015 e13901) v4 currently waiting for subops from [45,1]
> 2014-09-04 11:56:31.510706 osd.22 [WRN] 26 slow requests, 6 included
> below; oldest blocked for > 46.746163 secs 2014-09-04 11:56:31.510711
> osd.22 [WRN] slow request 31.035418 seconds old, received at 2014-09-04
> 11:56:00.475236: osd_op(client.9266625.0:135900595
> rbd_data.42d6792eb141f2.000000000000bc00 [stat,write 2097152~4096]
> 3.a2894ebe e13901) v4 currently waiting for subops from [37,13]
> 2014-09-04 11:56:31.510715 osd.22 [WRN] slow request 31.035335 seconds
> old, received at 2014-09-04 11:56:00.475319:
> osd_op(client.9266625.0:135900596
> rbd_data.42d6792eb141f2.000000000000bc00 [stat,write 2162688~4096]
> 3.a2894ebe e13901) v4 currently waiting for subops from [37,13]
> 2014-09-04 11:56:31.510718 osd.22 [WRN] slow request 31.035270 seconds
> old, received at 2014-09-04 11:56:00.475384:
> osd_op(client.9266625.0:135900597
> rbd_data.42d6792eb141f2.000000000000bc00 [stat,write 2400256~16384]
> 3.a2894ebe e13901) v4 currently waiting for subops from [37,13]
> 2014-09-04 11:56:31.510721 osd.22 [WRN] slow request 31.035093 seconds
> old, received at 2014-09-04 11:56:00.475561:
> osd_op(client.9266625.0:135900598
> rbd_data.42d6792eb141f2.000000000000bc00 [stat,write 2420736~4096]
> 3.a2894ebe e13901) v4 currently waiting for subops from [37,13]
> 2014-09-04 11:56:31.510724 osd.22 [WRN] slow request 31.034990 seconds
> old, received at 2014-09-04 11:56:00.475664:
> osd_op(client.9266625.0:135900599
> rbd_data.42d6792eb141f2.000000000000bc00 [stat,write 2428928~12288]
> 3.a2894ebe e13901) v4 currently waiting for subops from [37,13]
> 2014-09-04 11:56:32.510864 osd.22 [WRN] 28 slow requests, 6 included
> below; oldest blocked for > 47.746324 secs 2014-09-04 11:56:32.510871
> osd.22 [WRN] slow request 32.035026 seconds old, received at 2014-09-04
> 11:56:00.475789: osd_op(client.9266625.0:135900600
> rbd_data.42d6792eb141f2.000000000000bc00 [stat,write 2453504~4096]
> 3.a2894ebe e13901) v4 currently waiting for subops from [37,13]
> 2014-09-04 11:56:32.510876 osd.22 [WRN] slow request 32.034982 seconds
> old, received at 2014-09-04 11:56:00.475833:
> osd_op(client.9266625.0:135900601
> rbd_data.42d6792eb141f2.000000000000bc00 [stat,write 2502656~16384]
> 3.a2894ebe e13901) v4 currently waiting for subops from [37,13]
> 2014-09-04 11:56:32.510881 osd.22 [WRN] slow request 32.034917 seconds
> old, received at 2014-09-04 11:56:00.475898:
> osd_op(client.9266625.0:135900602
> rbd_data.42d6792eb141f2.000000000000bc00 [stat,write 2600960~4096]
> 3.a2894ebe e13901) v4 currently waiting for subops from [37,13]
> 2014-09-04 11:56:32.510885 osd.22 [WRN] slow request 32.034807 seconds
> old, received at 2014-09-04 11:56:00.476008:
> osd_op(client.9266625.0:135900603
> rbd_data.42d6792eb141f2.000000000000bc00 [stat,write 2617344~4096]
> 3.a2894ebe e13901) v4 currently waiting for subops from [37,13]
> 2014-09-04 11:56:32.510890 osd.22 [WRN] slow request 32.034756 seconds
> old, received at 2014-09-04 11:56:00.476059:
> osd_op(client.9266625.0:135900604
> rbd_data.42d6792eb141f2.000000000000bc00 [stat,write 2637824~69632]
> 3.a2894ebe e13901) v4 currently waiting for subops from [37,13]
> 2014-09-04 11:56:32.723329 osd.17 [WRN] 4 slow requests, 4 included
> below; oldest blocked for > 30.605794 secs 2014-09-04 11:56:32.723335
> osd.17 [WRN] slow request 30.605794 seconds old, received at 2014-09-04
> 11:56:02.117495: osd_op(client.12646411.0:31118135
> rbd_data.81aaf52eb141f2.0000000000000e4b [stat,write 0~4194304]
> 3.716b373b e13901) v4 currently waiting for subops from [31,1]
> 2014-09-04 11:56:32.723338 osd.17 [WRN] slow request 30.362042 seconds
> old, received at 2014-09-04 11:56:02.361247:
> osd_op(client.12485793.0:121786489
> rbd_data.4ae87f2eb141f2.000000000000e200 [stat,write 2834432~4096]
> 3.6b43de72 e13901) v4 currently waiting for subops from [5,31]
> 2014-09-04 11:56:32.723341 osd.17 [WRN] slow request 30.179681 seconds
> old, received at 2014-09-04 11:56:02.543608:
> osd_op(client.13425079.0:7014393
> rbd_data.54f2522eb141f2.0000000000014d01 [stat,write 2502656~4096]
> 3.d8ceea89 e13901) v4 currently waiting for subops from [31,18]
> 2014-09-04 11:56:32.723344 osd.17 [WRN] slow request 30.000568 seconds
> old, received at 2014-09-04 11:56:02.722721:
> osd_op(client.9281279.0:144230325
> rbd_data.2d99fc2eb141f2.000000000000e80a [stat,write 1589248~4096]
> 3.16dc0e89 e13901) v4 currently waiting for subops from [31,18]
> 2014-09-04 11:56:33.511043 osd.22 [WRN] 33 slow requests, 6 included
> below; oldest blocked for > 48.746486 secs 2014-09-04 11:56:33.511049
> osd.22 [WRN] slow request 33.034756 seconds old, received at 2014-09-04
> 11:56:00.476221: osd_op(client.9266625.0:135900605
> rbd_data.42d6792eb141f2.000000000000bc00 [stat,write 2715648~8192]
> 3.a2894ebe e13901) v4 currently waiting for subops from [37,13]
> 2014-09-04 11:56:33.511053 osd.22 [WRN] slow request 33.034620 seconds
> old, received at 2014-09-04 11:56:00.476357:
> osd_op(client.9266625.0:135900606
> rbd_data.42d6792eb141f2.000000000000bc00 [stat,write 2756608~4096]
> 3.a2894ebe e13901) v4 currently waiting for subops from [37,13]
> 2014-09-04 11:56:33.511056 osd.22 [WRN] slow request 33.034446 seconds
> old, received at 2014-09-04 11:56:00.476531:
> osd_op(client.9266625.0:135900607
> rbd_data.42d6792eb141f2.000000000000bc00 [stat,write 2809856~4096]
> 3.a2894ebe e13901) v4 currently waiting for subops from [37,13]
> 2014-09-04 11:56:33.511060 osd.22 [WRN] slow request 33.034387 seconds
> old, received at 2014-09-04 11:56:00.476590:
> osd_op(client.9266625.0:135900608
> rbd_data.42d6792eb141f2.000000000000bc00 [stat,write 2883584~4096]
> 3.a2894ebe e13901) v4 currently waiting for subops from [37,13]
> 2014-09-04 11:56:33.511063 osd.22 [WRN] slow request 33.034341 seconds
> old, received at 2014-09-04 11:56:00.476636:
> osd_op(client.9266625.0:135900609
> rbd_data.42d6792eb141f2.000000000000bc00 [stat,write 2891776~8192]
> 3.a2894ebe e13901) v4 currently waiting for subops from [37,13]
> 2014-09-04 11:56:34.511206 osd.22 [WRN] 38 slow requests, 6 included
> below; oldest blocked for > 49.746664 secs 2014-09-04 11:56:34.511215
> osd.22 [WRN] slow request 34.034426 seconds old, received at 2014-09-04
> 11:56:00.476729: osd_op(client.9266625.0:135900610
> rbd_data.42d6792eb141f2.000000000000bc00 [stat,write 2908160~4096]
> 3.a2894ebe e13901) v4 currently waiting for subops from [37,13]
> 2014-09-04 11:56:34.511219 osd.22 [WRN] slow request 34.034298 seconds
> old, received at 2014-09-04 11:56:00.476857:
> osd_op(client.9266625.0:135900611
> rbd_data.42d6792eb141f2.000000000000bc00 [stat,write 2932736~28672]
> 3.a2894ebe e13901) v4 currently waiting for subops from [37,13]
> 2014-09-04 11:56:34.511222 osd.22 [WRN] slow request 34.034234 seconds
> old, received at 2014-09-04 11:56:00.476921:
> osd_op(client.9266625.0:135900612
> rbd_data.42d6792eb141f2.000000000000bc00 [stat,write 2969600~8192]
> 3.a2894ebe e13901) v4 currently waiting for subops from [37,13]
> 2014-09-04 11:56:34.511225 osd.22 [WRN] slow request 34.034147 seconds
> old, received at 2014-09-04 11:56:00.477008:
> osd_op(client.9266625.0:135900613
> rbd_data.42d6792eb141f2.000000000000bc00 [stat,write 2981888~4096]
> 3.a2894ebe e13901) v4 currently waiting for subops from [37,13]
> 2014-09-04 11:56:34.511228 osd.22 [WRN] slow request 34.034055 seconds
> old, received at 2014-09-04 11:56:00.477100:
> osd_op(client.9266625.0:135900614
> rbd_data.42d6792eb141f2.000000000000bc00 [stat,write 2998272~12288]
> 3.a2894ebe e13901) v4 currently waiting for subops from [37,13]
> 2014-09-04 11:56:35.153813 osd.37 [WRN] 1 slow requests, 1 included
> below; oldest blocked for > 30.307864 secs 2014-09-04 11:56:35.153820
> osd.37 [WRN] slow request 30.307864 seconds old, received at 2014-09-04
> 11:56:04.845917: osd_op(client.10785413.0:129409910
> rbd_data.96803f2eb141f2.00000000000042b0 [stat,write 1712128~4096]
> 3.6c5ac950 e13901) v4 currently waiting for subops from [22,41]
> 2014-09-04 11:56:35.511450 osd.22 [WRN] 42 slow requests, 6 included
> below; oldest blocked for > 50.746638 secs 2014-09-04 11:56:35.511455
> osd.22 [WRN] slow request 35.034219 seconds old, received at 2014-09-04
> 11:56:00.477178: osd_op(client.9266625.0:135900615
> rbd_data.42d6792eb141f2.000000000000bc00 [stat,write 3022848~8192]
> 3.a2894ebe e13901) v4 currently waiting for subops from [37,13]
> 2014-09-04 11:56:35.511459 osd.22 [WRN] slow request 35.034143 seconds
> old, received at 2014-09-04 11:56:00.477254:
> osd_op(client.9266625.0:135900616
> rbd_data.42d6792eb141f2.000000000000bc00 [stat,write 3035136~12288]
> 3.a2894ebe e13901) v4 currently waiting for subops from [37,13]
> 2014-09-04 11:56:35.511462 osd.22 [WRN] slow request 35.034082 seconds
> old, received at 2014-09-04 11:56:00.477315:
> osd_op(client.9266625.0:135900617
> rbd_data.42d6792eb141f2.000000000000bc00 [stat,write 3055616~4096]
> 3.a2894ebe e13901) v4 currently waiting for subops from [37,13]
> 2014-09-04 11:56:35.511465 osd.22 [WRN] slow request 35.033965 seconds
> old, received at 2014-09-04 11:56:00.477432:
> osd_op(client.9266625.0:135900618
> rbd_data.42d6792eb141f2.000000000000bc00 [stat,write 3063808~4096]
> 3.a2894ebe e13901) v4 currently waiting for subops from [37,13]
> 2014-09-04 11:56:35.511469 osd.22 [WRN] slow request 34.162275 seconds
> old, received at 2014-09-04 11:56:01.349122:
> osd_op(client.12485646.0:110385143
> rbd_data.4ccab12eb141f2.0000000000015b66 [stat,write 909312~4096]
> 3.9c416abe e13901) v4 currently waiting for subops from [37,13]
> 2014-09-04 11:56:36.511603 osd.22 [WRN] 34 slow requests, 6 included
> below; oldest blocked for > 51.746797 secs 2014-09-04 11:56:36.511609
> osd.22 [WRN] slow request 34.401476 seconds old, received at 2014-09-04
> 11:56:02.110080: osd_op(client.12485643.0:117955489
> rbd_data.4aad332eb141f2.0000000000002c00 [stat,write 2531328~4096]
> 3.b985ec92 e13901) v4 currently waiting for subops from [33,1]
> 2014-09-04 11:56:36.511612 osd.22 [WRN] slow request 34.149511 seconds
> old, received at 2014-09-04 11:56:02.362045:
> osd_op(client.12485793.0:121786505
> rbd_data.4ae87f2eb141f2.0000000000018a0a [stat,write 3342336~4096]
> 3.26907492 e13901) v4 currently waiting for subops from [33,1]
> 2014-09-04 11:56:36.511615 osd.22 [WRN] slow request 34.329528 seconds
> old, received at 2014-09-04 11:56:02.182028:
> osd_op(client.12646411.0:31118266
> rbd_data.81aaf52eb141f2.0000000000000e4f [stat,write 0~4194304]
> 3.168c978d e13901) v4 currently waiting for subops from [49,29]
> 2014-09-04 11:56:36.511618 osd.22 [WRN] slow request 31.871976 seconds
> old, received at 2014-09-04 11:56:04.639580:
> osd_op(client.12485793.0:121786506
> rbd_data.4ae87f2eb141f2.0000000000018a0a [stat,write 0~4096] 3.26907492
> e13901) v4 currently waiting for subops from [33,1] 2014-09-04
> 11:56:36.511621 osd.22 [WRN] slow request 31.008867 seconds old,
> received at 2014-09-04 11:56:05.502689:
> osd_op(client.9266625.0:135901516
> rbd_data.42d6792eb141f2.0000000000017436 [stat,write 2682880~4096]
> 3.32067b97 e13901) v4 currently waiting for subops from [12,29]
> 2014-09-04 11:56:36.723882 osd.17 [WRN] 5 slow requests, 1 included
> below; oldest blocked for > 34.606349 secs 2014-09-04 11:56:36.723887
> osd.17 [WRN] slow request 30.603057 seconds old, received at 2014-09-04
> 11:56:06.120787: osd_op(client.12483520.0:11473870
> rbd_data.4b8e9b3d1b58ba.0000000000004600 [stat,write 2293760~4096]
> 3.49398689 e13901) v4 currently waiting for subops from [31,18]
> 2014-09-04 11:56:37.511787 osd.22 [WRN] 24 slow requests, 6 included
> below; oldest blocked for > 52.746984 secs 2014-09-04 11:56:37.511793
> osd.22 [WRN] slow request 32.008986 seconds old, received at 2014-09-04
> 11:56:05.502757: osd_op(client.9266625.0:135901517
> rbd_data.42d6792eb141f2.0000000000017436 [stat,write 2695168~4096]
> 3.32067b97 e13901) v4 currently waiting for subops from [12,29]
> 2014-09-04 11:56:37.511798 osd.22 [WRN] slow request 31.749483 seconds
> old, received at 2014-09-04 11:56:05.762260:
> osd_op(client.9281282.0:55728036
> rbd_data.2d99f62eb141f2.0000000000003753 [stat,write 1630208~8192]
> 3.6625df97 e13901) v4 currently waiting for subops from [12,29]
> 2014-09-04 11:56:37.511801 osd.22 [WRN] slow request 31.745558 seconds
> old, received at 2014-09-04 11:56:05.766185:
> osd_op(client.9281282.0:55728038
> rbd_data.2d99f62eb141f2.0000000000003753 [stat,write 1941504~4096]
> 3.6625df97 e13901) v4 currently waiting for subops from [12,29]
> 2014-09-04 11:56:37.511803 osd.22 [WRN] slow request 31.390945 seconds
> old, received at 2014-09-04 11:56:06.120798:
> osd_op(client.12483520.0:11473871
> rbd_data.4b8e9b3d1b58ba.0000000000005c00 [stat,write 2424832~8192]
> 3.47dbbb97 e13901) v4 currently waiting for subops from [12,29]
> 2014-09-04 11:56:37.511806 osd.22 [WRN] slow request 31.390357 seconds
> old, received at 2014-09-04 11:56:06.121386:
> osd_op(client.12483520.0:11473884
> rbd_data.4b8e9b3d1b58ba.0000000000005c00 [stat,write 2457600~16384]
> 3.47dbbb97 e13901) v4 currently waiting for subops from [12,29]
> 
> Kind Regards,
> 
> David
> 
> 
> _______________________________________________
> ceph-users mailing list
> ceph-users at lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 


-- 
Christian Balzer        Network/Systems Engineer                
chibi at gol.com   	Global OnLine Japan/Fusion Communications
http://www.gol.com/


[Index of Archives]     [Information on CEPH]     [Linux Filesystem Development]     [Ceph Development]     [Ceph Large]     [Linux USB Development]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [xfs]


  Powered by Linux