ceph freezes for 10+ seconds during benchmark

We've installed Ceph on a test cluster:
3x MON, 7x OSD on 2x 10k RPM SAS disks
CentOS 6.4 (kernel 2.6.32-358.14.1.el6.x86_64)
Ceph 0.67.2 (also tried 0.61.7, with the same results)

During rados bench I see very strange behaviour:
# rados bench -p pbench 100 write 

   sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
...
    51      16      1503      1487   116.603        72  0.306585  0.524611
    52      16      1525      1509   116.053        88  0.171904  0.520352
    53      16      1541      1525    115.07        64  0.121784  0.516466
    54      16      1541      1525   112.939         0         -  0.516466
    55      16      1541      1525   110.885         0         -  0.516466
    56      16      1541      1525   108.905         0         -  0.516466
    57      16      1541      1525   106.994         0         -  0.516466
... (http://pastebin.com/vV50YBVK)

Bandwidth (MB/sec):     81.760
Stddev Bandwidth:       53.8371
Max bandwidth (MB/sec): 156
Min bandwidth (MB/sec): 0
Average Latency:        0.782271
Stddev Latency:         2.51829
Max latency:            26.1715
Min latency:            0.084654
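As a rough consistency check on these summary numbers, Little's law relates average throughput X to the number of operations in flight N, the size of each operation S, and the average latency W. Assuming the client kept N = 16 ops in flight (the Cur ops column) and each op wrote S = 4 MB (4194304 bytes, the [write 0~4194304] in the ceph -w messages), the reported average latency gives:

```latex
X \approx \frac{N \cdot S}{W}
  = \frac{16 \times 4\ \text{MB}}{0.782271\ \text{s}}
  \approx 81.8\ \text{MB/s}
```

This agrees with the reported 81.760 MB/s. Since Cur ops also stays at 16 during the zero-throughput seconds, the lost bandwidth appears to show up as longer-running writes (up to the 26.17 s max latency) rather than as idle client slots.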

Basically, the benchmark runs at full disk speed and then all I/O stops for 10+ seconds.
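To pick these stalls out of a longer run (such as the full pastebin output) automatically, a small parser can flag consecutive seconds where the cur MB/s column is 0. This is a minimal sketch, assuming the rados bench output has been saved as plain text with the column layout shown above; find_stalls is a hypothetical helper, not part of Ceph.

```python
import re

# Rows copied from the rados bench output above. Columns:
# sec, Cur ops, started, finished, avg MB/s, cur MB/s, last lat, avg lat
SAMPLE = """\
    51      16      1503      1487   116.603        72  0.306585  0.524611
    52      16      1525      1509   116.053        88  0.171904  0.520352
    53      16      1541      1525    115.07        64  0.121784  0.516466
    54      16      1541      1525   112.939         0         -  0.516466
    55      16      1541      1525   110.885         0         -  0.516466
    56      16      1541      1525   108.905         0         -  0.516466
    57      16      1541      1525   106.994         0         -  0.516466
"""

# Captures the "sec" and "cur MB/s" columns of a per-second row.
ROW = re.compile(r"^\s*(\d+)\s+\d+\s+\d+\s+\d+\s+[\d.]+\s+([\d.]+)\s")


def find_stalls(text):
    """Return (first_sec, last_sec) runs where cur MB/s was 0."""
    stalls = []
    start = prev = None
    for line in text.splitlines():
        m = ROW.match(line)
        if not m:
            continue  # header, "..." and summary lines do not match
        sec, cur = int(m.group(1)), float(m.group(2))
        if cur == 0:
            if start is None:
                start = sec  # a new stall begins
            prev = sec
        elif start is not None:
            stalls.append((start, prev))  # throughput resumed
            start = None
    if start is not None:
        stalls.append((start, prev))  # input ended mid-stall
    return stalls


print(find_stalls(SAMPLE))  # [(54, 57)]
```

For the rows shown it reports a single stall from second 54 to 57; the excerpt stops there, so the real end of that stall is only visible in the full output.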

During that time, I/O and CPU load on all nodes essentially stop, and ceph -w starts reporting:

2013-09-02 16:44:57.794115 osd.4 [WRN] 6 slow requests, 1 included below; oldest blocked for > 62.953663 secs
2013-09-02 16:44:57.794125 osd.4 [WRN] slow request 60.363101 seconds old, received at 2013-09-02 16:43:57.430961: osd_op(client.381797.0:2109 benchmark_data_hqblade203.non.3dart.com_18829_object2108 [write 0~4194304] 14.745012c3 e277) v4 currently waiting for subops from [0]
2013-09-02 16:45:01.795211 osd.4 [WRN] 6 slow requests, 1 included below; oldest blocked for > 66.954773 secs
2013-09-02 16:45:01.795221 osd.4 [WRN] slow request 60.661060 seconds old, received at 2013-09-02 16:44:01.134112: osd_op(client.381797.0:2199 benchmark_data_hqblade203.non.3dart.com_18829_object2198 [write 0~4194304] 14.dec41e60 e277) v4 currently waiting for subops from [0]
2013-09-02 16:45:02.795582 osd.4 [WRN] 6 slow requests, 2 included below; oldest blocked for > 67.955102 secs
2013-09-02 16:45:02.795590 osd.4 [WRN] slow request 60.316291 seconds old, received at 2013-09-02 16:44:02.479210: osd_op(client.381797.0:2230 benchmark_data_hqblade203.non.3dart.com_18829_object2229 [write 0~4194304] 14.b3ca5505 e277) v4 currently waiting for subops from [0]
2013-09-02 16:45:02.795595 osd.4 [WRN] slow request 60.014792 seconds old, received at 2013-09-02 16:44:02.780709: osd_op(client.381797.0:2234 benchmark_data_hqblade203.non.3dart.com_18829_object2233 [write 0~4194304] 14.a8c8cfd5 e277) v4 currently waiting for subops from [0]
2013-09-02 16:45:03.723742 osd.0 [WRN] 10 slow requests, 1 included below; oldest blocked for > 69.571037 secs
2013-09-02 16:45:03.723748 osd.0 [WRN] slow request 60.871583 seconds old, received at 2013-09-02 16:44:02.852110: osd_op(client.381797.0:2235 benchmark_data_hqblade203.non.3dart.com_18829_object2234 [write 0~4194304] 14.d44b2ab6 e277) v4 currently waiting for subops from [4]
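Each per-request warning names the reporting OSD (the primary) and the replica it is waiting on ("waiting for subops from [N]"), so tallying those pairs over a longer ceph -w capture can show whether one OSD is consistently the one being waited on. A minimal sketch, assuming each warning sits on a single line and that multiple replica ids inside the brackets are comma-separated; waited_on is a hypothetical helper, not a Ceph tool.

```python
import re
from collections import Counter

# Lines copied from the ceph -w output above, one warning per line.
LOG = """\
2013-09-02 16:44:57.794115 osd.4 [WRN] 6 slow requests, 1 included below; oldest blocked for > 62.953663 secs
2013-09-02 16:44:57.794125 osd.4 [WRN] slow request 60.363101 seconds old, received at 2013-09-02 16:43:57.430961: osd_op(client.381797.0:2109 benchmark_data_hqblade203.non.3dart.com_18829_object2108 [write 0~4194304] 14.745012c3 e277) v4 currently waiting for subops from [0]
2013-09-02 16:45:01.795221 osd.4 [WRN] slow request 60.661060 seconds old, received at 2013-09-02 16:44:01.134112: osd_op(client.381797.0:2199 benchmark_data_hqblade203.non.3dart.com_18829_object2198 [write 0~4194304] 14.dec41e60 e277) v4 currently waiting for subops from [0]
2013-09-02 16:45:02.795590 osd.4 [WRN] slow request 60.316291 seconds old, received at 2013-09-02 16:44:02.479210: osd_op(client.381797.0:2230 benchmark_data_hqblade203.non.3dart.com_18829_object2229 [write 0~4194304] 14.b3ca5505 e277) v4 currently waiting for subops from [0]
2013-09-02 16:45:02.795595 osd.4 [WRN] slow request 60.014792 seconds old, received at 2013-09-02 16:44:02.780709: osd_op(client.381797.0:2234 benchmark_data_hqblade203.non.3dart.com_18829_object2233 [write 0~4194304] 14.a8c8cfd5 e277) v4 currently waiting for subops from [0]
2013-09-02 16:45:03.723748 osd.0 [WRN] slow request 60.871583 seconds old, received at 2013-09-02 16:44:02.852110: osd_op(client.381797.0:2235 benchmark_data_hqblade203.non.3dart.com_18829_object2234 [write 0~4194304] 14.d44b2ab6 e277) v4 currently waiting for subops from [4]
"""

# Matches per-request warnings only; the "N slow requests, ..." summary
# lines lack "[WRN] slow request" directly after the OSD name.
SLOW = re.compile(
    r"(osd\.\d+) \[WRN\] slow request [\d.]+ seconds old.*"
    r"waiting for subops from \[([\d, ]+)\]"
)


def waited_on(log):
    """Count (primary, replica) pairs from 'waiting for subops' warnings."""
    pairs = Counter()
    for line in log.splitlines():
        m = SLOW.search(line)
        if not m:
            continue
        primary = m.group(1)
        for replica in m.group(2).split(","):
            pairs[(primary, "osd." + replica.strip())] += 1
    return pairs


for (primary, replica), n in sorted(waited_on(LOG).items()):
    print(f"{primary} waiting on {replica}: {n}")
```

On the excerpt above it reports osd.4 waiting on osd.0 four times and osd.0 waiting on osd.4 once; the summary line is skipped because it does not match the per-request pattern.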

Any ideas why this is happening and how it can be debugged? It seems that something is wrong with osd.0, but there doesn't seem to be anything wrong with the machine itself (bonnie++ and dd on the machine don't show any lockups).

-- 
Mariusz Gronczewski, Administrator

Efigence Sp. z o. o.
ul. Wołoska 9a, 02-583 Warszawa
T: [+48] 22 380 13 13
F: [+48] 22 380 13 14
E: mariusz.gronczewski@xxxxxxxxxxxx

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
