Re: ceph freezes for 10+ seconds during benchmark

Samuel Just <sam.just@xxxxxxxxxxx> · Mon, 9 Sep 2013 13:48:20 -0700



It looks like osd.4 may actually be the problem.  Can you try removing
osd.4 and trying again?
-Sam

On Mon, Sep 2, 2013 at 8:01 AM, Mariusz Gronczewski
<mariusz.gronczewski@xxxxxxxxxxxxx> wrote:
> We've installed Ceph on a test cluster:
> 3x mon, 7x OSD on 2x 10k RPM SAS
> CentOS 6.4 (2.6.32-358.14.1.el6.x86_64)
> Ceph 0.67.2 (also tried 0.61.7, with the same results)
>
> And during rados bench I get very strange behaviour:
> # rados bench -p pbench 100 write
>
>    sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
> ...
>     51      16      1503      1487   116.603        72  0.306585  0.524611
>     52      16      1525      1509   116.053        88  0.171904  0.520352
>     53      16      1541      1525    115.07        64  0.121784  0.516466
>     54      16      1541      1525   112.939         0         -  0.516466
>     55      16      1541      1525   110.885         0         -  0.516466
>     56      16      1541      1525   108.905         0         -  0.516466
>     57      16      1541      1525   106.994         0         -  0.516466
> ... ( http://pastebin.com/vV50YBVK )
>
> Bandwidth (MB/sec):     81.760
>
> Stddev Bandwidth:       53.8371
> Max bandwidth (MB/sec): 156
> Min bandwidth (MB/sec): 0
> Average Latency:        0.782271
> Stddev Latency:         2.51829
> Max latency:            26.1715
> Min latency:            0.084654
>
> Basically, the benchmark runs at full disk speed and then all I/O stops for 10+ seconds.
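
The stall described above can be located mechanically in saved `rados bench` output. The following is a minimal sketch, not part of the original report: it assumes the eight-column row layout shown in the quoted table, and the names `find_stalls` and `min_len` are made up for illustration.

```python
def find_stalls(lines, min_len=2):
    """Return (first_sec, last_sec) for each run of rows where cur MB/s is 0."""
    stalls, run = [], []
    for line in lines:
        # Drop any "> " quoting prefix, then split on whitespace.
        fields = line.lstrip("> ").split()
        # Data rows have 8 fields and start with the second counter;
        # the header and summary lines do not, so they are skipped.
        if len(fields) != 8 or not fields[0].isdigit():
            continue
        sec, cur_mbs = int(fields[0]), float(fields[5])
        if cur_mbs == 0:
            run.append(sec)
            continue
        if len(run) >= min_len:
            stalls.append((run[0], run[-1]))
        run = []
    if len(run) >= min_len:
        stalls.append((run[0], run[-1]))
    return stalls


# Rows copied from the quoted benchmark output.
SAMPLE = [
    ">     51      16      1503      1487   116.603        72  0.306585  0.524611",
    ">     52      16      1525      1509   116.053        88  0.171904  0.520352",
    ">     53      16      1541      1525    115.07        64  0.121784  0.516466",
    ">     54      16      1541      1525   112.939         0         -  0.516466",
    ">     55      16      1541      1525   110.885         0         -  0.516466",
    ">     56      16      1541      1525   108.905         0         -  0.516466",
    ">     57      16      1541      1525   106.994         0         -  0.516466",
]

print(find_stalls(SAMPLE))
```

Comparing the reported seconds against the `ceph -w` timestamps is one way to check whether each stall lines up with a burst of slow-request warnings.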
>
> During that time, all I/O and CPU load on all nodes basically stops, and ceph -w starts to report:
>
> 2013-09-02 16:44:57.794115 osd.4 [WRN] 6 slow requests, 1 included below; oldest blocked for > 62.953663 secs
> 2013-09-02 16:44:57.794125 osd.4 [WRN] slow request 60.363101 seconds old, received at 2013-09-02 16:43:57.430961: osd_op(client.381797.0:2109 benchmark_data_hqblade203.non.3dart.com_18829_object2108 [write 0~4194304] 14.745012c3 e277) v4 currently waiting for subops from [0]
> 2013-09-02 16:45:01.795211 osd.4 [WRN] 6 slow requests, 1 included below; oldest blocked for > 66.954773 secs
> 2013-09-02 16:45:01.795221 osd.4 [WRN] slow request 60.661060 seconds old, received at 2013-09-02 16:44:01.134112: osd_op(client.381797.0:2199 benchmark_data_hqblade203.non.3dart.com_18829_object2198 [write 0~4194304] 14.dec41e60 e277) v4 currently waiting for subops from [0]
> 2013-09-02 16:45:02.795582 osd.4 [WRN] 6 slow requests, 2 included below; oldest blocked for > 67.955102 secs
> 2013-09-02 16:45:02.795590 osd.4 [WRN] slow request 60.316291 seconds old, received at 2013-09-02 16:44:02.479210: osd_op(client.381797.0:2230 benchmark_data_hqblade203.non.3dart.com_18829_object2229 [write 0~4194304] 14.b3ca5505 e277) v4 currently waiting for subops from [0]
> 2013-09-02 16:45:02.795595 osd.4 [WRN] slow request 60.014792 seconds old, received at 2013-09-02 16:44:02.780709: osd_op(client.381797.0:2234 benchmark_data_hqblade203.non.3dart.com_18829_object2233 [write 0~4194304] 14.a8c8cfd5 e277) v4 currently waiting for subops from [0]
> 2013-09-02 16:45:03.723742 osd.0 [WRN] 10 slow requests, 1 included below; oldest blocked for > 69.571037 secs
> 2013-09-02 16:45:03.723748 osd.0 [WRN] slow request 60.871583 seconds old, received at 2013-09-02 16:44:02.852110: osd_op(client.381797.0:2235 benchmark_data_hqblade203.non.3dart.com_18829_object2234 [write 0~4194304] 14.d44b2ab6 e277) v4 currently waiting for subops from [4]
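
In the warnings quoted above, osd.4 reports requests waiting on osd.0, and osd.0 reports a request waiting on osd.4. With a longer log, a tally of who waits on whom may help narrow down the suspect OSD. This is a rough sketch under assumptions: it relies only on the `[WRN] slow request ... waiting for subops from [N]` wording seen in this log, the helper name `tally_waits` is hypothetical, and the sample lines are trimmed in the middle ("...") for brevity.

```python
import re
from collections import Counter

# Matches per-request warnings only; the "N slow requests, M included below"
# summary lines do not match and are skipped.
SLOW_REQUEST = re.compile(
    r"(osd\.\d+) \[WRN\] slow request .*?waiting for subops from \[([\d,\s]+)\]"
)


def tally_waits(lines):
    """Count (reporting OSD, OSD being waited on) pairs in ceph -w output."""
    waits = Counter()
    for line in lines:
        match = SLOW_REQUEST.search(line)
        if match is None:
            continue
        reporter = match.group(1)
        for osd_id in match.group(2).split(","):
            waits[(reporter, "osd." + osd_id.strip())] += 1
    return waits


# Trimmed copies of the warnings quoted in this thread.
SAMPLE = [
    "2013-09-02 16:44:57.794115 osd.4 [WRN] 6 slow requests, 1 included below; oldest blocked for > 62.953663 secs",
    "2013-09-02 16:44:57.794125 osd.4 [WRN] slow request 60.363101 seconds old, ... currently waiting for subops from [0]",
    "2013-09-02 16:45:01.795221 osd.4 [WRN] slow request 60.661060 seconds old, ... currently waiting for subops from [0]",
    "2013-09-02 16:45:02.795590 osd.4 [WRN] slow request 60.316291 seconds old, ... currently waiting for subops from [0]",
    "2013-09-02 16:45:02.795595 osd.4 [WRN] slow request 60.014792 seconds old, ... currently waiting for subops from [0]",
    "2013-09-02 16:45:03.723748 osd.0 [WRN] slow request 60.871583 seconds old, ... currently waiting for subops from [4]",
]

for (reporter, waited_on), count in tally_waits(SAMPLE).most_common():
    print(f"{reporter} waiting on {waited_on}: {count}")
```

A tally like this only shows which pairs of OSDs are involved; it does not by itself say which side of the pair is at fault.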
>
> Any ideas why this is happening and how it can be debugged? It seems that something is wrong with osd.0, but there doesn't seem to be anything wrong with the machine itself (bonnie++ and dd on the machine do not show any lockups).
>
> --
> Mariusz Gronczewski, Administrator
>
> Efigence Sp. z o. o.
> ul. Wołoska 9a, 02-583 Warszawa
> T: [+48] 22 380 13 13
> F: [+48] 22 380 13 14
> E: mariusz.gronczewski@xxxxxxxxxxxx
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>