Hi all,
We're pretty new to ceph, but loving it so far.
We have a three-node cluster with four 4TB OSDs per node (12 OSDs in
total), a 5GB journal per OSD on SSD, a 10G ethernet cluster network,
and 64GB RAM per node.
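(For completeness, this is roughly how the journal placement and size can
be verified per OSD; osd.0 and the path are just examples assuming a
default ceph-disk layout:)

# show the journal settings the OSD is actually running with
root@ceph1:~# ceph daemon osd.0 config show | grep -E 'osd_journal'
# the journal in the OSD data dir should be a symlink to the SSD partition
root@ceph1:~# ls -l /var/lib/ceph/osd/ceph-0/journal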
We noticed the following output when running rados bench:
root@ceph1:~# rados bench -p scbench 600 write --no-cleanup
Maintaining 16 concurrent writes of 4194304 bytes for up to 600 seconds or 0 objects
Object prefix: benchmark_data_pm1_36584
 sec  Cur ops  started  finished  avg MB/s  cur MB/s   last lat   avg lat
   0        0        0         0         0         0          -         0
   1       16      124       108   431.899       432   0.138315  0.139077
   2       16      237       221   441.928       452   0.169759  0.140138
   3       16      351       335   446.598       456   0.105837  0.139844
   4       16      466       450   449.938       460   0.140141  0.139716
   5       16      569       553   442.337       412   0.025245  0.139328
   6       16      634       618   411.943       260  0.0302609  0.147129
   7       16      692       676   386.233       232    1.01843   0.15158
   8       16      721       705   352.455       116  0.0224958  0.159924
   9       16      721       705   313.293         0          -  0.159924
+------------------ notice the drop to zero for MB/s
  10       16      764       748   299.163        86  0.0629263   0.20961
  11       16      869       853   310.144       420  0.0805086  0.204707
  12       16      986       970   323.295       468   0.175718  0.196822
  13       16     1100      1084     333.5       456   0.171172   0.19105
  14       16     1153      1137   324.819       212  0.0468416  0.188643
  15       16     1225      1209   322.363       288  0.0421159  0.195791
  16       16     1236      1220   304.964        44    1.28629  0.195499
  17       16     1236      1220   287.025         0          -  0.195499
  18       16     1236      1220   271.079         0          -  0.195499
+------------------ notice again the drop to zero for MB/s
  19       16     1324      1308   275.336   117.333   0.148679  0.231708
  20       16     1436      1420   283.967       448   0.120878  0.224367
  21       16     1552      1536   292.538       464   0.173587  0.218141
  22       16     1662      1646   299.238       440   0.141544  0.212946
  23       16     1720      1704   296.314       232  0.0273257  0.211416
  24       16     1729      1713   285.467        36  0.0215821  0.211308
  25       16     1729      1713   274.048         0          -  0.211308
  26       16     1729      1713   263.508         0          -  0.211308
+------------------ notice again the drop to zero for MB/s
  27       16     1787      1771    262.34   77.3333  0.0338129  0.241103
  28       16     1836      1820    259.97       196   0.183042  0.245665
  29       16     1949      1933    266.59       452   0.129397  0.239445
  30       16     2058      2042   272.235       436   0.165108  0.234447
  31       16     2159      2143   276.484       404  0.0466259  0.229704
  32       16     2189      2173   271.594       120  0.0206958  0.231772
So at regular intervals, the "cur MB/s" drops to zero. If we also run
iperf between two nodes at the same time, we can tell that the 10G
network is functioning perfectly: while rados bench drops to zero, iperf
continues at max speed. The test we ran is sketched below.
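A rough sketch of that network test (node names are ours; iperf2 syntax):

# on one node: start an iperf server
root@ceph2:~# iperf -s
# on the other node, while rados bench runs: report throughput every second
root@ceph1:~# iperf -c ceph2 -t 600 -i 1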
So it seems something is slowing ceph down at 'regular' intervals.
Is this normal and expected, or not? If not: what do we need to look at?
During the 0 MB/s intervals there is NO increase in CPU usage: it stays
around 15-20% for the four ceph-osd processes.
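Since the CPUs look idle, our next guess is to watch the disks during a
stall; something like this, where sda/sdb are only examples standing in
for our journal SSD and one of the OSD disks:

# per-device utilisation, refreshed every second; a journal SSD or OSD disk
# pinned at ~100 %util exactly when cur MB/s hits zero would point at the
# disks rather than the network
root@ceph1:~# iostat -x 1 sda sdb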
Do we have an issue? And if so: any suggestions on where to look?
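One thing we can capture ourselves while the counter sits at zero is what
the OSDs report over their admin sockets; a sketch, with osd.0 just an
example:

# the slowest recent ops, including a per-step time breakdown
root@ceph1:~# ceph daemon osd.0 dump_historic_ops
# journal/filestore latency counters, pretty-printed for grepping
root@ceph1:~# ceph daemon osd.0 perf dump | python -m json.tool | grep -i journal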
Some more details:
- ceph version 0.94.7 (d56bdf93ced6b80b07397d57e3fa68fe68304432)
- Linux ceph2 4.4.15-1-pve #1 SMP Thu Jul 28 10:54:13 CEST 2016 x86_64
GNU/Linux
Thanks in advance, and best regards from the Netherlands,
MJ