Hi all,
We're pretty new to ceph, but loving it so far.
We have a three-node cluster with four 4TB OSDs per node (12 OSDs in
total), a 5GB journal per OSD on SSD, a 10G ethernet cluster network,
and 64GB RAM per node.
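(For completeness, this is roughly how the journal placement and size can
be verified per OSD; osd.0 and the path are just examples assuming a
default ceph-disk layout:)

# show the journal settings the OSD is actually running with
root@ceph1:~# ceph daemon osd.0 config show | grep -E 'osd_journal'
# the journal in the OSD data dir should be a symlink to the SSD partition
root@ceph1:~# ls -l /var/lib/ceph/osd/ceph-0/journal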
We noticed the following output when running rados bench:
root@ceph1:~# rados bench -p scbench 600 write --no-cleanup
Maintaining 16 concurrent writes of 4194304 bytes for up to 600 seconds or 0 objects
Object prefix: benchmark_data_pm1_36584
 sec  Cur ops  started  finished  avg MB/s  cur MB/s   last lat   avg lat
   0        0        0         0         0         0          -         0
   1       16      124       108   431.899       432   0.138315  0.139077
   2       16      237       221   441.928       452   0.169759  0.140138
   3       16      351       335   446.598       456   0.105837  0.139844
   4       16      466       450   449.938       460   0.140141  0.139716
   5       16      569       553   442.337       412   0.025245  0.139328
   6       16      634       618   411.943       260  0.0302609  0.147129
   7       16      692       676   386.233       232    1.01843   0.15158
   8       16      721       705   352.455       116  0.0224958  0.159924
   9       16      721       705   313.293         0          -  0.159924
+------------------ notice the drop to zero for MB/s
  10       16      764       748   299.163        86  0.0629263   0.20961
  11       16      869       853   310.144       420  0.0805086  0.204707
  12       16      986       970   323.295       468   0.175718  0.196822
  13       16     1100      1084     333.5       456   0.171172   0.19105
  14       16     1153      1137   324.819       212  0.0468416  0.188643
  15       16     1225      1209   322.363       288  0.0421159  0.195791
  16       16     1236      1220   304.964        44    1.28629  0.195499
  17       16     1236      1220   287.025         0          -  0.195499
  18       16     1236      1220   271.079         0          -  0.195499
+------------------ notice again the drop to zero for MB/s
  19       16     1324      1308   275.336   117.333   0.148679  0.231708
  20       16     1436      1420   283.967       448   0.120878  0.224367
  21       16     1552      1536   292.538       464   0.173587  0.218141
  22       16     1662      1646   299.238       440   0.141544  0.212946
  23       16     1720      1704   296.314       232  0.0273257  0.211416
  24       16     1729      1713   285.467        36  0.0215821  0.211308
  25       16     1729      1713   274.048         0          -  0.211308
  26       16     1729      1713   263.508         0          -  0.211308
+------------------ notice again the drop to zero for MB/s
  27       16     1787      1771    262.34   77.3333  0.0338129  0.241103
  28       16     1836      1820    259.97       196   0.183042  0.245665
  29       16     1949      1933    266.59       452   0.129397  0.239445
  30       16     2058      2042   272.235       436   0.165108  0.234447
  31       16     2159      2143   276.484       404  0.0466259  0.229704
  32       16     2189      2173   271.594       120  0.0206958  0.231772
So at regular intervals, the "cur MB/s" drops to zero. If we also run
iperf between two nodes at the same time, we can tell that the 10G
network is functioning perfectly: while rados bench drops to zero, iperf
continues at max speed. The test we ran is sketched below.
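A rough sketch of that network test (node names are ours; iperf2 syntax):

# on one node: start an iperf server
root@ceph2:~# iperf -s
# on the other node, while rados bench runs: report throughput every second
root@ceph1:~# iperf -c ceph2 -t 600 -i 1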
So it seems something is slowing ceph down at 'regular' intervals.
Is this normal and expected, or not? If not: what do we need to look at?
During the 0 MB/s intervals there is NO increase in CPU usage: it stays
around 15-20% for the four ceph-osd processes.
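Since the CPUs look idle, our next guess is to watch the disks during a
stall; something like this, where sda/sdb are only examples standing in
for our journal SSD and one of the OSD disks:

# per-device utilisation, refreshed every second; a journal SSD or OSD disk
# pinned at ~100 %util exactly when cur MB/s hits zero would point at the
# disks rather than the network
root@ceph1:~# iostat -x 1 sda sdb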
Do we have an issue? And if so: any suggestions on where to look?
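One thing we can capture ourselves while the counter sits at zero is what
the OSDs report over their admin sockets; a sketch, with osd.0 just an
example:

# the slowest recent ops, including a per-step time breakdown
root@ceph1:~# ceph daemon osd.0 dump_historic_ops
# journal/filestore latency counters, pretty-printed for grepping
root@ceph1:~# ceph daemon osd.0 perf dump | python -m json.tool | grep -i journal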
Some more details:
- ceph version 0.94.7 (d56bdf93ced6b80b07397d57e3fa68fe68304432)
- Linux ceph2 4.4.15-1-pve #1 SMP Thu Jul 28 10:54:13 CEST 2016 x86_64
GNU/Linux
Thanks in advance, and best regards from the Netherlands,
MJ