Odd cyclical cluster performance

Patrick Dinnen <pdinnen@xxxxxxxxx> · Thu, 11 May 2017 15:47:29 -0400

We're seeing some odd behaviour while testing with rados bench. This is
on a pre-split pool in a two-node cluster with 12 OSDs total.

ceph osd pool create newerpoolofhopes 2048 2048 replicated "" \
    replicated_ruleset 500000000

rados -p newerpoolofhopes bench -t 32 -b 20000 30000000 write --no-cleanup

Using Prometheus/Grafana to watch what's going on, we see oddly
regular peaks and dips in write performance. The frequency changes
gradually but it's on the order of hours (not the seconds that might
seem easier to explain by system phenomena). It starts off at roughly
one cycle per hour and we've seen it for multiple days of constant
bench running with nothing else happening on the cluster.
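
To put a number on the cycle length rather than reading it off the graphs, something like the following could be run over an exported throughput series. This is only a minimal sketch: it assumes the samples are evenly spaced and that the period is roughly steady within the window (which only holds approximately here, since the frequency drifts). The function name estimate_period and the synthetic sine-wave input are illustrative and not taken from our setup.

```python
import math


def estimate_period(samples, step_seconds):
    """Estimate the dominant cycle length (seconds) from the first
    autocorrelation peak of an evenly sampled series.

    Returns None if the series is flat or no cycle can be found.
    """
    n = len(samples)
    mean = sum(samples) / n
    centered = [s - mean for s in samples]
    var = sum(c * c for c in centered)
    if var == 0:
        return None  # flat series, no cycle to detect

    def autocorr(lag):
        # Normalised autocorrelation at the given lag (in samples).
        return sum(centered[i] * centered[i + lag] for i in range(n - lag)) / var

    # Skip the initial decay from lag 0 until the correlation turns negative.
    lag = 1
    while lag < n // 2 and autocorr(lag) > 0:
        lag += 1
    if lag >= n // 2:
        return None  # never went negative: no visible cycle in this window

    # The strongest peak after that point marks one full cycle.
    best = max(range(lag, n // 2), key=autocorr)
    return best * step_seconds


if __name__ == "__main__":
    # Synthetic stand-in for a throughput series: a one-hour cycle
    # sampled once per minute for eight hours.
    demo = [math.sin(2 * math.pi * i * 60 / 3600) for i in range(480)]
    print(estimate_period(demo, 60))
```

Running this over successive windows of the real data would show how the period drifts over the days of the run.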

A bunch of graphs showing the pattern:

https://ibb.co/djXUVk
https://ibb.co/gMNk35
https://ibb.co/iKViqk
https://ibb.co/jOXJO5
https://ibb.co/isUMbQ

sdg and sdi are SSD journal disks. The activity on the OSDs and SSDs
seems anti-correlated. SSDs peak in activity as OSDs reach the bottom
of the trough. Then the reverse. Repeat.
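
The anti-correlation is so far only an impression from the graphs. A rough way to quantify it would be a Pearson correlation between the two activity series. The sketch below assumes both series have been exported with the same timestamps and equal length; the function name pearson and the sample numbers are made up for illustration and are not measurements from this cluster.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / (spread_x * spread_y)


if __name__ == "__main__":
    # Hypothetical utilisation samples (percent), NOT real measurements.
    osd_util = [80, 70, 55, 40, 30, 45, 60, 75]
    ssd_util = [20, 30, 45, 60, 70, 55, 40, 25]
    print(pearson(osd_util, ssd_util))
```

A value near -1 would back up the impression that the journal SSDs are busiest exactly when the OSD disks are least busy; a value near 0 would suggest the two are simply out of phase by some other amount.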

Does anyone have any suggestions as to what could possibly be causing
a regular pattern like this at such a low frequency?

Thanks, Patrick Dinnen