My testing cluster is an all-HDD cluster with 12
OSDs (10 TB HDD each).
I monitor Luminous 12.2.2 write performance and
OSD memory usage with Grafana graphs for statistics logging.
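(In case it helps with reproducing the monitoring: the numbers are pulled
from the OSD admin sockets with commands roughly like the ones below;
osd.0 is just an example id.)

  ceph daemon osd.0 perf dump        # op latency / throughput counters
  ceph daemon osd.0 dump_mempools    # per-component memory usage, incl. bluestore caches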
The test is done by running fio on a
mounted RBD with the following parameters:

fio -directory=fiotest -direct=1 -thread -rw=write -ioengine=libaio \
    -size=200G -group_reporting -bs=1m -iodepth 4 -numjobs=200 -name=writetest
I found there is a noticeable
performance degradation over time.
[Graph: write throughput and IOPS]
[Graph: OSD memory usage (2 of 12 OSDs; the pattern is identical for all)]
[Graph: osd perf]
There are some interesting findings
from the graphs.
After 18:00 the write
throughput suddenly dropped and the OSD latency increased. TCMalloc started
reclaiming its page heap freelist much more frequently. All of this happened
very quickly, and every OSD showed the identical pattern.
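(The TCMalloc side can be checked directly through Ceph's heap commands,
for example:

  ceph tell osd.0 heap stats      # shows bytes in use and in the page heap freelist
  ceph tell osd.0 heap release    # asks tcmalloc to return free pages to the OS

Again, osd.0 is just an example id.)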
I have done this kind of test several
times with different BlueStore cache settings, and found that with more cache
the performance degradation happens later.
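(For reference, the knobs I was varying are roughly the ones below; the
values are only an example, not a recommendation. On an all-HDD cluster the
_hdd variant applies when bluestore_cache_size is left at its default of 0.)

  [osd]
  bluestore_cache_size_hdd = 3221225472   # 3 GiB total cache per osd
  bluestore_cache_kv_ratio = 0.5          # share for the rocksdb block cache
  bluestore_cache_meta_ratio = 0.5        # share for the onode/metadata cache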
I don't know whether this is a bug or
something I can fix by changing some of my
cluster's configuration. Any advice or direction to look into is
appreciated.
Thanks
2017-12-21
lin.yunfan