Hi all, we have set up a Ceph cluster with 60 OSDs of two different types: 5 nodes, 12 disks each (10 HDD + 2 SSD).
We also use a custom CRUSH map with two roots, one per disk type (the matching CRUSH rules are sketched after the tree below):
ID WEIGHT TYPE NAME UP/DOWN REWEIGHT PRIMARY-AFFINITY
-100 5.00000 root ssd
-102 1.00000 host ix-s2-ssd
2 1.00000 osd.2 up 1.00000 1.00000
9 1.00000 osd.9 up 1.00000 1.00000
-103 1.00000 host ix-s3-ssd
3 1.00000 osd.3 up 1.00000 1.00000
7 1.00000 osd.7 up 1.00000 1.00000
-104 1.00000 host ix-s5-ssd
1 1.00000 osd.1 up 1.00000 1.00000
6 1.00000 osd.6 up 1.00000 1.00000
-105 1.00000 host ix-s6-ssd
4 1.00000 osd.4 up 1.00000 1.00000
8 1.00000 osd.8 up 1.00000 1.00000
-106 1.00000 host ix-s7-ssd
0 1.00000 osd.0 up 1.00000 1.00000
5 1.00000 osd.5 up 1.00000 1.00000
-1 5.00000 root platter
-2 1.00000 host ix-s2-platter
13 1.00000 osd.13 up 1.00000 1.00000
17 1.00000 osd.17 up 1.00000 1.00000
21 1.00000 osd.21 up 1.00000 1.00000
27 1.00000 osd.27 up 1.00000 1.00000
32 1.00000 osd.32 up 1.00000 1.00000
37 1.00000 osd.37 up 1.00000 1.00000
44 1.00000 osd.44 up 1.00000 1.00000
48 1.00000 osd.48 up 1.00000 1.00000
55 1.00000 osd.55 up 1.00000 1.00000
59 1.00000 osd.59 up 1.00000 1.00000
-3 1.00000 host ix-s3-platter
14 1.00000 osd.14 up 1.00000 1.00000
18 1.00000 osd.18 up 1.00000 1.00000
23 1.00000 osd.23 up 1.00000 1.00000
28 1.00000 osd.28 up 1.00000 1.00000
33 1.00000 osd.33 up 1.00000 1.00000
39 1.00000 osd.39 up 1.00000 1.00000
43 1.00000 osd.43 up 1.00000 1.00000
47 1.00000 osd.47 up 1.00000 1.00000
54 1.00000 osd.54 up 1.00000 1.00000
58 1.00000 osd.58 up 1.00000 1.00000
-4 1.00000 host ix-s5-platter
11 1.00000 osd.11 up 1.00000 1.00000
16 1.00000 osd.16 up 1.00000 1.00000
22 1.00000 osd.22 up 1.00000 1.00000
26 1.00000 osd.26 up 1.00000 1.00000
31 1.00000 osd.31 up 1.00000 1.00000
36 1.00000 osd.36 up 1.00000 1.00000
41 1.00000 osd.41 up 1.00000 1.00000
46 1.00000 osd.46 up 1.00000 1.00000
51 1.00000 osd.51 up 1.00000 1.00000
56 1.00000 osd.56 up 1.00000 1.00000
-5 1.00000 host ix-s6-platter
12 1.00000 osd.12 up 1.00000 1.00000
19 1.00000 osd.19 up 1.00000 1.00000
24 1.00000 osd.24 up 1.00000 1.00000
29 1.00000 osd.29 up 1.00000 1.00000
34 1.00000 osd.34 up 1.00000 1.00000
38 1.00000 osd.38 up 1.00000 1.00000
42 1.00000 osd.42 up 1.00000 1.00000
50 1.00000 osd.50 up 1.00000 1.00000
53 1.00000 osd.53 up 1.00000 1.00000
57 1.00000 osd.57 up 1.00000 1.00000
-6 1.00000 host ix-s7-platter
10 1.00000 osd.10 up 1.00000 1.00000
15 1.00000 osd.15 up 1.00000 1.00000
20 1.00000 osd.20 up 1.00000 1.00000
25 1.00000 osd.25 up 1.00000 1.00000
30 1.00000 osd.30 up 1.00000 1.00000
35 1.00000 osd.35 up 1.00000 1.00000
40 1.00000 osd.40 up 1.00000 1.00000
45 1.00000 osd.45 up 1.00000 1.00000
49 1.00000 osd.49 up 1.00000 1.00000
52 1.00000 osd.52 up 1.00000 1.00000
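For context, the CRUSH rules that point each pool at one of these roots look roughly like this. This is a sketch reconstructed from the tree above; the actual rule names and ruleset numbers in our map may differ:

    rule ssd {
            ruleset 1
            type replicated
            min_size 1
            max_size 10
            step take ssd
            step chooseleaf firstn 0 type host
            step emit
    }
    rule platter {
            ruleset 2
            type replicated
            min_size 1
            max_size 10
            step take platter
            step chooseleaf firstn 0 type host
            step emit
    }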
Then we created two pools, one on the HDDs (platter) and one on the SSDs, and put the SSD pool in front of the HDD pool as a cache tier.
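The tiering was set up with the standard commands, roughly like this (the pool names ssd-cache and hdd-data are illustrative, not our real ones; crush_ruleset is the pre-Luminous name of the pool setting):

    ceph osd pool set hdd-data crush_ruleset 2
    ceph osd pool set ssd-cache crush_ruleset 1
    ceph osd tier add hdd-data ssd-cache
    ceph osd tier cache-mode ssd-cache writeback
    ceph osd tier set-overlay hdd-data ssd-cache
    ceph osd pool set ssd-cache hit_set_type bloom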
Now we get very bad performance from the cluster. Even with rados bench the throughput is very unstable, sometimes dropping to zero, which creates big problems for our clients.
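The benchmark was a 30-second write test against the tiered data pool, something like (pool name again illustrative):

    rados bench -p hdd-data 30 write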
I have tried tuning all the values I could think of, including the OSD settings, but still no luck.
Another thing I can hardly believe: when I run ceph tell ... bench against an SSD OSD I get about 20 MB/s, while an HDD OSD gives about 67 MB/s.
I don't understand why the cache pool, which consists of SSDs, performs so badly. We use Samsung 850 Pro 256 GB drives as the SSDs.
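For reference, the per-OSD test was just the built-in bench (default 1 GB written in 4 MB blocks), e.g. against osd.2 (SSD) and osd.13 (HDD) from the tree above:

    ceph tell osd.2 bench     # SSD OSD -> ~20 MB/s
    ceph tell osd.13 bench    # HDD OSD -> ~67 MB/s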
Can you guys give me some advice, please?
Another strange thing: when I set the cache-mode to forward and try to flush/evict all objects (not all objects get evicted, some are busy/locked on the KVM side), I then get quite stable results from rados bench:
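That was done roughly like this (pool name ssd-cache again illustrative; newer releases may also require a --yes-i-really-mean-it flag for forward mode):

    ceph osd tier cache-mode ssd-cache forward
    rados -p ssd-cache cache-flush-evict-all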
Total time run: 30.275871
Total writes made: 2076
Write size: 4194304
Bandwidth (MB/sec): 274.278
Stddev Bandwidth: 75.1445
Max bandwidth (MB/sec): 368
Min bandwidth (MB/sec): 0
Average Latency: 0.232892
Stddev Latency: 0.240356
Max latency: 2.01436
Min latency: 0.0716344
No zero intervals any more, etc. So I don't understand how that is possible.
Another interesting thing: when I disable the overlay for the pool, rados bench goes back to around 70 MB/s, as expected for plain HDDs, but at the same time rados bench against the SSD pool, which is no longer used as a cache, still shows the same bad results.
So please give me some direction to dig in.
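Disabling the overlay and re-testing was done roughly like this (pool names illustrative):

    ceph osd tier remove-overlay hdd-data
    rados bench -p hdd-data 30 write     # back to ~70 MB/s
    rados bench -p ssd-cache 30 write    # still bad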