Hi,
I'm setting up a 3-node Proxmox cluster with Ceph as the shared storage, built around Intel Optane 900P drives (which are meant to be the bee's knees), and I'm seeing lower IOPS/bandwidth than I expected.
- 3 nodes, each running a Ceph monitor daemon and OSDs.
- Node 1 has 48 GB of RAM and 10 cores (Intel 4114); Nodes 2 and 3 have 32 GB of RAM and 4 cores (Intel E3-1230V6).
- Each node has an Intel Optane 900P (480 GB) NVMe dedicated to Ceph.
- 4 OSDs per node (total of 12 OSDs)
- NICs are Intel X520-DA2, with 10GBASE-LR going to a Unifi US-XG-16.
- The first 10Gb port is for Proxmox VM traffic, the second 10Gb port is for Ceph traffic (a rough ceph.conf sketch of this split is below).
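For context, the split across the two 10Gb ports is configured along these lines in ceph.conf - the subnets below are placeholders rather than my actual addressing:

[global]
    # client / Proxmox VM traffic on the first 10Gb port
    public_network  = 10.10.10.0/24
    # OSD replication and heartbeat traffic on the second 10Gb port
    cluster_network = 10.10.20.0/24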
Write results:
root@vwnode1:~# rados bench -p benchmarking 60 write -b 4M -t 16 --no-cleanup
...
  sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
   60      16     12258     12242   816.055       788   0.0856726   0.0783458
Total time run:         60.069008
Total writes made:      12258
Write size:             4194304
Object size:            4194304
Bandwidth (MB/sec):     816.261
Stddev Bandwidth:       17.4584
Max bandwidth (MB/sec): 856
Min bandwidth (MB/sec): 780
Average IOPS:           204
Stddev IOPS:            4
Max IOPS:               214
Min IOPS:               195
Average Latency(s):     0.0783801
Stddev Latency(s):      0.0468404
Max latency(s):         0.437235
Min latency(s):         0.0177178
Sequential read results - I don't know why this only ran for about 32 seconds:
root@vwnode1:~# rados bench -p benchmarking 60 seq -t 16
...
Total time run:       32.608549
Total reads made:     12258
Read size:            4194304
Object size:          4194304
Bandwidth (MB/sec):   1503.65
Average IOPS:         375
Stddev IOPS:          22
Max IOPS:             410
Min IOPS:             326
Average Latency(s):   0.0412777
Max latency(s):       0.498116
Min latency(s):       0.00447062
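(My guess is that the seq test simply stops once it has read back everything the write test left behind: 12258 objects x 4 MiB = 49032 MiB, and 49032 MiB / ~1504 MB/s is roughly 32.6 s - but please correct me if that's not how it works.)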
Random read result:
root@vwnode1:~# rados bench -p benchmarking 60 rand -t 16
...
Total time run:       60.066384
Total reads made:     22819
Read size:            4194304
Object size:          4194304
Bandwidth (MB/sec):   1519.59
Average IOPS:         379
Stddev IOPS:          21
Max IOPS:             424
Min IOPS:             320
Average Latency(s):   0.0408697
Max latency(s):       0.662955
Min latency(s):       0.00172077
I then cleaned up with:
root@vwnode1:~# rados -p benchmarking cleanup
Removed 12258 objects
I then tested against another Ceph pool with 512 PGs (the one originally created for the Proxmox VMs) - the results look quite similar:
root@vwnode1:~# rados bench -p proxmox_vms 60 write -b 4M -t 16 --no-cleanup
...
Total time run:         60.041712
Total writes made:      12132
Write size:             4194304
Object size:            4194304
Bandwidth (MB/sec):     808.238
Stddev Bandwidth:       20.7444
Max bandwidth (MB/sec): 860
Min bandwidth (MB/sec): 744
Average IOPS:           202
Stddev IOPS:            5
Max IOPS:               215
Min IOPS:               186
Average Latency(s):     0.0791746
Stddev Latency(s):      0.0432707
Max latency(s):         0.42535
Min latency(s):         0.0200791
Sequential read results - once again, it only ran for about 32 seconds:
root@vwnode1:~# rados bench -p proxmox_vms 60 seq -t 16
...
Total time run:       31.249274
Total reads made:     12132
Read size:            4194304
Object size:          4194304
Bandwidth (MB/sec):   1552.93
Average IOPS:         388
Stddev IOPS:          30
Max IOPS:             460
Min IOPS:             320
Average Latency(s):   0.0398702
Max latency(s):       0.481106
Min latency(s):       0.00461585
Random read result:
root@vwnode1:~# rados bench -p proxmox_vms 60 rand -t 16
...
Total time run:       60.088822
Total reads made:     23626
Read size:            4194304
Object size:          4194304
Bandwidth (MB/sec):   1572.74
Average IOPS:         393
Stddev IOPS:          25
Max IOPS:             432
Min IOPS:             322
Average Latency(s):   0.0392854
Max latency(s):       0.693123
Min latency(s):       0.00178545
Cleanup:
root@vwnode1:~# rados -p proxmox_vms cleanup
Removed 12132 objects

root@vwnode1:~# rados df
POOL_NAME   USED   OBJECTS CLONES COPIES MISSING_ON_PRIMARY UNFOUND DEGRADED RD_OPS RD     WR_OPS WR
proxmox_vms 169GiB 43396   0      130188 0                  0       0        909519 298GiB 619697 272GiB

total_objects    43396
total_used       564GiB
total_avail      768GiB
total_space      1.30TiB
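For reference, the replica count and PG count of each pool can be double-checked with the following (the rados df output above suggests proxmox_vms is 3x replicated, since COPIES = 3 x OBJECTS):

ceph osd pool get benchmarking pg_num
ceph osd pool get proxmox_vms pg_num
ceph osd pool get proxmox_vms size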
Is this in line with what you would expect from Ceph on this hardware?
Or is there some way to track down the source of the bottleneck?
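I'm happy to run more targeted tests if that would help narrow things down - for example the built-in per-OSD bench and the per-OSD latency counters (these are commands I would run next, not results I already have):

# write ~1 GiB in 4 MiB chunks directly through a single OSD
ceph tell osd.0 bench

# per-OSD commit/apply latency, ideally watched while a rados bench is running
ceph osd perf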
Thanks,
Victor