Hi,
I'm setting up a 3-node Proxmox cluster with Ceph as the shared storage, built around Intel Optane 900P drives (which are meant to be the bee's knees), and I'm seeing lower IOPS/bandwidth than I expected.
- 3 nodes, each running a Ceph monitor daemon and OSDs.
- Node 1 has 48 GB of RAM and 10 cores (Intel 4114); Nodes 2 and 3 have 32 GB of RAM and 4 cores (Intel E3-1230V6).
- Each node has an Intel Optane 900P (480 GB) NVMe dedicated to Ceph.
- 4 OSDs per node (total of 12 OSDs)
- NICs are Intel X520-DA2, with 10GBASE-LR going to a Unifi US-XG-16.
- The first 10Gb port is for Proxmox VM traffic, the second 10Gb port is for Ceph traffic (a rough ceph.conf sketch of this split is below).
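For context, the split across the two 10Gb ports is configured along these lines in ceph.conf - the subnets below are placeholders rather than my actual addressing:

[global]
    # client / Proxmox VM traffic on the first 10Gb port
    public_network  = 10.10.10.0/24
    # OSD replication and heartbeat traffic on the second 10Gb port
    cluster_network = 10.10.20.0/24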
Write results:
root@vwnode1:~# rados bench -p benchmarking 60 write -b 4M -t 16 --no-cleanup
...
  sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
   60      16     12258     12242   816.055       788   0.0856726   0.0783458
Total time run:         60.069008
Total writes made:      12258
Write size:             4194304
Object size:            4194304
Bandwidth (MB/sec):     816.261
Stddev Bandwidth:       17.4584
Max bandwidth (MB/sec): 856
Min bandwidth (MB/sec): 780
Average IOPS:           204
Stddev IOPS:            4
Max IOPS:               214
Min IOPS:               195
Average Latency(s):     0.0783801
Stddev Latency(s):      0.0468404
Max latency(s):         0.437235
Min latency(s):         0.0177178
Sequential read results - I don't know why this only ran for about 32 seconds:
root@vwnode1:~# rados bench -p benchmarking 60 seq -t 16
...
Total time run:       32.608549
Total reads made:     12258
Read size:            4194304
Object size:          4194304
Bandwidth (MB/sec):   1503.65
Average IOPS:         375
Stddev IOPS:          22
Max IOPS:             410
Min IOPS:             326
Average Latency(s):   0.0412777
Max latency(s):       0.498116
Min latency(s):       0.00447062
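(My guess is that the seq test simply stops once it has read back everything the write test left behind: 12258 objects x 4 MiB = 49032 MiB, and 49032 MiB / ~1504 MB/s is roughly 32.6 s - but please correct me if that's not how it works.)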
Random read result:
root@vwnode1:~# rados bench -p benchmarking 60 rand -t 16
...
Total time run:       60.066384
Total reads made:     22819
Read size:            4194304
Object size:          4194304
Bandwidth (MB/sec):   1519.59
Average IOPS:         379
Stddev IOPS:          21
Max IOPS:             424
Min IOPS:             320
Average Latency(s):   0.0408697
Max latency(s):       0.662955
Min latency(s):       0.00172077
I then cleaned up with:
root@vwnode1:~# rados -p benchmarking cleanup
Removed 12258 objects
I then tested against another Ceph pool with 512 PGs (the one originally created for the Proxmox VMs) - the results look quite similar:
root@vwnode1:~# rados bench -p proxmox_vms 60 write -b 4M -t 16 --no-cleanup
...
Total time run:         60.041712
Total writes made:      12132
Write size:             4194304
Object size:            4194304
Bandwidth (MB/sec):     808.238
Stddev Bandwidth:       20.7444
Max bandwidth (MB/sec): 860
Min bandwidth (MB/sec): 744
Average IOPS:           202
Stddev IOPS:            5
Max IOPS:               215
Min IOPS:               186
Average Latency(s):     0.0791746
Stddev Latency(s):      0.0432707
Max latency(s):         0.42535
Min latency(s):         0.0200791
Sequential read results - once again, it only ran for about 32 seconds:
root@vwnode1:~# rados bench -p proxmox_vms 60 seq -t 16
...
Total time run:       31.249274
Total reads made:     12132
Read size:            4194304
Object size:          4194304
Bandwidth (MB/sec):   1552.93
Average IOPS:         388
Stddev IOPS:          30
Max IOPS:             460
Min IOPS:             320
Average Latency(s):   0.0398702
Max latency(s):       0.481106
Min latency(s):       0.00461585
Random read result:
root@vwnode1:~# rados bench -p proxmox_vms 60 rand -t 16
...
Total time run:       60.088822
Total reads made:     23626
Read size:            4194304
Object size:          4194304
Bandwidth (MB/sec):   1572.74
Average IOPS:         393
Stddev IOPS:          25
Max IOPS:             432
Min IOPS:             322
Average Latency(s):   0.0392854
Max latency(s):       0.693123
Min latency(s):       0.00178545
Cleanup:
root@vwnode1:~# rados -p proxmox_vms cleanup
Removed 12132 objects

root@vwnode1:~# rados df
POOL_NAME   USED   OBJECTS CLONES COPIES MISSING_ON_PRIMARY UNFOUND DEGRADED RD_OPS RD     WR_OPS WR
proxmox_vms 169GiB 43396   0      130188 0                  0       0        909519 298GiB 619697 272GiB

total_objects    43396
total_used       564GiB
total_avail      768GiB
total_space      1.30TiB
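For reference, the replica count and PG count of each pool can be double-checked with the following (the rados df output above suggests proxmox_vms is 3x replicated, since COPIES = 3 x OBJECTS):

ceph osd pool get benchmarking pg_num
ceph osd pool get proxmox_vms pg_num
ceph osd pool get proxmox_vms size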
Is this in line with what you would expect from Ceph on this hardware?
Or is there some way to track down the source of the bottleneck?
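I'm happy to run more targeted tests if that would help narrow things down - for example the built-in per-OSD bench and the per-OSD latency counters (these are commands I would run next, not results I already have):

# write ~1 GiB in 4 MiB chunks directly through a single OSD
ceph tell osd.0 bench

# per-OSD commit/apply latency, ideally watched while a rados bench is running
ceph osd perf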
Thanks,
Victor