Hi Ashley,
Right - so the 50% bandwidth is OK, I guess, but it was more the drop in IOPS that was concerning (hence the subject line about 200 IOPS) *sad face*.
That, and the Optane drives weren't exactly cheap, and I was hoping they would compensate for the overhead of Ceph.
Each Optane drive is rated for 550,000 random read IOPS and 500,000 random write IOPS. Yet in testing we're seeing around 200 IOPS - roughly 0.04% of the rated figure. Is that sort of drop normal for Ceph?
Each node can take up to 8 x 2.5" drives. If I loaded up, say, 4 cheap SSDs in each node (e.g. Intel S3700 SSDs) instead of one Optane drive per node, would that perform better with 4 x 3 = 12 drives? (Would I still put 4 OSDs per physical drive?) Or is there some way to supplement the Optanes with SSDs? (Although I assume any SSD I get is going to be slower than an Optane drive.)
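For reference, my understanding is that the one-drive-into-several-OSDs split is done with ceph-volume's batch mode. A sketch of what that looks like, where /dev/nvme0n1 is just a placeholder for the actual Optane device:

# Sketch only - /dev/nvme0n1 is a placeholder device path.
# "lvm batch --osds-per-device" carves one drive into 4 separate OSDs,
# which is how the current 4-OSDs-per-Optane layout was created.
ceph-volume lvm batch --osds-per-device 4 /dev/nvme0n1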
Or are there tweaks I can make to either the configuration or our layout that could eke out more IOPS?
(This is going to be used for VM hosting, so IOPS is definitely a concern).
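One thing I realise, re-reading the numbers below: rados bench was run with 4 MB objects, so the reported "IOPS" is effectively just bandwidth divided by object size - 816 MB/s ÷ 4 MB ≈ 204 ops/s. A small-block run would be a fairer IOPS test; a sketch I haven't run yet:

# Untested sketch - same bench as below, but with 4 KiB writes,
# to measure small-block IOPS rather than large-object bandwidth.
rados bench -p benchmarking 60 write -b 4096 -t 16 --no-cleanup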
Thanks,
Victor
On Sat, Mar 9, 2019 at 9:27 PM Ashley Merrick <singapore@xxxxxxxxxxxxxx> wrote:
What kind of results are you expecting?

Looking at the specs, they are "up to" 2,000 MB/s write and 2,500 MB/s read, so you're at around 50-60% of the quoted maximums, which I wouldn't say is too bad, given that Ceph/BlueStore has an overhead, especially when using a single disk for the DB, WAL, and data.

Remember that Ceph scales with the number of physical disks you have. As you only have 3 disks, every piece of I/O hits all 3 of them; if you had 6 disks, for example, and still did replication of 3, then only 50% of I/O would hit each disk, so I'd expect to see performance jump.

On Sat, Mar 9, 2019 at 5:08 PM Victor Hooi <victorhooi@xxxxxxxxx> wrote:

Hi,

I'm setting up a 3-node Proxmox cluster with Ceph as the shared storage, based around Intel Optane 900P drives (which are meant to be the bee's knees), and I'm seeing pretty low IOPS/bandwidth.

I created a new Ceph pool specifically for benchmarking, with 128 PGs.
- 3 nodes, each running a Ceph monitor daemon, and OSDs.
- Node 1 has 48 GB of RAM and 10 cores (Intel 4114); Nodes 2 and 3 have 32 GB of RAM and 4 cores (Intel E3-1230V6).
- Each node has an Intel Optane 900p (480 GB) NVMe drive dedicated to Ceph.
- 4 OSDs per node (total of 12 OSDs)
- NICs are Intel X520-DA2, with 10GBASE-LR going to a Unifi US-XG-16.
- The first 10Gb port is for Proxmox VM traffic; the second 10Gb port is for Ceph traffic.
Write results:

root@vwnode1:~# rados bench -p benchmarking 60 write -b 4M -t 16 --no-cleanup
...
  sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
   60      16     12258     12242   816.055       788   0.0856726   0.0783458
Total time run:         60.069008
Total writes made:      12258
Write size:             4194304
Object size:            4194304
Bandwidth (MB/sec):     816.261
Stddev Bandwidth:       17.4584
Max bandwidth (MB/sec): 856
Min bandwidth (MB/sec): 780
Average IOPS:           204
Stddev IOPS:            4
Max IOPS:               214
Min IOPS:               195
Average Latency(s):     0.0783801
Stddev Latency(s):      0.0468404
Max latency(s):         0.437235
Min latency(s):         0.0177178

Sequential read results - I don't know why this only ran for 32 seconds?

root@vwnode1:~# rados bench -p benchmarking 60 seq -t 16
...
Total time run:       32.608549
Total reads made:     12258
Read size:            4194304
Object size:          4194304
Bandwidth (MB/sec):   1503.65
Average IOPS:         375
Stddev IOPS:          22
Max IOPS:             410
Min IOPS:             326
Average Latency(s):   0.0412777
Max latency(s):       0.498116
Min latency(s):       0.00447062

Random read result:

root@vwnode1:~# rados bench -p benchmarking 60 rand -t 16
...
Total time run:       60.066384
Total reads made:     22819
Read size:            4194304
Object size:          4194304
Bandwidth (MB/sec):   1519.59
Average IOPS:         379
Stddev IOPS:          21
Max IOPS:             424
Min IOPS:             320
Average Latency(s):   0.0408697
Max latency(s):       0.662955
Min latency(s):       0.00172077

I then cleaned up with:

root@vwnode1:~# rados -p benchmarking cleanup
Removed 12258 objects
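As a sanity check, I may also benchmark a single OSD directly to separate raw drive speed from replication/network overhead - a sketch, with osd.0 picked arbitrarily:

# Sketch - "ceph tell osd.N bench" writes directly through one OSD's
# object store, with no replication in the path; by default it writes
# 1 GiB in 4 MiB blocks.
ceph tell osd.0 bench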
I then tested with another Ceph pool, with 512 PGs (originally created for Proxmox VMs) - results seem quite similar:

root@vwnode1:~# rados bench -p proxmox_vms 60 write -b 4M -t 16 --no-cleanup
...
Total time run:         60.041712
Total writes made:      12132
Write size:             4194304
Object size:            4194304
Bandwidth (MB/sec):     808.238
Stddev Bandwidth:       20.7444
Max bandwidth (MB/sec): 860
Min bandwidth (MB/sec): 744
Average IOPS:           202
Stddev IOPS:            5
Max IOPS:               215
Min IOPS:               186
Average Latency(s):     0.0791746
Stddev Latency(s):      0.0432707
Max latency(s):         0.42535
Min latency(s):         0.0200791
Sequential read result - once again, it only ran for 32 seconds:

root@vwnode1:~# rados bench -p proxmox_vms 60 seq -t 16
...
Total time run:       31.249274
Total reads made:     12132
Read size:            4194304
Object size:          4194304
Bandwidth (MB/sec):   1552.93
Average IOPS:         388
Stddev IOPS:          30
Max IOPS:             460
Min IOPS:             320
Average Latency(s):   0.0398702
Max latency(s):       0.481106
Min latency(s):       0.00461585
Random read result:

root@vwnode1:~# rados bench -p proxmox_vms 60 rand -t 16
...
Total time run:       60.088822
Total reads made:     23626
Read size:            4194304
Object size:          4194304
Bandwidth (MB/sec):   1572.74
Average IOPS:         393
Stddev IOPS:          25
Max IOPS:             432
Min IOPS:             322
Average Latency(s):   0.0392854
Max latency(s):       0.693123
Min latency(s):       0.00178545

Cleanup:

root@vwnode1:~# rados -p proxmox_vms cleanup
Removed 12132 objects

root@vwnode1:~# rados df
POOL_NAME   USED   OBJECTS CLONES COPIES MISSING_ON_PRIMARY UNFOUND DEGRADED RD_OPS RD     WR_OPS WR
proxmox_vms 169GiB 43396   0      130188 0                  0       0        909519 298GiB 619697 272GiB

total_objects  43396
total_used     564GiB
total_avail    768GiB
total_space    1.30TiB

These results (800 MB/s writes, 1500 MB/s reads, 200 write IOPS, and 400 read IOPS) seem incredibly low - particularly considering what the Optane 900p is meant to be capable of.
Is this in line with what you might expect on this hardware with Ceph though?
Or is there some way to find out the source of bottleneck?
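I was planning to start with something like the following while a bench run is in flight - just guesses at where to look:

# Guesses at where to look during a rados bench run:
ceph osd perf    # per-OSD commit/apply latency
ceph -s          # cluster health plus current client read/write rates
iostat -x 1      # per-device utilisation and latency (from the sysstat package)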
Thanks,
Victor
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com