We're planning on installing 12x virtual machines with some heavy loads.
The SSD drives are Intel SSDSC2BA400G4.
The SATA drives are Seagate ST8000NM0055-1RM112.
Please explain your comment, "b) will find a lot of people here who don't approve of it."
I don't have access to the switches right now, but they're new, so whatever default config ships from the factory would be active. iperf does show 10.5 GBytes transferred at 9.02 Gbits/sec, though.
What speeds would you expect?
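For reference, this is roughly how I ran iperf between two of the nodes (the address below is a placeholder for the storage network IP), in case I should be testing it differently:

    # on the first node
    iperf -s
    # on the second node, 30 second run against the first node's storage IP
    iperf -c <storage-network-ip> -t 30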
"Though with your setup I would have expected something faster, but NOT the
theoretical 600MB/s 4 HDDs will do in sequential writes."On this, "If an OSD has no fast WAL/DB, it will drag the overall speed down. Verify and if so fix this and re-test.": how?
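Is something like the following the right way to check which OSDs actually have their block.db on the SSDs? This is just my guess at how to verify it:

    ceph-disk list | grep osd
    ls -l /var/lib/ceph/osd/ceph-*/block.db
    ceph osd metadata 1 | grep -E 'bluefs|partition_path'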
On Mon, Nov 20, 2017 at 1:44 PM, Christian Balzer <chibi@xxxxxxx> wrote:
On Mon, 20 Nov 2017 12:38:55 +0200 Rudi Ahlers wrote:
> Hi,
>
> Can someone please help me, how do I improve performance on our Ceph cluster?
>
> The hardware in use are as follows:
> 3x SuperMicro servers with the following configuration
> 12Core Dual XEON 2.2Ghz
Faster cores are better for Ceph, IMNSHO.
Though with main storage on HDDs, this will do.
> 128GB RAM
Overkill for Ceph but I see something else below...
> 2x 400GB Intel DC SSD drives
Exact model please.
> 4x 8TB Seagate 7200rpm 6Gbps SATA HDD's
One hopes that's a non-SMR one.
Model please.
> 1x SuperMicro DOM for Proxmox / Debian OS
Ah, Proxmox.
I'm personally not averse to converged, high-density, multi-role clusters,
but you:
a) need to know what you're doing and
b) will find a lot of people here who don't approve of it.
I've avoided DOMs so far (non-hotswappable SPOF), even though the SM ones
look good on paper with regards to endurance and IOPS.
The latter being rather important for your monitors.
> 4x Port 10Gbe NIC
> Cisco 10Gbe switch.
>
It would be nice to know the configuration for those. LACP?
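If those ports are bonded, something like this (assuming Linux bonding on the
Proxmox side; the interface name is just an example) will show whether LACP
actually negotiated with the Cisco:

cat /proc/net/bonding/bond0
ip -d link show bond0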
>
> root@virt2:~# rados bench -p Data 10 write --no-cleanup
> hints = 1
> Maintaining 16 concurrent writes of 4194304 bytes to objects of size
> 4194304 for up to 10 seconds or 0 objects
rados bench is a limited tool, and measuring bandwidth is pointless in nearly
all use cases.
Latency is where it is at and testing from inside a VM is more relevant
than synthetic tests of the storage.
But it is a start.
What part of this surprises you?
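For example (only a sketch, adjust the path and runtime to taste), a
single-threaded 4k sync write test with fio from inside a VM tells you far
more than aggregate bandwidth; look at the clat numbers it reports, not the
MB/s:

fio --name=writelat --filename=/root/fio-test --size=1G --rw=randwrite \
    --bs=4k --iodepth=1 --direct=1 --sync=1 --runtime=60 --time_based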
> Object prefix: benchmark_data_virt2_39099
> sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat(s)  avg lat(s)
>   0       0         0         0         0         0            -           0
>   1      16        85        69   275.979       276     0.185576    0.204146
>   2      16       171       155   309.966       344    0.0625409    0.193558
>   3      16       243       227   302.633       288    0.0547129     0.19835
>   4      16       330       314   313.965       348    0.0959492    0.199825
>   5      16       413       397   317.565       332     0.124908    0.196191
>   6      16       494       478   318.633       324       0.1556    0.197014
>   7      15       591       576   329.109       392     0.136305    0.192192
>   8      16       670       654   326.965       312    0.0703808    0.190643
>   9      16       757       741   329.297       348     0.165211    0.192183
>  10      16       828       812   324.764       284    0.0935803    0.194041
> Total time run: 10.120215
> Total writes made: 829
> Write size: 4194304
> Object size: 4194304
> Bandwidth (MB/sec): 327.661
With a replication of 3, you have effectively the bandwidth of your 2 SSDs
(for small writes, not the case here) and the bandwidth of your 4 HDDs
available.
Given overhead, other inefficiencies and the fact that this is not a
sequential write from the HDD perspective, 320MB/s isn't all that bad.
Though with your setup I would have expected something faster, but NOT the
theoretical 600MB/s that 4 HDDs will do in sequential writes.
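To spell the math out (assuming roughly 150MB/s of sequential throughput per
8TB 7200rpm HDD, a ballpark figure):

4 HDDs x ~150MB/s             = ~600MB/s sequential, best case
3x replication                = every client write is written three times
virt2/virt3 have only 3 OSDs  = those hosts saturate before virt1 does
WAL/DB traffic and seeks      = more off the top

So ~320MB/s of client bandwidth is plausible, if on the low side.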
> Stddev Bandwidth: 35.8664
> Max bandwidth (MB/sec): 392
> Min bandwidth (MB/sec): 276
> Average IOPS: 81
> Stddev IOPS: 8
> Max IOPS: 98
> Min IOPS: 69
> Average Latency(s): 0.195191
> Stddev Latency(s): 0.0830062
> Max latency(s): 0.481448
> Min latency(s): 0.0414858
> root@virt2:~# hdparm -I /dev/sda
>
>
>
> root@virt2:~# ceph osd tree
> ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
> -1 72.78290 root default
> -3 29.11316 host virt1
> 1 hdd 7.27829 osd.1 up 1.00000 1.00000
> 2 hdd 7.27829 osd.2 up 1.00000 1.00000
> 3 hdd 7.27829 osd.3 up 1.00000 1.00000
> 4 hdd 7.27829 osd.4 up 1.00000 1.00000
> -5 21.83487 host virt2
> 5 hdd 7.27829 osd.5 up 1.00000 1.00000
> 6 hdd 7.27829 osd.6 up 1.00000 1.00000
> 7 hdd 7.27829 osd.7 up 1.00000 1.00000
> -7 21.83487 host virt3
> 8 hdd 7.27829 osd.8 up 1.00000 1.00000
> 9 hdd 7.27829 osd.9 up 1.00000 1.00000
> 10 hdd 7.27829 osd.10 up 1.00000 1.00000
> 0 0 osd.0 down 0 1.00000
>
>
> root@virt2:~# ceph -s
> cluster:
> id: 278a2e9c-0578-428f-bd5b-3bb348923c27
> health: HEALTH_OK
>
> services:
> mon: 3 daemons, quorum virt1,virt2,virt3
> mgr: virt1(active)
> osd: 11 osds: 10 up, 10 in
>
> data:
> pools: 1 pools, 512 pgs
> objects: 6084 objects, 24105 MB
> usage: 92822 MB used, 74438 GB / 74529 GB avail
> pgs: 512 active+clean
>
> root@virt2:~# ceph -w
> cluster:
> id: 278a2e9c-0578-428f-bd5b-3bb348923c27
> health: HEALTH_OK
>
> services:
> mon: 3 daemons, quorum virt1,virt2,virt3
> mgr: virt1(active)
> osd: 11 osds: 10 up, 10 in
>
> data:
> pools: 1 pools, 512 pgs
> objects: 6084 objects, 24105 MB
> usage: 92822 MB used, 74438 GB / 74529 GB avail
> pgs: 512 active+clean
>
>
> 2017-11-20 12:32:08.199450 mon.virt1 [INF] mon.1 10.10.10.82:6789/0
>
>
>
> The SSD drives are used as journal drives:
>
Bluestore has no journals; don't confuse it and the people you're asking
for help.
> root@virt3:~# ceph-disk list | grep /dev/sde | grep osd
> /dev/sdb1 ceph data, active, cluster ceph, osd.8, block /dev/sdb2,
> block.db /dev/sde1
> root@virt3:~# ceph-disk list | grep /dev/sdf | grep osd
> /dev/sdc1 ceph data, active, cluster ceph, osd.9, block /dev/sdc2,
> block.db /dev/sdf1
> /dev/sdd1 ceph data, active, cluster ceph, osd.10, block /dev/sdd2,
> block.db /dev/sdf2
>
>
>
> I see now /dev/sda doesn't have a journal, though it should have. Not sure
> why.
If an OSD has no fast WAL/DB, it will drag the overall speed down.
Verify this, and if so, fix it and re-test.
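Something along these lines (untested, double-check the device names against
your ceph-disk output above) will confirm which OSDs have a DB on the SSDs,
and recreating the offending OSD is the simplest fix while the cluster is
otherwise healthy:

# which OSDs have their block.db on an SSD partition?
ceph-disk list | grep -E 'osd|block.db'
ceph osd metadata 1 | grep -E 'bluefs|partition_path'

# e.g. for the down osd.0 on /dev/sda, wipe and recreate it with a DB device:
systemctl stop ceph-osd@0
ceph osd purge 0 --yes-i-really-mean-it
ceph-disk zap /dev/sda
pveceph createosd /dev/sda -bluestore 1 -journal_dev /dev/sde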
Christian
> This is the command I used to create it:
>
>
> pveceph createosd /dev/sda -bluestore 1 -journal_dev /dev/sde
>
>
--
Christian Balzer Network/Systems Engineer
chibi@xxxxxxx Rakuten Communications
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com