On Mon, 20 Nov 2017 12:38:55 +0200 Rudi Ahlers wrote:

> Hi,
>
> Can someone please help me, how do I improve performance on our Ceph cluster?
>
> The hardware in use is as follows:
> 3x SuperMicro servers with the following configuration
> 12-core dual Xeon 2.2GHz

Faster cores are better for Ceph, IMNSHO.
Though with main storage on HDDs, this will do.

> 128GB RAM

Overkill for Ceph, but I see something else below...

> 2x 400GB Intel DC SSD drives

Exact model, please.

> 4x 8TB Seagate 7200rpm 6Gbps SATA HDDs

One hopes those are non-SMR ones. Model, please.

> 1x SuperMicro DOM for Proxmox / Debian OS

Ah, Proxmox.
I'm not averse to converged, high-density, multi-role clusters myself,
but you:
a) need to know what you're doing, and
b) will find a lot of people here who don't approve of it.

I've avoided DOMs so far (non-hotswappable SPOF), even though the SM ones
look good on paper with regards to endurance and IOPS.
The latter is rather important for your monitors.

> 4x Port 10GbE NIC
> Cisco 10GbE switch.

Configuration details would be nice for those; LACP?

> root@virt2:~# rados bench -p Data 10 write --no-cleanup
> hints = 1
> Maintaining 16 concurrent writes of 4194304 bytes to objects of size
> 4194304 for up to 10 seconds or 0 objects

rados bench is a limited tool, and measuring bandwidth is pointless in
nearly all use cases.
Latency is where it's at, and testing from inside a VM is more relevant
than synthetic tests of the storage.
But it is a start.
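For a latency-focused test from inside a guest, something along these lines
with fio is a better starting point. This is only a sketch; the target file,
size and runtime are placeholders you'd adapt to your VM:

```shell
# Hypothetical example: 4k synchronous random-write latency, queue depth 1,
# run from inside a VM backed by the Ceph pool. The filename, size and
# runtime here are assumptions -- point it at scratch space you can destroy.
fio --name=lat-test --filename=/root/fio-test.bin --size=1G \
    --rw=randwrite --bs=4k --iodepth=1 --numjobs=1 \
    --direct=1 --sync=1 --runtime=60 --time_based
```

The interesting numbers in the output are the completion latency percentiles
(clat), not the bandwidth line.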
> Object prefix: benchmark_data_virt2_39099
>   sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
>     0       0         0         0         0         0           -           0
>     1      16        85        69   275.979       276    0.185576    0.204146
>     2      16       171       155   309.966       344   0.0625409    0.193558
>     3      16       243       227   302.633       288   0.0547129     0.19835
>     4      16       330       314   313.965       348   0.0959492    0.199825
>     5      16       413       397   317.565       332    0.124908    0.196191
>     6      16       494       478   318.633       324      0.1556    0.197014
>     7      15       591       576   329.109       392    0.136305    0.192192
>     8      16       670       654   326.965       312   0.0703808    0.190643
>     9      16       757       741   329.297       348    0.165211    0.192183
>    10      16       828       812   324.764       284   0.0935803    0.194041
> Total time run:         10.120215
> Total writes made:      829
> Write size:             4194304
> Object size:            4194304
> Bandwidth (MB/sec):     327.661

What part of this surprises you?
With a replication of 3, you effectively have the bandwidth of your 2 SSDs
(for small writes, which is not the case here) and the bandwidth of your 4
HDDs available.
Given overhead, other inefficiencies and the fact that this is not a
sequential write from the HDD perspective, 320MB/s isn't all that bad.
Though with your setup I would have expected something faster, but NOT the
theoretical 600MB/s that 4 HDDs will do in sequential writes.
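As a back-of-the-envelope check of those numbers (the ~150MB/s per-HDD
sequential figure is an assumption typical for 7200rpm drives, not a
measurement of your Seagates):

```shell
# Rough client-visible write ceiling. With size=3 across 3 hosts, every
# object is written on every host, so the sum of all spindles in the
# cluster does NOT apply -- one host's worth of HDDs is the ceiling.
HDD_SEQ_MBPS=150   # assumed sequential write speed per 7200rpm HDD
HDDS_PER_HOST=4    # virt1; virt2/virt3 only have 3, which caps it lower
echo "theoretical ceiling: $((HDD_SEQ_MBPS * HDDS_PER_HOST)) MB/s"
```

Random-ish 4MB object writes, BlueStore metadata and network overhead all
eat into that, which is how you land around 320MB/s.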
> Stddev Bandwidth:       35.8664
> Max bandwidth (MB/sec): 392
> Min bandwidth (MB/sec): 276
> Average IOPS:           81
> Stddev IOPS:            8
> Max IOPS:               98
> Min IOPS:               69
> Average Latency(s):     0.195191
> Stddev Latency(s):      0.0830062
> Max latency(s):         0.481448
> Min latency(s):         0.0414858
>
> root@virt2:~# hdparm -I /dev/sda
>
>
> root@virt2:~# ceph osd tree
> ID CLASS WEIGHT   TYPE NAME      STATUS REWEIGHT PRI-AFF
> -1       72.78290 root default
> -3       29.11316     host virt1
>  1   hdd  7.27829         osd.1      up  1.00000 1.00000
>  2   hdd  7.27829         osd.2      up  1.00000 1.00000
>  3   hdd  7.27829         osd.3      up  1.00000 1.00000
>  4   hdd  7.27829         osd.4      up  1.00000 1.00000
> -5       21.83487     host virt2
>  5   hdd  7.27829         osd.5      up  1.00000 1.00000
>  6   hdd  7.27829         osd.6      up  1.00000 1.00000
>  7   hdd  7.27829         osd.7      up  1.00000 1.00000
> -7       21.83487     host virt3
>  8   hdd  7.27829         osd.8      up  1.00000 1.00000
>  9   hdd  7.27829         osd.9      up  1.00000 1.00000
> 10   hdd  7.27829         osd.10     up  1.00000 1.00000
>  0        0              osd.0    down        0 1.00000
>
>
> root@virt2:~# ceph -s
>   cluster:
>     id:     278a2e9c-0578-428f-bd5b-3bb348923c27
>     health: HEALTH_OK
>
>   services:
>     mon: 3 daemons, quorum virt1,virt2,virt3
>     mgr: virt1(active)
>     osd: 11 osds: 10 up, 10 in
>
>   data:
>     pools:   1 pools, 512 pgs
>     objects: 6084 objects, 24105 MB
>     usage:   92822 MB used, 74438 GB / 74529 GB avail
>     pgs:     512 active+clean
>
> root@virt2:~# ceph -w
>   cluster:
>     id:     278a2e9c-0578-428f-bd5b-3bb348923c27
>     health: HEALTH_OK
>
>   services:
>     mon: 3 daemons, quorum virt1,virt2,virt3
>     mgr: virt1(active)
>     osd: 11 osds: 10 up, 10 in
>
>   data:
>     pools:   1 pools, 512 pgs
>     objects: 6084 objects, 24105 MB
>     usage:   92822 MB used, 74438 GB / 74529 GB avail
>     pgs:     512 active+clean
>
>
> 2017-11-20 12:32:08.199450 mon.virt1 [INF] mon.1 10.10.10.82:6789/0
>
>
> The SSD drives are used as journal drives:

BlueStore has no journals; don't confuse it and the people you're asking
for help.
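A quick way to see which BlueStore OSDs actually have their DB on a
separate (SSD) device is to look at the block.db symlinks. A sketch, run
on each node; the paths assume the default ceph-disk layout under
/var/lib/ceph/osd:

```shell
# For each activated OSD, show where block.db points, if anywhere.
# OSDs without a separate block.db keep their RocksDB on the slow HDD.
for osd in /var/lib/ceph/osd/ceph-*; do
    if [ -e "$osd/block.db" ]; then
        echo "$osd -> $(readlink -f "$osd/block.db")"
    else
        echo "$osd -> NO separate block.db"
    fi
done
```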
> root@virt3:~# ceph-disk list | grep /dev/sde | grep osd
> /dev/sdb1 ceph data, active, cluster ceph, osd.8, block /dev/sdb2,
> block.db /dev/sde1
> root@virt3:~# ceph-disk list | grep /dev/sdf | grep osd
> /dev/sdc1 ceph data, active, cluster ceph, osd.9, block /dev/sdc2,
> block.db /dev/sdf1
> /dev/sdd1 ceph data, active, cluster ceph, osd.10, block /dev/sdd2,
> block.db /dev/sdf2
>
> I see now /dev/sda doesn't have a journal, though it should have. Not
> sure why.

If an OSD has no fast WAL/DB, it will drag the overall speed down.
Verify this and, if that's the case, fix it and re-test.

Christian

> This is the command I used to create it:
>
> pveceph createosd /dev/sda -bluestore 1 -journal_dev /dev/sde

-- 
Christian Balzer        Network/Systems Engineer
chibi@xxxxxxx           Rakuten Communications
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com