Improving write performance on Ceph 17.2.6 (HDD OSDs + DB/WAL on NVMe)

Greetings. My cluster consists of 3 nodes. Each node has 4 HDD OSDs of 6 TB each and 1 NVMe drive for DB/WAL storage. The network is 2 * 10 Gbps interfaces in a bond, with some parameters tuned in rc.local to improve performance. They are below:

# Set NIC ring buffer sizes
ethtool -G eno1 rx 4096 tx 4096; ethtool -G eno2 rx 4096 tx 4096

# Set txqueuelen on eno1, eno2, bond0, vmbr0 and the VLAN interfaces
ip link set eno1 txqueuelen 10000
ip link set eno2 txqueuelen 10000
ip link set bond0 txqueuelen 20000
ip link set vmbr0 txqueuelen 20000
ip link set vmbr0.4040 txqueuelen 10000
ip link set vmbr0.4043 txqueuelen 10000
ip link set vmbr0.4045 txqueuelen 10000
ip link set vmbr0.4053 txqueuelen 10000

# Disable segmentation/receive offloads (GSO, GRO, LRO, TSO)
ethtool -K eno1 gso off gro off lro off tso off
ethtool -K eno2 gso off gro off lro off tso off

# Keep checksum offload (rx/tx) and scatter-gather enabled
ethtool -K eno1 rx on tx on sg on
ethtool -K eno2 rx on tx on sg on

# 8 combined queues per NIC, receive hashing on
ethtool -L eno1 combined 8
ethtool -L eno2 combined 8
ethtool -K eno1 rxhash on
ethtool -K eno2 rxhash on

# Spread RSS queue IRQs across CPUs (rss-ladder from netutils-linux)
rss-ladder eno1 0
rss-ladder eno2 1
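
As a side note, a quick way to confirm after boot that these rc.local settings actually took effect is something like the sketch below (same interface names as above; adjust to taste):

# Sketch: verify the NIC tuning applied
for dev in eno1 eno2; do
        echo "== $dev =="
        ethtool -g "$dev" | grep -A 4 'Current hardware settings'    # ring buffers
        ethtool -k "$dev" | grep -E 'segmentation|large-receive|scatter-gather|checksum'
        ethtool -l "$dev" | grep -A 4 'Current hardware settings'    # combined queues
done
ip link show bond0 | grep -o 'qlen [0-9]*'                           # txqueuelen on the bond
ip link show vmbr0 | grep -o 'qlen [0-9]*'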

With the default ceph.conf settings, testing a 1 GB file with the SEQ1MQ8T1 profile inside a KVM virtual machine gives a write throughput of 152 MB/s.
Outside of Ceph, each disk sustains at least 200 MB/s, and there are 4 of them per node, i.e. 800 MB/s in total. Even allowing a 25% loss for overhead and the extra abstraction layers, that should still be around 600 MB/s. I may be wrong; this is purely a back-of-the-envelope estimate.
While the test runs, monitoring disk load with the nmon utility shows write rates of 30 to 50 MB/s on each of the node's disks.
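
To cross-check those per-disk numbers below the whole RBD/network path, each OSD can be benched individually from the cluster itself. A minimal sketch (osd.0 is just an example ID):

# Write 1 GiB in 4 MiB chunks directly to one OSD's object store
ceph tell osd.0 bench 1073741824 4194304

# Or bench every OSD to spot slow disks (note the quoting of the glob)
ceph tell 'osd.*' bench 1073741824 4194304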
The network should not be the bottleneck, since the nodes are otherwise empty and only 1 virtual machine is generating the load.
iperf3 between nodes shows at least 8 Gbit/s, which comfortably covers the 4.8 Gbit/s (= 600 MB/s) the test would need.

Write caching in the virtual machine is disabled.
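
For comparison, the same kind of load can also be driven straight from a node with rados bench, which takes the VM and virtio layer out of the picture. A rough sketch (pool name and PG count are placeholders; the pool inherits the default size=3):

# Throwaway pool for benchmarking
ceph osd pool create testbench 32 32

# 60 s of writes with 8 concurrent 1 MiB ops, roughly matching SEQ1MQ8T1
rados bench -p testbench 60 write -t 8 -b 1048576 --no-cleanup

# Clean up afterwards
rados -p testbench cleanup
ceph osd pool delete testbench testbench --yes-i-really-really-mean-it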

I started experimenting with the [osd] parameters in ceph.conf. With those changes I managed to double the read speed, from 195 MB/s to 550 MB/s, but the write speed did not budge. The question right now is not about IOPS but about write throughput. What should I change? My ceph.conf is below (followed by a sketch of applying such [osd] overrides at runtime).

[global]
         auth_client_required = cephx
         auth_cluster_required = cephx
         auth_service_required = cephx
         cluster_network = 10.50.251.0/24
         fsid = 7d00b675-2f1e-47ff-a71c-b95d1745bc39
         mon_allow_pool_delete = true
         mon_host = 10.50.250.1 10.50.250.2 10.50.250.3
         ms_bind_ipv4 = true
         ms_bind_ipv6 = false
         osd_pool_default_min_size = 2
         osd_pool_default_size = 3
         public_network = 10.50.250.0/24

[client]
         keyring = /etc/pve/priv/$cluster.$name.keyring

#[osd]
#        bluestore_compression_mode = none
#        bluestore_cache_autotune = true
#        bluestore_cache_size = 3221225472
#        bluestore_cache_kv_ratio = 0.2
#        bluestore_cache_kv_max = 1610612736
#        bluestore_min_alloc_size_hdd = 65536
#        bluestore_max_alloc_size_hdd = 67108864
#        bluestore_min_alloc_size_ssd = 16384
#        bluestore_max_alloc_size_ssd = 33554432
#        bluestore_throttle_bytes = 268435456
#        bluestore_throttle_deferred_bytes = 1073741824
#        bluestore_rocksdb_options = write_buffer_size=268435456;max_write_buffer_number=4
#        osd_op_num_threads_per_shard = 4
#        osd_client_message_cap = 1024
#        osd_client_message_size_cap = 536870912

[mds]
         keyring = /var/lib/ceph/mds/ceph-$id/keyring

[mds.nd01]
         host = nd01
         mds_standby_for_name = pve

[mds.nd02]
         host = nd02
         mds_standby_for_name = pve

[mds.nd03]
         host = nd03
         mds_standby_for_name = pve

[mon.nd01]
         public_addr = 10.50.250.1

[mon.nd02]
         public_addr = 10.50.250.2

[mon.nd03]
         public_addr = 10.50.250.3
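
On Quincy the commented [osd] values above can also be applied through the centralized config database rather than ceph.conf, which makes it easier to try them one at a time. A rough sketch using two options from the commented block (some settings may only take effect after an OSD restart):

# Apply an override cluster-wide
ceph config set osd osd_op_num_threads_per_shard 4
ceph config set osd bluestore_throttle_bytes 268435456

# Check what a running OSD actually uses, and what differs from defaults
ceph config show osd.0 osd_op_num_threads_per_shard
ceph config get osd bluestore_throttle_bytes
ceph config dump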

Perhaps I do not quite understand the philosophy, logic and functionality of Ceph? I am aware that there may be competition for resources and so on. But I see the write throughput split cleanly 50/50 when another VM runs a similar test, and at 150 MB/s it is worrying that at some point the virtual machines will compete and the write speed per VM may drop very low.
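
Incidentally, that split can be watched live from the cluster side. A minimal sketch (the pool name vm-pool is a placeholder; the per-image view needs the mgr rbd_support module):

# Aggregate client I/O per pool
watch -n 2 'ceph osd pool stats vm-pool'

# Per-RBD-image throughput and IOPS
rbd perf image iostat vm-pool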
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


