Hi,
I have bonded the public NICs and added 2 more monitors (running on 2 of the 3 OSD hosts).
This seems to improve things, but I still see high latency.
Also, performance of the SSD pool is worse than the HDD pool, which is very confusing.
The SSD pool uses one Toshiba PX05SMB040Y per server (for a total of 3 OSDs),
while the HDD pool uses 2 Seagate ST600MM0006 disks per server (for a total of 6 OSDs).
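To rule out the drives themselves, a raw sync-write test against the devices could be run with fio (a minimal sketch; /dev/sdX and /dev/sdY are placeholders for the virtual disks backed by the Toshiba SSD and one of the Seagate HDDs, and the job names are arbitrary; note that this writes to the raw device and destroys any data on it):

# destructive: writes directly to the raw devices, run only on devices with no data you care about
fio --name=ssd-sync --filename=/dev/sdX --direct=1 --sync=1 --rw=write --bs=4k --numjobs=1 --iodepth=1 --runtime=60 --time_based --group_reporting
fio --name=hdd-sync --filename=/dev/sdY --direct=1 --sync=1 --rw=write --bs=4k --numjobs=1 --iodepth=1 --runtime=60 --time_based --group_reporting

If the SSD does not clearly outperform the HDD on single-threaded sync 4k writes here, the bottleneck is likely the drive or the controller cache settings rather than Ceph itself.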
Note: I have also disabled C-states in the BIOS and added "intel_pstate=disable intel_idle.max_cstate=0 processor.max_cstate=0 idle=poll" to GRUB.
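(For reference, this is roughly how the GRUB change was applied; a minimal sketch assuming CentOS 7 with grub2 and BIOS boot, so adjust the grub.cfg path on EFI systems:)

# after appending the flags to GRUB_CMDLINE_LINUX in /etc/default/grub:
grub2-mkconfig -o /boot/grub2/grub.cfg
reboot
# verify after the reboot
cat /proc/cmdline
cpupower frequency-info    # needs kernel-tools; should no longer report the intel_pstate driver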
Any hints/suggestions would be greatly appreciated.
[root@osd04 ~]# ceph status
cluster:
id: 37161a51-a159-4895-a7fd-3b0d857f1b66
health: HEALTH_WARN
noscrub,nodeep-scrub flag(s) set
application not enabled on 2 pool(s)
mon osd02 is low on available space
services:
mon: 3 daemons, quorum osd01,osd02,mon01
mgr: mon01(active)
osd: 9 osds: 9 up, 9 in
flags noscrub,nodeep-scrub
tcmu-runner: 6 daemons active
data:
pools: 2 pools, 228 pgs
objects: 50384 objects, 196 GB
usage: 402 GB used, 3504 GB / 3906 GB avail
pgs: 228 active+clean
io:
client: 46061 kB/s rd, 852 B/s wr, 15 op/s rd, 0 op/s wr
[root@osd04 ~]# ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-9 4.50000 root ssds
-10 1.50000 host osd01-ssd
6 hdd 1.50000 osd.6 up 1.00000 1.00000
-11 1.50000 host osd02-ssd
7 hdd 1.50000 osd.7 up 1.00000 1.00000
-12 1.50000 host osd04-ssd
8 hdd 1.50000 osd.8 up 1.00000 1.00000
-1 2.72574 root default
-3 1.09058 host osd01
0 hdd 0.54529 osd.0 up 1.00000 1.00000
4 hdd 0.54529 osd.4 up 1.00000 1.00000
-5 1.09058 host osd02
1 hdd 0.54529 osd.1 up 1.00000 1.00000
3 hdd 0.54529 osd.3 up 1.00000 1.00000
-7 0.54459 host osd04
2 hdd 0.27229 osd.2 up 1.00000 1.00000
5 hdd 0.27229 osd.5 up 1.00000 1.00000
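One thing worth noting in the tree above: osd.6, osd.7 and osd.8 under root ssds are reported with device class hdd even though they sit on the SSDs. Since the ssdpool crush rule presumably selects by the ssds root rather than by device class this may be cosmetic, but if the class should be corrected, something along these lines would do it (a sketch using the Luminous device-class commands):

ceph osd crush rm-device-class osd.6 osd.7 osd.8
ceph osd crush set-device-class ssd osd.6 osd.7 osd.8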
rados bench -p ssdpool 300 -t 32 write --no-cleanup && rados bench -p ssdpool 300 -t 32 seq
Total time run: 302.058832
Total writes made: 4100
Write size: 4194304
Object size: 4194304
Bandwidth (MB/sec): 54.2941
Stddev Bandwidth: 70.3355
Max bandwidth (MB/sec): 252
Min bandwidth (MB/sec): 0
Average IOPS: 13
Stddev IOPS: 17
Max IOPS: 63
Min IOPS: 0
Average Latency(s): 2.35655
Stddev Latency(s): 4.4346
Max latency(s): 29.7027
Min latency(s): 0.045166
rados bench -p rbd 300 -t 32 write --no-cleanup && rados bench -p rbd 300 -t 32 seq
Total time run: 301.428571
Total writes made: 8753
Write size: 4194304
Object size: 4194304
Bandwidth (MB/sec): 116.154
Stddev Bandwidth: 71.5763
Max bandwidth (MB/sec): 320
Min bandwidth (MB/sec): 0
Average IOPS: 29
Stddev IOPS: 17
Max IOPS: 80
Min IOPS: 0
Average Latency(s): 1.10189
Stddev Latency(s): 1.80203
Max latency(s): 15.0715
Min latency(s): 0.0210309
[root@osd04 ~]# ethtool -k gth0
Features for gth0:
rx-checksumming: on
tx-checksumming: on
tx-checksum-ipv4: off [fixed]
tx-checksum-ip-generic: on
tx-checksum-ipv6: off [fixed]
tx-checksum-fcoe-crc: on [fixed]
tx-checksum-sctp: on
scatter-gather: on
tx-scatter-gather: on
tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: on
tx-tcp-segmentation: on
tx-tcp-ecn-segmentation: off [fixed]
tx-tcp-mangleid-segmentation: off
tx-tcp6-segmentation: on
udp-fragmentation-offload: off [fixed]
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off
rx-vlan-offload: on
tx-vlan-offload: on
ntuple-filters: off
receive-hashing: on
highdma: on [fixed]
rx-vlan-filter: on
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: on [fixed]
tx-gre-segmentation: on
tx-gre-csum-segmentation: on
tx-ipxip4-segmentation: on
tx-ipxip6-segmentation: on
tx-udp_tnl-segmentation: on
tx-udp_tnl-csum-segmentation: on
tx-gso-partial: on
tx-sctp-segmentation: off [fixed]
tx-esp-segmentation: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off [fixed]
rx-fcs: off [fixed]
rx-all: off
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
l2-fwd-offload: off
hw-tc-offload: off
esp-hw-offload: off [fixed]
esp-tx-csum-hw-offload: off [fixed]
On 22 January 2018 at 12:09, Steven Vacaroaia <stef97@xxxxxxxxx> wrote:
Hi David,
I noticed the public interface of the server I am running the test from is heavily used so I will bond that one too
I doubt though that this explains the poor performance
Thanks for your advice
Steven

On 22 January 2018 at 12:02, David Turner <drakonstein@xxxxxxxxx> wrote:
I'm not speaking to anything other than your configuration.
"I am using 2 x 10 GB bonded ( BONDING_OPTS="mode=4 miimon=100 xmit_hash_policy=1 lacp_rate=1") for cluster and 1 x 1GB for public"
It might not be a bad idea for you to forgo the public network on the 1Gb interfaces and either put everything on one network or use VLANs on the 10Gb connections. I lean more towards that in particular because your public network doesn't have a bond on it. Just as a note, communication between the OSDs and the MONs is all done on the public network. If that interface goes down, then the OSDs are likely to be marked down/out from your cluster. I'm a fan of VLANs, but if you don't have the equipment or expertise to go that route, then just using the same subnet for public and private is a decent way to go.

On Mon, Jan 22, 2018 at 11:37 AM Steven Vacaroaia <stef97@xxxxxxxxx> wrote:
I did test with rados bench ..here are the results

rados bench -p ssdpool 300 -t 12 write --no-cleanup && rados bench -p ssdpool 300 -t 12 seq
Total time run: 300.322608
Total writes made: 10632
Write size: 4194304
Object size: 4194304
Bandwidth (MB/sec): 141.608
Stddev Bandwidth: 74.1065
Max bandwidth (MB/sec): 264
Min bandwidth (MB/sec): 0
Average IOPS: 35
Stddev IOPS: 18
Max IOPS: 66
Min IOPS: 0
Average Latency(s): 0.33887
Stddev Latency(s): 0.701947
Max latency(s): 9.80161
Min latency(s): 0.015171

Total time run: 300.829945
Total reads made: 10070
Read size: 4194304
Object size: 4194304
Bandwidth (MB/sec): 133.896
Average IOPS: 33
Stddev IOPS: 14
Max IOPS: 68
Min IOPS: 3
Average Latency(s): 0.35791
Max latency(s): 4.68213
Min latency(s): 0.0107572

rados bench -p scbench256 300 -t 12 write --no-cleanup && rados bench -p scbench256 300 -t 12 seq
Total time run: 300.747004
Total writes made: 10239
Write size: 4194304
Object size: 4194304
Bandwidth (MB/sec): 136.181
Stddev Bandwidth: 75.5
Max bandwidth (MB/sec): 272
Min bandwidth (MB/sec): 0
Average IOPS: 34
Stddev IOPS: 18
Max IOPS: 68
Min IOPS: 0
Average Latency(s): 0.352339
Stddev Latency(s): 0.72211
Max latency(s): 9.62304
Min latency(s): 0.00936316
hints = 1

Total time run: 300.610761
Total reads made: 7628
Read size: 4194304
Object size: 4194304
Bandwidth (MB/sec): 101.5
Average IOPS: 25
Stddev IOPS: 11
Max IOPS: 61
Min IOPS: 0
Average Latency(s): 0.472321
Max latency(s): 15.636
Min latency(s): 0.0188098

On 22 January 2018 at 11:34, Steven Vacaroaia <stef97@xxxxxxxxx> wrote:
sorry ..sent the message too soon
Here is more info

Vendor Id : SEAGATE
Product Id : ST600MM0006
State : Online
Disk Type : SAS,Hard Disk Device
Capacity : 558.375 GB
Power State : Active
(SSD is in slot 0)

megacli -LDGetProp -Cache -LALL -a0
Adapter 0-VD 0(target id: 0): Cache Policy:WriteThrough, ReadAheadNone, Direct, No Write Cache if bad BBU
Adapter 0-VD 1(target id: 1): Cache Policy:WriteBack, ReadAdaptive, Direct, No Write Cache if bad BBU
Adapter 0-VD 2(target id: 2): Cache Policy:WriteBack, ReadAdaptive, Direct, No Write Cache if bad BBU
Adapter 0-VD 3(target id: 3): Cache Policy:WriteBack, ReadAdaptive, Direct, No Write Cache if bad BBU
Adapter 0-VD 4(target id: 4): Cache Policy:WriteBack, ReadAdaptive, Direct, No Write Cache if bad BBU
Adapter 0-VD 5(target id: 5): Cache Policy:WriteBack, ReadAdaptive, Direct, No Write Cache if bad BBU

[root@osd01 ~]# megacli -LDGetProp -DskCache -LALL -a0
Adapter 0-VD 0(target id: 0): Disk Write Cache : Disabled
Adapter 0-VD 1(target id: 1): Disk Write Cache : Disk's Default
Adapter 0-VD 2(target id: 2): Disk Write Cache : Disk's Default
Adapter 0-VD 3(target id: 3): Disk Write Cache : Disk's Default
Adapter 0-VD 4(target id: 4): Disk Write Cache : Disk's Default
Adapter 0-VD 5(target id: 5): Disk Write Cache : Disk's Default

CPU
Intel(R) Xeon(R) CPU E5-2640 0 @ 2.50GHz
Centos 7 kernel 3.10.0-693.11.6.el7.x86_64

sysctl -p
net.ipv4.tcp_sack = 0
net.core.netdev_budget = 600
net.ipv4.tcp_window_scaling = 1
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.core.rmem_default = 16777216
net.core.wmem_default = 16777216
net.core.optmem_max = 40960
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
net.ipv4.tcp_syncookies = 0
net.core.somaxconn = 1024
net.core.netdev_max_backlog = 20000
net.ipv4.tcp_max_syn_backlog = 30000
net.ipv4.tcp_max_tw_buckets = 2000000
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_slow_start_after_idle = 0
net.ipv4.conf.all.send_redirects = 0
net.ipv4.conf.all.accept_redirects = 0
net.ipv4.conf.all.accept_source_route = 0
vm.min_free_kbytes = 262144
vm.swappiness = 0
vm.vfs_cache_pressure = 100
fs.suid_dumpable = 0
kernel.core_uses_pid = 1
kernel.msgmax = 65536
kernel.msgmnb = 65536
kernel.randomize_va_space = 1
kernel.sysrq = 0
kernel.pid_max = 4194304
fs.file-max = 100000

ceph.conf
public_network = 10.10.30.0/24
cluster_network = 192.168.0.0/24
osd_op_num_threads_per_shard = 2
osd_op_num_shards = 25
osd_pool_default_size = 2
osd_pool_default_min_size = 1 # Allow writing 1 copy in a degraded state
osd_pool_default_pg_num = 256
osd_pool_default_pgp_num = 256
osd_crush_chooseleaf_type = 1
osd_scrub_load_threshold = 0.01
osd_scrub_min_interval = 137438953472
osd_scrub_max_interval = 137438953472
osd_deep_scrub_interval = 137438953472
osd_max_scrubs = 16
osd_op_threads = 8
osd_max_backfills = 1
osd_recovery_max_active = 1
osd_recovery_op_priority = 1
debug_lockdep = 0/0
debug_context = 0/0
debug_crush = 0/0
debug_buffer = 0/0
debug_timer = 0/0
debug_filer = 0/0
debug_objecter = 0/0
debug_rados = 0/0
debug_rbd = 0/0
debug_journaler = 0/0
debug_objectcatcher = 0/0
debug_client = 0/0
debug_osd = 0/0
debug_optracker = 0/0
debug_objclass = 0/0
debug_filestore = 0/0
debug_journal = 0/0
debug_ms = 0/0
debug_monc = 0/0
debug_tp = 0/0
debug_auth = 0/0
debug_finisher = 0/0
debug_heartbeatmap = 0/0
debug_perfcounter = 0/0
debug_asok = 0/0
debug_throttle = 0/0
debug_mon = 0/0
debug_paxos = 0/0
debug_rgw = 0/0

[mon]
mon_allow_pool_delete = true

[osd]
osd_heartbeat_grace = 20
osd_heartbeat_interval = 5
bluestore_block_db_size = 16106127360
bluestore_block_wal_size = 1073741824

[osd.6]
host = osd01
osd_journal = /dev/disk/by-parttypeuuid/4fbd7e29-9d25-41b8-afd0-062c0ceff05d.1d58775a-5019-42ea-8149-a126f51a2501
crush_location = root=ssds host=osd01-ssd

[osd.7]
host = osd02
osd_journal = /dev/disk/by-parttypeuuid/4fbd7e29-9d25-41b8-afd0-062c0ceff05d.683dc52d-5d69-4ff0-b5d9-b17056a55681
crush_location = root=ssds host=osd02-ssd

[osd.8]
host = osd04
osd_journal = /dev/disk/by-parttypeuuid/4fbd7e29-9d25-41b8-afd0-062c0ceff05d.bd7c0088-b724-441e-9b88-9457305c541d
crush_location = root=ssds host=osd04-ssd

On 22 January 2018 at 11:29, Steven Vacaroaia <stef97@xxxxxxxxx> wrote:
Hi David,
Yes, I meant no separate partitions for WAL and DB
I am using 2 x 10 GB bonded ( BONDING_OPTS="mode=4 miimon=100 xmit_hash_policy=1 lacp_rate=1") for cluster and 1 x 1GB for public
Disks are
Vendor Id : TOSHIBA
Product Id : PX05SMB040Y
State : Online
Disk Type : SAS,Solid State Device
Capacity : 372.0 GB

On 22 January 2018 at 11:24, David Turner <drakonstein@xxxxxxxxx> wrote:
Disk models, other hardware information including CPU, network config? You say you're using Luminous, but then say journal on same device. I'm assuming you mean that you just have the bluestore OSD configured without a separate WAL or DB partition? Any more specifics you can give will be helpful.

On Mon, Jan 22, 2018 at 11:20 AM Steven Vacaroaia <stef97@xxxxxxxxx> wrote:
Hi,
I'd appreciate if you can provide some guidance / suggestions regarding performance issues on a test cluster ( 3 x DELL R620, 1 Enterprise SSD, 3 x 600 GB Enterprise HDD, 8 cores, 64 GB RAM)
I created 2 pools ( replication factor 2), one with only SSD and the other with only HDD ( journal on same disk for both)
The performance is quite similar although I was expecting it to be at least 5 times better
No issues noticed using atop
What should I check / tune ?
Many thanks
Steven

HDD based pool ( journal on the same disk)

ceph osd pool get scbench256 all
size: 2
min_size: 1
crash_replay_interval: 0
pg_num: 256
pgp_num: 256
crush_rule: replicated_rule
hashpspool: true
nodelete: false
nopgchange: false
nosizechange: false
write_fadvise_dontneed: false
noscrub: false
nodeep-scrub: false
use_gmt_hitset: 1
auid: 0
fast_read: 0

rbd bench --io-type write image1 --pool=scbench256
bench type write io_size 4096 io_threads 16 bytes 1073741824 pattern sequential
SEC OPS OPS/SEC BYTES/SEC
1 46816 46836.46 191842139.78
2 90658 45339.11 185709011.80
3 133671 44540.80 182439126.08
4 177341 44340.36 181618100.14
5 217300 43464.04 178028704.54
6 259595 42555.85 174308767.05
elapsed: 6 ops: 262144 ops/sec: 42694.50 bytes/sec: 174876688.23

fio /home/cephuser/write_256.fio
write-4M: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=rbd, iodepth=32
fio-2.2.8
Starting 1 process
rbd engine: RBD version: 1.12.0
Jobs: 1 (f=1): [r(1)] [100.0% done] [66284KB/0KB/0KB /s] [16.6K/0/0 iops] [eta 00m:00s]

fio /home/cephuser/write_256.fio
write-4M: (g=0): rw=write, bs=4K-4K/4K-4K/4K-4K, ioengine=rbd, iodepth=32
fio-2.2.8
Starting 1 process
rbd engine: RBD version: 1.12.0
Jobs: 1 (f=1): [W(1)] [100.0% done] [0KB/14464KB/0KB /s] [0/3616/0 iops] [eta 00m:00s]

SSD based pool

ceph osd pool get ssdpool all
size: 2
min_size: 1
crash_replay_interval: 0
pg_num: 128
pgp_num: 128
crush_rule: ssdpool
hashpspool: true
nodelete: false
nopgchange: false
nosizechange: false
write_fadvise_dontneed: false
noscrub: false
nodeep-scrub: false
use_gmt_hitset: 1
auid: 0
fast_read: 0

rbd -p ssdpool create --size 52100 image2
rbd bench --io-type write image2 --pool=ssdpool
bench type write io_size 4096 io_threads 16 bytes 1073741824 pattern sequential
SEC OPS OPS/SEC BYTES/SEC
1 42412 41867.57 171489557.93
2 78343 39180.86 160484805.88
3 118082 39076.48 160057256.16
4 155164 38683.98 158449572.38
5 192825 38307.59 156907885.84
6 230701 37716.95 154488608.16
elapsed: 7 ops: 262144 ops/sec: 36862.89 bytes/sec: 150990387.29

[root@osd01 ~]# fio /home/cephuser/write_256.fio
write-4M: (g=0): rw=write, bs=4K-4K/4K-4K/4K-4K, ioengine=rbd, iodepth=32
fio-2.2.8
Starting 1 process
rbd engine: RBD version: 1.12.0
Jobs: 1 (f=1): [W(1)] [100.0% done] [0KB/20224KB/0KB /s] [0/5056/0 iops] [eta 00m:00s]

fio /home/cephuser/write_256.fio
write-4M: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=rbd, iodepth=32
fio-2.2.8
Starting 1 process
rbd engine: RBD version: 1.12.0
Jobs: 1 (f=1): [r(1)] [100.0% done] [76096KB/0KB/0KB /s] [19.3K/0/0 iops] [eta 00m:00s]
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com