Re: Giant to Jewel poor read performance with Rados bench

I created a new pool that only contains OSDs on a single node. Rados bench gives me the speed I'd expect (1GB/s, all coming out of cache).

I then created a pool that contains OSDs from 2 nodes. Now the strange part: if I run Rados bench from either of those nodes, I get the speed I'd expect, 2GB/s (1GB/s local and 1GB/s coming over from the other node). If I run the same bench from a 3rd node, I only get about 200MB/s. During that bench I monitor the interfaces on the 2 OSD nodes and they never go faster than 1Gb/s. It's almost as if the links have negotiated down to 1Gb/s. Yet if I run iperf tests between the 3 nodes, I get the full 10Gb/s.
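[Editor's note: a quick back-of-the-envelope check, not part of the original post, supports the 1Gb suspicion. With two OSD nodes each capped at ~1Gb/s on the wire, the aggregate read ceiling for a third-node client would be roughly 250MB/s, close to the ~200MB/s observed:]

```shell
# Back-of-envelope: two OSD nodes each capped at 1Gb/s -> aggregate read ceiling.
# Decimal units, roughly matching rados bench's MB/sec figures.
per_node=$((1000000000 / 8 / 1000000))   # 1Gb/s = 125 MB/s per source node
aggregate=$((2 * per_node))              # two OSD nodes feeding one client
echo "per-node ceiling:  ${per_node} MB/s"
echo "aggregate ceiling: ${aggregate} MB/s"
# The observed ~180-200 MB/s sits under this 250 MB/s ceiling, whereas two
# healthy 10GbE links should allow roughly ten times that.
```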

'rados -p 2node bench 60 rand --no-cleanup' from one of the nodes in the 2 node pool:

Total time run:       60.036413
Total reads made:     33496
Read size:            4194304
Object size:          4194304
Bandwidth (MB/sec):   2231.71
Average IOPS:         557
Stddev IOPS:          10
Max IOPS:             584
Min IOPS:             535
Average Latency(s):   0.0275722
Max latency(s):       0.164382
Min latency(s):       0.00480053

'rados -p 2node bench 60 rand --no-cleanup' from a 3rd node, outside the 2-node pool:

Total time run:       60.383206
Total reads made:     2715
Read size:            4194304
Object size:          4194304
Bandwidth (MB/sec):   179.851
Average IOPS:         44
Stddev IOPS:          10
Max IOPS:             77
Min IOPS:             28
Average Latency(s):   0.355126
Max latency(s):       2.17366
Min latency(s):       0.00641849

I appreciate this may not be a Ceph config issue, but any tips on tracking it down would be much appreciated.
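[Editor's note: for tracking it down, a few standard Linux checks would confirm or rule out a downgraded link or an MTU black hole. The interface name and peer address below are placeholders, not from the original post:]

```shell
# Hypothetical diagnostics; replace eth0 / <peer-ip> with your own interface/host.
# 1. Negotiated link speed on each node (needs root on real hardware):
#      ethtool eth0 | grep -i speed
# 2. Jumbo-frame path check: with MTU 9000, the largest unfragmented ICMP
#    payload is the MTU minus 20 bytes IP header minus 8 bytes ICMP header.
payload=$((9000 - 20 - 8))
echo "ping -M do -s ${payload} -c 3 <peer-ip>"
# 3. Re-run iperf with parallel streams and in reverse, to catch asymmetric
#    or single-stream-only degradation:
echo "iperf3 -c <peer-ip> -P 4 -R"
```

If the ping with DF set fails while small pings succeed, a device in the path is dropping jumbo frames even though iperf (which may fall back to smaller segments) looks healthy.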


On Sat, Aug 6, 2016 at 9:38 PM, David <dclistslinux@xxxxxxxxx> wrote:
Hi All

I've just installed Jewel 10.2.2 on hardware that was previously running Giant. Rados bench with the default rand and seq tests is giving me approx 40% of the throughput I used to achieve. On Giant I would get ~1000MB/s (so probably limited by the 10GbE interface); now I'm getting 300-400MB/s.
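[Editor's note: the assumption that ~1000MB/s was NIC-limited checks out arithmetically. This calculation is the editor's, not the poster's:]

```shell
# 10GbE raw line rate in decimal MB/s: 10,000,000,000 bits / 8 / 1,000,000.
line_rate=$((10 * 1000000000 / 8 / 1000000))
echo "${line_rate} MB/s"   # before Ethernet/IP/TCP framing overhead
# ~1000 MB/s on Giant is ~80% of the 1250 MB/s line rate, a plausible
# NIC-limited figure; 300-400 MB/s on Jewel is well below any network limit.
```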

I can see there is no activity on the disks during the bench, so the data is all coming out of cache. The cluster isn't doing anything else during the test. I'm fairly sure my network is sound; I've done the usual testing with iperf etc. The write test seems about the same as I used to get (~400MB/s).

This was a fresh install rather than an upgrade.

Are there any gotchas I should be aware of?

Some more details:

OS: CentOS 7
Kernel: 3.10.0-327.28.2.el7.x86_64
5 nodes (each 10 * 4TB SATA, 2 * Intel DC S3700 SSDs partitioned up for journals)
10GbE public network
10GbE cluster network
MTU 9000 on all interfaces and switch
Ceph installed from ceph repo

Ceph.conf is pretty basic (IPs, hosts etc omitted):

filestore_xattr_use_omap = true
osd_journal_size = 10000
osd_pool_default_size = 3
osd_pool_default_min_size = 2
osd_pool_default_pg_num = 4096
osd_pool_default_pgp_num = 4096
osd_crush_chooseleaf_type = 1
max_open_files = 131072
mon_clock_drift_allowed = .15
mon_clock_drift_warn_backoff = 30
mon_osd_down_out_interval = 300
mon_osd_report_timeout = 300
mon_osd_full_ratio = .95
mon_osd_nearfull_ratio = .80
osd_backfill_full_ratio = .80

Thanks
David


_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
