Hi,
reading a big 50 GB file (I tried larger ones too) with
dd if=bigfile of=/dev/zero bs=4M
on a cluster with 112 SATA disks in 10 OSD nodes (6272 PGs, replication 3)
gives me only about *122 MB/s* read speed in a single thread. Scrubbing
was turned off during the measurement.
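For the record, the test sequence looks roughly like this ("scrubbing
off" meaning the usual flags; the file path is just an example):

ceph osd set noscrub
ceph osd set nodeep-scrub
dd if=/mnt/cephfs/bigfile of=/dev/zero bs=4M   # example mount point
ceph osd unset noscrub
ceph osd unset nodeep-scrub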
I have been searching for possible bottlenecks. The network is not the
problem: the machine running dd is connected to the cluster's public
network with a 2x 10GBASE-T bond (20 Gbit/s). The OSD nodes have a dual
network setup: 10GBASE-T on the public network, 10GBASE-T on the
private (cluster) network.
The OSD SATA disks are only about 10-20% utilized, never more than
that. The CPUs on the OSD nodes are mostly idle, as are the CPUs on the
mon; the mds load is about 1.0 (one core busy on that 6-core machine).
The mon and mds are connected with only 1 GbE (I would expect some
added latency from that, but no bandwidth problem; in fact their
network traffic peaks at about 20 Mbit/s).
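(The utilization and CPU figures above are only ballpark numbers from
watching something like

iostat -x 5   # %util column on the OSD hosts
top           # CPU idle on the osd / mon / mds hosts

on the machines involved.)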
If I read the 50 GB file once, clear the cache on the reading machine
(but not the OSD caches), and read it again, I get much better read
performance of about *620 MB/s*. That seems logical to me, since much
(most) of the data is still in the OSDs' page cache. But even that is
not great, considering the reading machine is connected to the cluster
with a 20 Gbit/s bond.
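By "clear the cache on the reading machine" I mean dropping the page
cache on the client between the two runs, roughly

sync
echo 3 > /proc/sys/vm/drop_caches   # as root, on the reading machine only

and then running the same dd again.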
How can I improve this? I am not really sure, but from my understanding
two possible bottlenecks come to mind:
1) The 1 GbE connection to the mon / mds
Is this the reason why reads are slow and the OSD disks are not
hammered by read requests and therefore never fully utilized?
2) Moving the metadata to SSD
Currently, the cephfs_metadata pool lives on the same spinning SATA
disks as the data pool. Is this the bottleneck, and would moving the
metadata to SSD be a solution? (I sketched below what I think that move
would involve.)
Or is it both?
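In case 2) is the answer: my rough understanding (assuming a
Luminous-or-later release with device classes, and that the SSDs would
be added as OSDs with device class "ssd") is that the move would be
something like

ceph osd crush rule create-replicated ssd-rule default host ssd   # "ssd-rule" is just an example name
ceph osd pool set cephfs_metadata crush_rule ssd-rule

but I would like to hear whether it is worth it before I touch CRUSH.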
Your experience and insight are highly appreciated.
Thanks,
Mike