Have you looked at your file layout?

On a test cluster running 10.2.3 I created a 5 GB file and then looked at its layout:

# ls -l test.dat
-rw-r--r-- 1 root root 5242880000 Nov 20 23:09 test.dat
# getfattr -n ceph.file.layout test.dat
# file: test.dat
ceph.file.layout="stripe_unit=4194304 stripe_count=1 object_size=4194304 pool=cephfs_data"

From what I understand, with this layout you are reading 4 MB of data from one OSD at a time, so I think you are seeing roughly the speed of a single SATA drive. I do not think increasing your MON/MDS links to 10Gb will help, and for a single-file read, moving the metadata to SSD will not help either.

To test this, you may want to try creating 10 x 50 GB files and reading them in parallel to see whether your overall throughput increases. If it does, take a look at the layout parameters and see whether you can change the file layout to get more parallelization. Rough sketches of both steps are below the quoted message.

https://github.com/ceph/ceph/blob/master/doc/dev/file-striping.rst
https://github.com/ceph/ceph/blob/master/doc/cephfs/file-layouts.rst

Regards,
Eric

On Sun, Nov 20, 2016 at 3:24 AM, Mike Miller <millermike287@xxxxxxxxx> wrote:
> Hi,
>
> Reading a big 50 GB file (I tried larger ones too) with
>
> dd if=bigfile of=/dev/zero bs=4M
>
> in a cluster with 112 SATA disks across 10 OSD hosts (6272 PGs, replication 3) gives me only about *122 MB/s* read speed in a single thread. Scrubbing was turned off during the measurement.
>
> I have been searching for possible bottlenecks. The network is not the problem: the machine running dd is connected to the cluster public network with a 20 GBASE-T bond, and the OSD hosts have dual networks (cluster public 10 GBASE-T, private 10 GBASE-T).
>
> The OSD SATA disks are utilized only up to about 10-20%, not more than that. CPUs on the OSD hosts are idle too. CPUs on the MON are idle, and MDS usage is about 1.0 (one core used on this 6-core machine). The MON and MDS are connected with only 1 GbE (I would expect some latency from that, but no bandwidth issues; in fact, network bandwidth there is about 20 Mbit max).
>
> If I read a 50 GB file, then clear the cache on the reading machine (but not the OSD caches) and read it again, I get much better read performance of about *620 MB/s*. That seems logical to me, as much (most) of the data is still in the OSD cache buffers. But the read performance is still not great considering that the reading machine is connected to the cluster with a 20 Gbit/s bond.
>
> How can I improve this? I am not really sure, but from my understanding two possible bottlenecks come to mind:
>
> 1) The 1 GbE connection to the MON / MDS
>
> Is this the reason why reads are slow and the OSD disks are not hammered by read requests and therefore not fully utilized?
>
> 2) Moving metadata to SSD
>
> Currently, the cephfs_metadata pool is on the same spinning SATA disks as the data. Is this the bottleneck? Would moving the metadata to SSD be a solution?
>
> Or is it both?
>
> Your experience and insight are highly appreciated.
>
> Thanks,
>
> Mike
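
For the parallel-read test, a minimal sketch along these lines should do; the /mnt/cephfs path, file count and sizes are just placeholders for your setup:

# create 10 files of ~50 GB each on the CephFS mount (placeholder path)
for i in $(seq 1 10); do
    dd if=/dev/zero of=/mnt/cephfs/bigfile.$i bs=4M count=12800
done

# drop the client page cache so the reads actually hit the cluster
echo 3 > /proc/sys/vm/drop_caches

# read all 10 files in parallel and compare aggregate throughput
for i in $(seq 1 10); do
    dd if=/mnt/cephfs/bigfile.$i of=/dev/null bs=4M &
done
wait

If the aggregate throughput scales well beyond the ~122 MB/s you see for a single file, the single-OSD-at-a-time layout is the likely limit.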
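In that case, a sketch of changing the layout, based on the file-layouts doc linked above: layouts can only be set on empty files (or inherited from a directory layout), and the stripe_count of 8 is just an example value, not a recommendation.

# set a wider stripe on a directory; new files created in it inherit the layout
mkdir /mnt/cephfs/striped
setfattr -n ceph.dir.layout.stripe_count -v 8 /mnt/cephfs/striped

# or set it directly on a new, still-empty file before writing any data
touch /mnt/cephfs/striped/newfile.dat
setfattr -n ceph.file.layout.stripe_count -v 8 /mnt/cephfs/striped/newfile.dat

# verify the resulting layout
getfattr -n ceph.file.layout /mnt/cephfs/striped/newfile.dat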