On Wed, 23 Jul 2014, Steve Anthony wrote:

> Hello,
>
> Recently I've started seeing very slow read speeds from the rbd images I
> have mounted. After some analysis, I suspect the root cause is related
> to krbd; if I run the rados benchmark, I see read bandwidth in the
> 400-600MB/s range, however if I attempt to read directly from the block
> device with dd I see speeds in the 10-30MB/s range. Both tests are
> performed on the same client, and I'm seeing the same issues on a second
> identical client. Write speeds from both clients into the mounted images
> have not decreased. The bench pool is configured identically to the rbd
> pool containing the production images (3 replicas, 2048 pgs). The OSD
> hosts each contain 13x4TB drives with 3x60GB SSD journals; each journal
> is a separate partition on an SSD. The cluster currently consists of
> 100 OSDs.
>
> # rados -p bench bench 300 write --no-cleanup
>
> Total time run:         300.513664
> Total writes made:      15828
> Write size:             4194304
> Bandwidth (MB/sec):     210.679
>
> Stddev Bandwidth:       22.8303
> Max bandwidth (MB/sec): 260
> Min bandwidth (MB/sec): 0
> Average Latency:        0.303724
> Stddev Latency:         0.250786
> Max latency:            2.53322
> Min latency:            0.105694
>
> # rados -p bench bench 300 seq --no-cleanup
>
> Total time run:         143.286444
> Total reads made:       15828
> Read size:              4194304
> Bandwidth (MB/sec):     441.856
>
> Average Latency:        0.14477
> Max latency:            2.30728
> Min latency:            0.049462
>
> # rados -p bench bench 300 rand --no-cleanup
>
> Total time run:         300.151342
> Total reads made:       42183
> Read size:              4194304
> Bandwidth (MB/sec):     562.156
>
> Average Latency:        0.113835
> Max latency:            1.7906
> Min latency:            0.039457
>
> # dd if=/dev/rbd/rbd1 of=/dev/null bs=4M count=1024
> 1024+0 records in
> 1024+0 records out
> 4294967296 bytes (4.3 GB) copied, 348.555 s, 12.3 MB/s

dd is doing no readahead/prefetching here. A more realistic comparison
via rados bench would be to use a single 'thread':

  rados -p bench bench 300 seq --no-cleanup -t 1

What kind of numbers does that get you?

The readahead is something that the file system is normally going to be
doing for you, so not seeing it at this layer is a problem primarily for
people who expect to use dd as a benchmarking tool. (A sketch of raising
the device readahead to test this is appended after the quoted message
below.)

sage

> Reading from an XFS filesystem on top of the mapped block device
> produces similar results, despite the same images performing an order
> of magnitude faster a few weeks ago. I can't be certain, but this
> timeframe correlates with when I upgraded from 0.79 to 0.80.1 and then
> to 0.80.4. The rbd clients, monitors, and osd hosts are all running
> Debian Wheezy with kernel 3.12. Any suggestions appreciated. Thanks!
>
> -Steve
>
> --
> Steve Anthony
> LTS HPC Support Specialist
> Lehigh University
> sma310 at lehigh.edu
>
> _______________________________________________
> ceph-users mailing list
> ceph-users at lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
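
A minimal sketch of testing the readahead theory, assuming the image is
mapped at /dev/rbd1 (the device name and the 4MB value are illustrative,
not from the original post; blockdev --setra takes 512-byte sectors, so
8192 sectors = 4MB, i.e. one default-sized rbd object):

  # blockdev --getra /dev/rbd1          <- show current readahead, in sectors
  # blockdev --setra 8192 /dev/rbd1     <- raise readahead to 4MB
  # echo 3 > /proc/sys/vm/drop_caches   <- avoid re-reading from the page cache
  # dd if=/dev/rbd1 of=/dev/null bs=4M count=1024
  # rados -p bench bench 300 seq -t 1   <- the single-threaded run Sage asked for

The same knob is exposed through sysfs, in KB rather than sectors:

  # echo 4096 > /sys/block/rbd1/queue/read_ahead_kb

If dd speeds up with the larger readahead while the -t 1 rados run stays
in the same low range, that points at per-request latency with no
prefetching in flight, rather than a regression in the OSDs themselves.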