slow read speeds from kernel rbd (Firefly 0.80.4)

Ah, ok. That makes sense. With one concurrent operation I see numbers
more in line with the read speeds I'm seeing from the filesystems on the
rbd images.

# rados -p bench bench 300 seq --no-cleanup -t 1
Total time run:        300.114589
Total reads made:     2795
Read size:            4194304
Bandwidth (MB/sec):    37.252

Average Latency:       0.10737
Max latency:           0.968115
Min latency:           0.039754

# rados -p bench bench 300 rand --no-cleanup -t 1
Total time run:        300.164208
Total reads made:     2996
Read size:            4194304
Bandwidth (MB/sec):    39.925

Average Latency:       0.100183
Max latency:           1.04772
Min latency:           0.039584
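
Looking at those numbers again, the single-threaded results track with the
latency: with one outstanding 4MB read and an average latency of ~0.1s,
4MB / 0.1s works out to roughly 40MB/s, which is right where both runs
landed.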

I really wish I could find my data on read speeds from a couple of weeks
ago. It's possible that they've always been in this range, but I
remember one of my test users saturating his 1GbE link over NFS while
copying data from the rbd client to his workstation. Of course, it's
also possible that the data set he was using was cached in RAM when he
was testing, masking the lower rbd speeds.

It just seems counterintuitive to me that read speeds would be so much
slower than writes at the filesystem layer in practice. With images in
the 10-100TB range, reading data at 20-60MB/s isn't going to be
pleasant. Can you suggest any tunables or other approaches to
investigate to improve these speeds, or are they in line with what you'd
expect? Thanks for your help!
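
For reference, the sort of tunable I had in mind is the client-side
readahead on the mapped device, along these lines (the 4MB value is just
a number I picked to test with, not a setting I've seen recommended
anywhere, and rbd1 is assumed to be the kernel device name behind
/dev/rbd/rbd1):

# blockdev --getra /dev/rbd/rbd1                   # current readahead, in 512-byte sectors
# echo 4096 > /sys/block/rbd1/queue/read_ahead_kb  # try 4MB of readahead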

-Steve

On 07/23/2014 03:11 PM, Sage Weil wrote:
> On Wed, 23 Jul 2014, Steve Anthony wrote:
>   
>> Hello,
>>
>> Recently I've started seeing very slow read speeds from the rbd images I
>> have mounted. After some analysis, I suspect the root cause is related
>> to krbd; if I run the rados benchmark, I see read bandwidth in the
>> 400-600MB/s range; however, if I attempt to read directly from the block
>> device with dd I see speeds in the 10-30MB/s range. Both tests are
>> performed on the same client, and I'm seeing the same issues on a second,
>> identical client. Write speeds from both clients into the mounted images
>> have not decreased. The bench pool is configured identically to the rbd
>> pool containing the production images (3 replicas, 2048 pgs). Each OSD
>> host contains 13x4TB disks and 3x60GB SSDs for journals; each journal is
>> a separate partition on an SSD. The cluster currently consists of 100 OSDs.
>>
>> # rados -p bench bench 300 write --no-cleanup
>>
>> Total time run:         300.513664
>> Total writes made:      15828
>> Write size:             4194304
>> Bandwidth (MB/sec):     210.679
>>
>> Stddev Bandwidth:       22.8303
>> Max bandwidth (MB/sec): 260
>> Min bandwidth (MB/sec): 0
>> Average Latency:        0.303724
>> Stddev Latency:         0.250786
>> Max latency:            2.53322
>> Min latency:            0.105694
>>
>> # rados -p bench bench 300 seq --no-cleanup
>> Total time run:        143.286444
>> Total reads made:     15828
>> Read size:            4194304
>> Bandwidth (MB/sec):    441.856
>>
>> Average Latency:       0.14477
>> Max latency:           2.30728
>> Min latency:           0.049462
>>
>> # rados -p bench bench 300 rand --no-cleanup
>> Total time run:        300.151342
>> Total reads made:     42183
>> Read size:            4194304
>> Bandwidth (MB/sec):    562.156
>>
>> Average Latency:       0.113835
>> Max latency:           1.7906
>> Min latency:           0.039457
>>
>> # dd if=/dev/rbd/rbd1 of=/dev/null bs=4M count=1024
>> 1024+0 records in
>> 1024+0 records out
>> 4294967296 bytes (4.3 GB) copied, 348.555 s, 12.3 MB/s
>>     
> dd is doing no readahead/prefetching here.  A more realistic comparison 
> via rados bench would be to use a single 'thread':
>
>  rados -p bench bench 300 seq --no-cleanup -t 1
>
> What kind of numbers does that get you?
>
> The readahead is something that the file system is normally going to be 
> doing for you, so not seeing it at this layer is a problem primarily for 
> people who expect to use dd as a benchmarking tool.
>
> sage
>
>
>   
>> Reading from XFS filesystem on top of mapped block device produces
>> similar results, despite the same images performing an order of
>> magnitude faster a few weeks ago. I can't be certain, but this timeframe
>> correlates with when I upgraded from 0.79 to 0.80.1 and then to 0.80.4.
>> The rbd clients, monitors, and osd hosts are all running Debian Wheezy
>> with kernel 3.12. Any suggestions appreciated. Thanks!
>>
>> -Steve
>>
>> -- 
>> Steve Anthony
>> LTS HPC Support Specialist
>> Lehigh University
>> sma310 at lehigh.edu
>>
>> _______________________________________________
>> ceph-users mailing list
>> ceph-users at lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>>
>>     

-- 
Steve Anthony
LTS HPC Support Specialist
Lehigh University
sma310 at lehigh.edu


