Re: How to improve single thread sequential reads?

On 18-08-15 12:25, Benedikt Fraunhofer wrote:
> Hi Nick,
> 
> did you do anything fancy to get to ~90MB/s in the first place?
> I'm stuck at ~30MB/s reading cold data. single-threaded-writes are
> quite speedy, around 600MB/s.
> 
> radosgw for cold data is around 90MB/s, which is imho limited by
> the speed of a single disk.
> 
> Data already present on the osd-os-buffers arrive with around
> 400-700MB/s so I don't think the network is the culprit.
> 
> (20 node cluster, 12x4TB 7.2k disks, 2 ssds for journals for 6 osds
> each, lacp 2x10g bonds)
> 
> rados bench single-threaded performs equally badly, but with its default
> multithreaded settings it generates wonderful numbers, usually only
> limited by line rate and/or interrupts/s.
> 
> I just gave kernel 4.0 with its rbd-blk-mq feature a shot, hoping to
> get to "your wonderful" numbers, but it's staying below 30 MB/s.
> 
> I was thinking about using a software raid0 like you did but that's
> imho really ugly.
> When I know I need something speedy, I usually just start dd-ing
> the file to /dev/null and wait about three minutes before
> starting the actual job; some sort of hand-made read-ahead for
> dummies.
> 
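The dd-to-/dev/null warm-up described above can be sketched in Python. This is only an illustration of the idea: the function name `warm_cache` and the 4MB chunk size are assumptions, and whether the data stays cached depends on how much memory the client and OSD nodes have.

```python
def warm_cache(path, chunk_size=4 * 1024 * 1024):
    """Read a file sequentially and throw the data away (like dd of=/dev/null).

    Returns the number of bytes read. The only purpose is to pull the data
    through the page caches before the real job starts.
    """
    total = 0
    # buffering=0 gives raw reads, so each read() maps to one request
    # of at most chunk_size bytes.
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # empty bytes object means end of file
                break
            total += len(chunk)
    return total
```

Calling `warm_cache("/path/to/file")` a few minutes before the real job has the same effect as the dd trick; the path is a placeholder.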

It really depends on your situation, but you could also go for objects
larger than 4MB for specific block devices.

In one customer use case, where large files were read single-threaded
from RBD block devices, we went for 64MB objects.

That improved our read performance in that case. We didn't have to
create a new TCP connection every 4MB and talk to a new OSD.

You could try that and see how it works out.
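To put rough numbers on this, here is a small sketch of the arithmetic. The helper names are made up for illustration. As far as I recall, the rbd CLI of this era takes the object size as an "order" (log2 of the size in bytes) via `rbd create --order`, but check that against your version.

```python
MiB = 1024 * 1024
GiB = 1024 * MiB

def rbd_order(object_size_bytes):
    """Return log2 of the object size; the size must be a power of two."""
    if object_size_bytes <= 0 or object_size_bytes & (object_size_bytes - 1):
        raise ValueError("object size must be a positive power of two")
    return object_size_bytes.bit_length() - 1

def objects_touched(read_bytes, object_size_bytes):
    """Number of objects a sequential read starting at offset 0 spans."""
    # Integer ceiling division, avoids float rounding on large sizes.
    return -(-read_bytes // object_size_bytes)

# 4MB objects are order 22, 64MB objects are order 26.
print(rbd_order(4 * MiB), rbd_order(64 * MiB))
# A 1GB sequential read spans 256 objects at 4MB but only 16 at 64MB.
print(objects_touched(GiB, 4 * MiB), objects_touched(GiB, 64 * MiB))
```

Fewer objects means fewer switches between OSDs for a single-threaded reader. The trade-off is that each object is larger, so consider it per image rather than cluster-wide.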

Wido

> Thx in advance
>   Benedikt
> 
> 
> 2015-08-17 13:29 GMT+02:00 Nick Fisk <nick@xxxxxxxxxx>:
>> Thanks for the replies guys.
>>
>> The client is set to 4MB, I haven't played with the OSD side yet as I wasn't
>> sure if it would make much difference, but I will give it a go. If the
>> client is already passing a 4MB request down through to the OSD, will it be
>> able to readahead any further? The next 4MB object in theory will be on
>> another OSD and so I'm not sure if reading ahead any further on the OSD side
>> would help.
>>
>> How I see the problem is that the RBD client will only read 1 OSD at a time
>> as the RBD readahead can't be set any higher than max_hw_sectors_kb, which
>> is the object size of the RBD. Please correct me if I'm wrong on this.
>>
>> If you could set the RBD readahead to much higher than the object size, then
>> this would probably give the desired effect where the buffer could be
>> populated by reading from several OSDs in advance to give much higher
>> performance. That or wait for striping to appear in the kernel client.
>>
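The idea of filling a buffer from several OSDs in advance can be approximated in userspace by keeping several object-sized reads in flight at once. The following is a minimal sketch under that assumption, not something from this thread: `read_with_prefetch`, the chunk size, and the depth are all hypothetical, and it relies on `os.pread`, so it is Unix-only.

```python
import os
from collections import deque
from concurrent.futures import ThreadPoolExecutor

def read_with_prefetch(path, chunk_size=4 * 1024 * 1024, depth=4):
    """Yield chunks in file order while later chunks are read in parallel.

    With chunk_size equal to the RBD object size, each in-flight read
    should land on a different object, and so usually on a different OSD.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        # lseek to the end works for regular files and block devices alike.
        size = os.lseek(fd, 0, os.SEEK_END)
        with ThreadPoolExecutor(max_workers=depth) as pool:
            pending = deque()
            for offset in range(0, size, chunk_size):
                # pread takes an explicit offset, so threads do not race
                # on the shared file position.
                pending.append(pool.submit(os.pread, fd, chunk_size, offset))
                if len(pending) >= depth:
                    yield pending.popleft().result()
            # Drain whatever is still in flight, preserving order.
            while pending:
                yield pending.popleft().result()
    finally:
        os.close(fd)
```

A consumer would iterate over `read_with_prefetch("/dev/rbd0")` (device name assumed) and write each chunk to its destination. Note that a short read from `pread` is passed through as-is here; a production version would loop until each chunk is complete.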
>> I've also found that BareOS (fork of Bacula) seems to have a direct RADOS
>> feature that supports radosstriper. I might try this and see how it performs
>> as well.
>>
>>
>>> -----Original Message-----
>>> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of
>>> Somnath Roy
>>> Sent: 17 August 2015 03:36
>>> To: Alex Gorbachev <ag@xxxxxxxxxxxxxxxxxxx>; Nick Fisk <nick@xxxxxxxxxx>
>>> Cc: ceph-users@xxxxxxxxxxxxxx
>>> Subject: Re:  How to improve single thread sequential reads?
>>>
>>> Have you tried setting read_ahead_kb to a bigger number on both the
>>> client and OSD side if you are using krbd?
>>> In case of librbd, try the different config options for rbd cache.
>>>
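For reference, read_ahead_kb is exposed per block device under sysfs. Below is a small sketch for inspecting and changing it from Python. The helper names are made up, the device name `rbd0` is an assumption, and writing the value needs root on a real system.

```python
from pathlib import Path

def readahead_path(device, sysfs_root="/sys/block"):
    """Path of the readahead setting for a block device such as 'rbd0'."""
    return Path(sysfs_root) / device / "queue" / "read_ahead_kb"

def get_readahead_kb(device, sysfs_root="/sys/block"):
    """Return the current readahead size in KB."""
    return int(readahead_path(device, sysfs_root).read_text().strip())

def set_readahead_kb(device, kb, sysfs_root="/sys/block"):
    """Set the readahead size in KB (requires root on a real system)."""
    readahead_path(device, sysfs_root).write_text(f"{kb}\n")
```

For example, `set_readahead_kb("rbd0", 4096)` would request 4MB of readahead on the client. The same setting can be applied to the data disks on the OSD nodes by passing their device names instead.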
>>> Thanks & Regards
>>> Somnath
>>>
>>> -----Original Message-----
>>> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of
>>> Alex Gorbachev
>>> Sent: Sunday, August 16, 2015 7:07 PM
>>> To: Nick Fisk
>>> Cc: ceph-users@xxxxxxxxxxxxxx
>>> Subject: Re:  How to improve single thread sequential reads?
>>>
>>> Hi Nick,
>>>
>>> On Thu, Aug 13, 2015 at 4:37 PM, Nick Fisk <nick@xxxxxxxxxx> wrote:
>>>>> -----Original Message-----
>>>>> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf
>>>>> Of Nick Fisk
>>>>> Sent: 13 August 2015 18:04
>>>>> To: ceph-users@xxxxxxxxxxxxxx
>>>>> Subject:  How to improve single thread sequential reads?
>>>>>
>>>>> Hi,
>>>>>
>>>>> I'm trying to use an RBD to act as a staging area for some data before
>>>>> pushing it down to some LTO6 tapes. As I cannot use striping with the
>>>>> kernel client, I tend to be maxing out at around 80MB/s reads testing
>>>>> with dd. Has anyone got any clever suggestions for giving this a bit of
>>>>> a boost? I think I need to get it up to around 200MB/s to make sure
>>>>> there is always a steady flow of data to the tape drive.
>>>>
>>>> I've just tried the testing kernel with the blk-mq fixes in it for
>>>> full size IOs. This, combined with bumping readahead up to 4MB, is now
>>>> getting me on average 150MB/s to 200MB/s, so this might suffice.
>>>>
>>>> Out of personal interest, I would still like to know if anyone has ideas
>>>> on how to really push much higher bandwidth through an RBD.
>>>
>>> Some settings in our ceph.conf that may help:
>>>
>>> osd_op_threads = 20
>>> osd_mount_options_xfs = rw,noatime,inode64,logbsize=256k
>>> filestore_queue_max_ops = 90000
>>> filestore_flusher = false
>>> filestore_max_sync_interval = 10
>>> filestore_sync_flush = false
>>>
>>> Regards,
>>> Alex
>>>
>>>>
>>>>>
>>>>> Rbd-fuse seems to top out at 12MB/s, so there goes that option.
>>>>>
>>>>> I'm thinking mapping multiple RBDs and then combining them into an
>>>>> mdadm RAID0 stripe might work, but seems a bit messy.
>>>>>
>>>>> Any suggestions?
>>>>>
>>>>> Thanks,
>>>>> Nick
>>>>>
>>>> _______________________________________________
>>>> ceph-users mailing list
>>>> ceph-users@xxxxxxxxxxxxxx
>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


