Hi Nick,

Did you do anything fancy to get to ~90MB/s in the first place? I'm stuck at ~30MB/s reading cold data. Single-threaded writes are quite speedy, around 600MB/s. radosgw for cold data is at around 90MB/s, which is imho limited by the speed of a single disk. Data already present in the OSDs' OS buffers arrives at around 400-700MB/s, so I don't think the network is the culprit. (20-node cluster, 12x 4TB 7.2k disks, 2 SSDs as journals for 6 OSDs each, LACP 2x10G bonds.)

rados bench performs equally badly single-threaded, but with its default multithreaded settings it produces wonderful numbers, usually limited only by line rate and/or interrupts/s.

I just gave kernel 4.0 with its rbd blk-mq feature a shot, hoping to get to "your wonderful" numbers, but it stays below 30MB/s. I was thinking about using a software RAID0 like you did, but imho that's really ugly.

When I knew I needed something speedy, I usually just started dd-ing the file to /dev/null and waited for about three minutes before starting the actual job; some sort of hand-made read-ahead for dummies.

Thx in advance
Benedikt
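PS: in case it helps to see it spelled out, the hand-made read-ahead and the single-threaded bench are just something along these lines (path, pool name and block size are only examples):

  # warm the OSD page caches before the real job touches the file
  dd if=/mnt/rbd-staging/somefile of=/dev/null bs=4M

  # single-threaded rados bench: write objects first with --no-cleanup, then read them back sequentially
  rados bench -p rbd 60 write -t 1 --no-cleanup
  rados bench -p rbd 60 seq -t 1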
2015-08-17 13:29 GMT+02:00 Nick Fisk <nick@xxxxxxxxxx>:
> Thanks for the replies guys.
>
> The client is set to 4MB, I haven't played with the OSD side yet as I wasn't
> sure if it would make much difference, but I will give it a go. If the
> client is already passing a 4MB request down to the OSD, will it be able to
> read ahead any further? The next 4MB object will in theory be on another
> OSD, so I'm not sure if reading ahead any further on the OSD side would
> help.
>
> How I see the problem is that the RBD client will only read 1 OSD at a time,
> as the RBD readahead can't be set any higher than max_hw_sectors_kb, which
> is the object size of the RBD. Please correct me if I'm wrong on this.
>
> If you could set the RBD readahead much higher than the object size, this
> would probably give the desired effect, where the buffer could be populated
> by reading from several OSDs in advance to give much higher performance.
> That, or wait for striping to appear in the kernel client.
>
> I've also found that BareOS (a fork of Bacula) seems to have a direct RADOS
> feature that supports radosstriper. I might try this and see how it performs
> as well.
>
>> -----Original Message-----
>> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of
>> Somnath Roy
>> Sent: 17 August 2015 03:36
>> To: Alex Gorbachev <ag@xxxxxxxxxxxxxxxxxxx>; Nick Fisk <nick@xxxxxxxxxx>
>> Cc: ceph-users@xxxxxxxxxxxxxx
>> Subject: Re: How to improve single thread sequential reads?
>>
>> Have you tried setting read_ahead_kb to a bigger number on both the
>> client and OSD side if you are using krbd?
>> In case of librbd, try the different config options for the rbd cache.
>>
>> Thanks & Regards
>> Somnath
>>
>> -----Original Message-----
>> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of
>> Alex Gorbachev
>> Sent: Sunday, August 16, 2015 7:07 PM
>> To: Nick Fisk
>> Cc: ceph-users@xxxxxxxxxxxxxx
>> Subject: Re: How to improve single thread sequential reads?
>>
>> Hi Nick,
>>
>> On Thu, Aug 13, 2015 at 4:37 PM, Nick Fisk <nick@xxxxxxxxxx> wrote:
>> >> -----Original Message-----
>> >> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf
>> >> Of Nick Fisk
>> >> Sent: 13 August 2015 18:04
>> >> To: ceph-users@xxxxxxxxxxxxxx
>> >> Subject: How to improve single thread sequential reads?
>> >>
>> >> Hi,
>> >>
>> >> I'm trying to use an RBD as a staging area for some data before
>> >> pushing it down to some LTO6 tapes. As I cannot use striping with the
>> >> kernel client, I tend to max out at around 80MB/s reads when testing
>> >> with dd. Has anyone got any clever suggestions for giving this a bit
>> >> of a boost? I think I need to get it up to around 200MB/s to make sure
>> >> there is always a steady flow of data to the tape drive.
>> >
>> > I've just tried the testing kernel with the blk-mq fixes in it for
>> > full-size IOs; this, combined with bumping readahead up to 4MB, is now
>> > getting me 150MB/s to 200MB/s on average, so this might suffice.
>> >
>> > Out of personal interest, I would still like to know if anyone has
>> > ideas on how to really push much higher bandwidth through an RBD.
>>
>> Some settings in our ceph.conf that may help:
>>
>> osd_op_threads = 20
>> osd_mount_options_xfs = rw,noatime,inode64,logbsize=256k
>> filestore_queue_max_ops = 90000
>> filestore_flusher = false
>> filestore_max_sync_interval = 10
>> filestore_sync_flush = false
>>
>> Regards,
>> Alex
>>
>> >> Rbd-fuse seems to top out at 12MB/s, so there goes that option.
>> >>
>> >> I'm thinking that mapping multiple RBDs and then combining them into
>> >> an mdadm RAID0 stripe might work, but it seems a bit messy.
>> >>
>> >> Any suggestions?
>> >>
>> >> Thanks,
>> >> Nick

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
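For reference, the client-side knobs discussed in this thread amount to roughly the following; device names, the pool of RBDs and the chunk size are only examples, not a tested recipe:

  # bump readahead on the mapped RBD (4096 KB = 4MB, i.e. one default-sized object);
  # the same knob exists on the OSD hosts for the underlying data disks
  echo 4096 > /sys/block/rbd0/queue/read_ahead_kb

  # the mdadm RAID0 idea: stripe several mapped RBDs into one block device
  mdadm --create /dev/md0 --level=0 --raid-devices=4 --chunk=4096 /dev/rbd0 /dev/rbd1 /dev/rbd2 /dev/rbd3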