> -----Original Message-----
> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of
> Benedikt Fraunhofer
> Sent: 18 August 2015 11:25
> To: Nick Fisk <nick@xxxxxxxxxx>
> Cc: ceph-users@xxxxxxxxxxxxxx
> Subject: Re: How to improve single thread sequential reads?
>
> Hi Nick,
>
> did you do anything fancy to get to ~90MB/s in the first place?
> I'm stuck at ~30MB/s reading cold data. Single-threaded writes are quite
> speedy, around 600MB/s.

I only bumped the readahead up to 4096; apart from that I didn't change
anything else. This was probably done on a reasonably quiet cluster; if the
cluster is doing other things, sequential IO is normally the first to suffer.

However, please look for a thread I started a few months ago where I was
getting very poor performance reading data that had been sitting dormant for
a while. It turned out to be something to do with taking a long time to
retrieve xattrs, but unfortunately I never got to the bottom of it. I don't
know if this is something you might also be experiencing?

> radosgw for cold data is around 90MB/s, which is imho limited by the
> speed of a single disk.
>
> Data already present in the OSDs' OS buffers arrives at around
> 400-700MB/s, so I don't think the network is the culprit.
>
> (20 node cluster, 12x4TB 7.2k disks, 2 SSDs for journals for 6 OSDs each,
> LACP 2x10G bonds)
>
> rados bench single-threaded performs equally badly, but with its default
> multithreaded settings it generates wonderful numbers, usually only
> limited by line rate and/or interrupts/s.
>
> I just gave kernel 4.0 with its rbd blk-mq feature a shot, hoping to get
> to "your wonderful" numbers, but it's staying below 30 MB/s.

You will need this testing kernel for the blk-mq fixes; anything other than
that at the moment will limit your max IO size.
http://gitbuilder.ceph.com/kernel-deb-precise-x86_64-basic/ref/testing_blk-mq-plug/

> I was thinking about using a software raid0 like you did, but that's imho
> really ugly.
> When I know I'll need something speedy, I usually just start dd-ing the
> file to /dev/null and wait for about three minutes before starting the
> actual job; some sort of hand-made read-ahead for dummies.
>
> Thx in advance
> Benedikt
>
>
> 2015-08-17 13:29 GMT+02:00 Nick Fisk <nick@xxxxxxxxxx>:
> > Thanks for the replies, guys.
> >
> > The client is set to 4MB; I haven't played with the OSD side yet as I
> > wasn't sure if it would make much difference, but I will give it a go.
> > If the client is already passing a 4MB request down through to the
> > OSD, will it be able to read ahead any further? The next 4MB object
> > will in theory be on another OSD, so I'm not sure if reading ahead any
> > further on the OSD side would help.
> >
> > How I see the problem is that the RBD client will only read 1 OSD at a
> > time, as the RBD readahead can't be set any higher than
> > max_hw_sectors_kb, which is the object size of the RBD. Please correct
> > me if I'm wrong on this.
> >
> > If you could set the RBD readahead to much higher than the object
> > size, then this would probably give the desired effect, where the
> > buffer could be populated by reading from several OSDs in advance to
> > give much higher performance. That, or wait for striping to appear in
> > the kernel client.
> >
> > I've also found that BareOS (a fork of Bacula) seems to have a direct
> > RADOS feature that supports radosstriper. I might try this and see how
> > it performs as well.
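
To make the readahead tuning discussed above concrete, a minimal sketch of
checking and bumping the block-layer readahead on a mapped krbd device. The
device name /dev/rbd0 and the 4096 value are only examples, not taken from
this thread.

  # Check the current settings for the mapped RBD (assuming it is /dev/rbd0).
  cat /sys/block/rbd0/queue/read_ahead_kb
  cat /sys/block/rbd0/queue/max_hw_sectors_kb   # per the discussion above, this reflects the RBD object size

  # Bump readahead to 4096KB (4MB), the value mentioned above; needs root.
  echo 4096 > /sys/block/rbd0/queue/read_ahead_kb

The thread above suggests the effective readahead is capped by
max_hw_sectors_kb, which is why the blk-mq fixes allowing full-size IOs in
the testing kernel matter here.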
> >
> >
> >> -----Original Message-----
> >> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf
> >> Of Somnath Roy
> >> Sent: 17 August 2015 03:36
> >> To: Alex Gorbachev <ag@xxxxxxxxxxxxxxxxxxx>; Nick Fisk <nick@xxxxxxxxxx>
> >> Cc: ceph-users@xxxxxxxxxxxxxx
> >> Subject: Re: How to improve single thread sequential reads?
> >>
> >> Have you tried setting read_ahead_kb to a bigger number on both the
> >> client and OSD side if you are using krbd?
> >> In the case of librbd, try the different config options for the rbd cache.
> >>
> >> Thanks & Regards
> >> Somnath
> >>
> >> -----Original Message-----
> >> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf
> >> Of Alex Gorbachev
> >> Sent: Sunday, August 16, 2015 7:07 PM
> >> To: Nick Fisk
> >> Cc: ceph-users@xxxxxxxxxxxxxx
> >> Subject: Re: How to improve single thread sequential reads?
> >>
> >> Hi Nick,
> >>
> >> On Thu, Aug 13, 2015 at 4:37 PM, Nick Fisk <nick@xxxxxxxxxx> wrote:
> >> >> -----Original Message-----
> >> >> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On
> >> >> Behalf Of Nick Fisk
> >> >> Sent: 13 August 2015 18:04
> >> >> To: ceph-users@xxxxxxxxxxxxxx
> >> >> Subject: How to improve single thread sequential reads?
> >> >>
> >> >> Hi,
> >> >>
> >> >> I'm trying to use an RBD as a staging area for some data before
> >> >> pushing it down to some LTO6 tapes. As I cannot use striping with
> >> >> the kernel client, I tend to max out at around 80MB/s reads when
> >> >> testing with dd. Has anyone got any clever suggestions for giving
> >> >> this a bit of a boost? I think I need to get it up to around
> >> >> 200MB/s to make sure there is always a steady flow of data to the
> >> >> tape drive.
> >> >
> >> > I've just tried the testing kernel with the blk-mq fixes in it for
> >> > full-size IOs; this, combined with bumping readahead up to 4MB, is
> >> > now getting me on average 150MB/s to 200MB/s, so this might suffice.
> >> >
> >> > Out of personal interest, I would still like to know if anyone has
> >> > ideas on how to really push much higher bandwidth through an RBD.
> >>
> >> Some settings in our ceph.conf that may help:
> >>
> >> osd_op_threads = 20
> >> osd_mount_options_xfs = rw,noatime,inode64,logbsize=256k
> >> filestore_queue_max_ops = 90000
> >> filestore_flusher = false
> >> filestore_max_sync_interval = 10
> >> filestore_sync_flush = false
> >>
> >> Regards,
> >> Alex
> >>
> >> >
> >> >> Rbd-fuse seems to top out at 12MB/s, so there goes that option.
> >> >>
> >> >> I'm thinking that mapping multiple RBDs and then combining them
> >> >> into an mdadm RAID0 stripe might work, but it seems a bit messy.
> >> >>
> >> >> Any suggestions?
> >> >>
> >> >> Thanks,
> >> >> Nick
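
As an aside, the mdadm RAID0 idea mentioned a couple of times above would
look roughly like the sketch below. The pool name, image names, device count
and chunk size are all made up for illustration, and the /dev/rbdN names
assume no other images are mapped; this is a sketch, not a tested recipe from
the thread.

  # Map a few RBD images (assuming images stage0..stage3 already exist in pool 'backup').
  for i in 0 1 2 3; do rbd map backup/stage$i; done

  # Stripe across the mapped devices. The 4096KB (4MB) chunk is only an
  # example, picked to line up with the 4MB object size discussed above.
  mdadm --create /dev/md0 --level=0 --raid-devices=4 --chunk=4096 \
        /dev/rbd0 /dev/rbd1 /dev/rbd2 /dev/rbd3

  # Quick single-threaded sequential read test against the stripe (~10GB).
  dd if=/dev/md0 of=/dev/null bs=4M count=2560 iflag=direct

The appeal is that sequential reads then fan out across several objects'
worth of OSDs at once; the downside, as noted above, is the extra md layer to
manage.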
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com