> -----Original Message-----
> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of
> Benedikt Fraunhofer
> Sent: 18 August 2015 11:25
> To: Nick Fisk <nick@xxxxxxxxxx>
> Cc: ceph-users@xxxxxxxxxxxxxx
> Subject: Re: How to improve single thread sequential reads?
>
> Hi Nick,
>
> did you do anything fancy to get to ~90MB/s in the first place?
> I'm stuck at ~30MB/s reading cold data. Single-threaded writes are quite
> speedy, around 600MB/s.

I only bumped the readahead up to 4096; apart from that I didn't change
anything else. This was probably done on a reasonably quiet cluster; if the
cluster is doing other things, sequential IO is normally the first to suffer.

However, please look for a thread I started a few months ago where I was
getting very poor performance reading data that had been sitting dormant for
a while. It turned out to be something to do with taking a long time to
retrieve xattrs, but unfortunately I never got to the bottom of it. I don't
know if this is something you might also be experiencing?

> radosgw for cold data is around 90MB/s, which is imho limited by the
> speed of a single disk.
>
> Data already present in the OSDs' OS buffers arrives at around
> 400-700MB/s, so I don't think the network is the culprit.
>
> (20 node cluster, 12x4TB 7.2k disks, 2 SSDs for journals for 6 OSDs each,
> LACP 2x10G bonds)
>
> rados bench single-threaded performs equally badly, but with its default
> multithreaded settings it generates wonderful numbers, usually only
> limited by line rate and/or interrupts/s.
>
> I just gave kernel 4.0 with its rbd blk-mq feature a shot, hoping to get
> to "your wonderful" numbers, but it's staying below 30 MB/s.

You will need this testing kernel for the blk-mq fixes; anything other than
that at the moment will limit your max IO size.
http://gitbuilder.ceph.com/kernel-deb-precise-x86_64-basic/ref/testing_blk-mq-plug/

> I was thinking about using a software raid0 like you did, but that's imho
> really ugly.
> When I know I'll need something speedy, I usually just start dd-ing the
> file to /dev/null and wait for about three minutes before starting the
> actual job; some sort of hand-made read-ahead for dummies.
>
> Thx in advance
> Benedikt
>
>
> 2015-08-17 13:29 GMT+02:00 Nick Fisk <nick@xxxxxxxxxx>:
> > Thanks for the replies, guys.
> >
> > The client is set to 4MB; I haven't played with the OSD side yet as I
> > wasn't sure if it would make much difference, but I will give it a go.
> > If the client is already passing a 4MB request down through to the
> > OSD, will it be able to read ahead any further? The next 4MB object
> > will in theory be on another OSD, so I'm not sure if reading ahead any
> > further on the OSD side would help.
> >
> > How I see the problem is that the RBD client will only read 1 OSD at a
> > time, as the RBD readahead can't be set any higher than
> > max_hw_sectors_kb, which is the object size of the RBD. Please correct
> > me if I'm wrong on this.
> >
> > If you could set the RBD readahead to much higher than the object
> > size, then this would probably give the desired effect, where the
> > buffer could be populated by reading from several OSDs in advance to
> > give much higher performance. That, or wait for striping to appear in
> > the kernel client.
> >
> > I've also found that BareOS (a fork of Bacula) seems to have a direct
> > RADOS feature that supports radosstriper. I might try this and see how
> > it performs as well.
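
To make the readahead tuning discussed above concrete, a minimal sketch of
checking and bumping the block-layer readahead on a mapped krbd device. The
device name /dev/rbd0 and the 4096 value are only examples, not taken from
this thread.

  # Check the current settings for the mapped RBD (assuming it is /dev/rbd0).
  cat /sys/block/rbd0/queue/read_ahead_kb
  cat /sys/block/rbd0/queue/max_hw_sectors_kb   # per the discussion above, this reflects the RBD object size

  # Bump readahead to 4096KB (4MB), the value mentioned above; needs root.
  echo 4096 > /sys/block/rbd0/queue/read_ahead_kb

The thread above suggests the effective readahead is capped by
max_hw_sectors_kb, which is why the blk-mq fixes allowing full-size IOs in
the testing kernel matter here.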
> >
> >
> >> -----Original Message-----
> >> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf
> >> Of Somnath Roy
> >> Sent: 17 August 2015 03:36
> >> To: Alex Gorbachev <ag@xxxxxxxxxxxxxxxxxxx>; Nick Fisk <nick@xxxxxxxxxx>
> >> Cc: ceph-users@xxxxxxxxxxxxxx
> >> Subject: Re: How to improve single thread sequential reads?
> >>
> >> Have you tried setting read_ahead_kb to a bigger number on both the
> >> client and OSD side if you are using krbd?
> >> In the case of librbd, try the different config options for the rbd cache.
> >>
> >> Thanks & Regards
> >> Somnath
> >>
> >> -----Original Message-----
> >> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf
> >> Of Alex Gorbachev
> >> Sent: Sunday, August 16, 2015 7:07 PM
> >> To: Nick Fisk
> >> Cc: ceph-users@xxxxxxxxxxxxxx
> >> Subject: Re: How to improve single thread sequential reads?
> >>
> >> Hi Nick,
> >>
> >> On Thu, Aug 13, 2015 at 4:37 PM, Nick Fisk <nick@xxxxxxxxxx> wrote:
> >> >> -----Original Message-----
> >> >> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On
> >> >> Behalf Of Nick Fisk
> >> >> Sent: 13 August 2015 18:04
> >> >> To: ceph-users@xxxxxxxxxxxxxx
> >> >> Subject: How to improve single thread sequential reads?
> >> >>
> >> >> Hi,
> >> >>
> >> >> I'm trying to use an RBD as a staging area for some data before
> >> >> pushing it down to some LTO6 tapes. As I cannot use striping with
> >> >> the kernel client, I tend to max out at around 80MB/s reads when
> >> >> testing with dd. Has anyone got any clever suggestions for giving
> >> >> this a bit of a boost? I think I need to get it up to around
> >> >> 200MB/s to make sure there is always a steady flow of data to the
> >> >> tape drive.
> >> >
> >> > I've just tried the testing kernel with the blk-mq fixes in it for
> >> > full-size IOs; this, combined with bumping readahead up to 4MB, is
> >> > now getting me on average 150MB/s to 200MB/s, so this might suffice.
> >> >
> >> > Out of personal interest, I would still like to know if anyone has
> >> > ideas on how to really push much higher bandwidth through an RBD.
> >>
> >> Some settings in our ceph.conf that may help:
> >>
> >> osd_op_threads = 20
> >> osd_mount_options_xfs = rw,noatime,inode64,logbsize=256k
> >> filestore_queue_max_ops = 90000
> >> filestore_flusher = false
> >> filestore_max_sync_interval = 10
> >> filestore_sync_flush = false
> >>
> >> Regards,
> >> Alex
> >>
> >> >
> >> >> Rbd-fuse seems to top out at 12MB/s, so there goes that option.
> >> >>
> >> >> I'm thinking that mapping multiple RBDs and then combining them
> >> >> into an mdadm RAID0 stripe might work, but it seems a bit messy.
> >> >>
> >> >> Any suggestions?
> >> >>
> >> >> Thanks,
> >> >> Nick
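
As an aside, the mdadm RAID0 idea mentioned a couple of times above would
look roughly like the sketch below. The pool name, image names, device count
and chunk size are all made up for illustration, and the /dev/rbdN names
assume no other images are mapped; this is a sketch, not a tested recipe from
the thread.

  # Map a few RBD images (assuming images stage0..stage3 already exist in pool 'backup').
  for i in 0 1 2 3; do rbd map backup/stage$i; done

  # Stripe across the mapped devices. The 4096KB (4MB) chunk is only an
  # example, picked to line up with the 4MB object size discussed above.
  mdadm --create /dev/md0 --level=0 --raid-devices=4 --chunk=4096 \
        /dev/rbd0 /dev/rbd1 /dev/rbd2 /dev/rbd3

  # Quick single-threaded sequential read test against the stripe (~10GB).
  dd if=/dev/md0 of=/dev/null bs=4M count=2560 iflag=direct

The appeal is that sequential reads then fan out across several objects'
worth of OSDs at once; the downside, as noted above, is the extra md layer to
manage.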
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com