Hi Nick,

Did you do anything fancy to get to ~90MB/s in the first place? I'm stuck at ~30MB/s reading cold data. Single-threaded writes are quite speedy, around 600MB/s. radosgw for cold data is at around 90MB/s, which is imho limited by the speed of a single disk. Data already present in the OSDs' OS buffers arrives at around 400-700MB/s, so I don't think the network is the culprit. (20-node cluster, 12x 4TB 7.2k disks, 2 SSDs as journals for 6 OSDs each, LACP 2x10G bonds.)

rados bench performs equally badly single-threaded, but with its default multithreaded settings it produces wonderful numbers, usually limited only by line rate and/or interrupts/s.

I just gave kernel 4.0 with its rbd blk-mq feature a shot, hoping to get to "your wonderful" numbers, but it stays below 30MB/s. I was thinking about using a software RAID0 like you did, but imho that's really ugly.

When I knew I needed something speedy, I usually just started dd-ing the file to /dev/null and waited for about three minutes before starting the actual job; some sort of hand-made read-ahead for dummies.

Thx in advance
Benedikt
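PS: in case it helps to see it spelled out, the hand-made read-ahead and the single-threaded bench are just something along these lines (path, pool name and block size are only examples):

  # warm the OSD page caches before the real job touches the file
  dd if=/mnt/rbd-staging/somefile of=/dev/null bs=4M

  # single-threaded rados bench: write objects first with --no-cleanup, then read them back sequentially
  rados bench -p rbd 60 write -t 1 --no-cleanup
  rados bench -p rbd 60 seq -t 1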
2015-08-17 13:29 GMT+02:00 Nick Fisk <nick@xxxxxxxxxx>:
> Thanks for the replies guys.
>
> The client is set to 4MB, I haven't played with the OSD side yet as I wasn't
> sure if it would make much difference, but I will give it a go. If the
> client is already passing a 4MB request down to the OSD, will it be able to
> read ahead any further? The next 4MB object will in theory be on another
> OSD, so I'm not sure if reading ahead any further on the OSD side would
> help.
>
> How I see the problem is that the RBD client will only read 1 OSD at a time,
> as the RBD readahead can't be set any higher than max_hw_sectors_kb, which
> is the object size of the RBD. Please correct me if I'm wrong on this.
>
> If you could set the RBD readahead much higher than the object size, this
> would probably give the desired effect, where the buffer could be populated
> by reading from several OSDs in advance to give much higher performance.
> That, or wait for striping to appear in the kernel client.
>
> I've also found that BareOS (a fork of Bacula) seems to have a direct RADOS
> feature that supports radosstriper. I might try this and see how it performs
> as well.
>
>> -----Original Message-----
>> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of
>> Somnath Roy
>> Sent: 17 August 2015 03:36
>> To: Alex Gorbachev <ag@xxxxxxxxxxxxxxxxxxx>; Nick Fisk <nick@xxxxxxxxxx>
>> Cc: ceph-users@xxxxxxxxxxxxxx
>> Subject: Re: How to improve single thread sequential reads?
>>
>> Have you tried setting read_ahead_kb to a bigger number on both the
>> client and OSD side if you are using krbd?
>> In case of librbd, try the different config options for the rbd cache.
>>
>> Thanks & Regards
>> Somnath
>>
>> -----Original Message-----
>> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of
>> Alex Gorbachev
>> Sent: Sunday, August 16, 2015 7:07 PM
>> To: Nick Fisk
>> Cc: ceph-users@xxxxxxxxxxxxxx
>> Subject: Re: How to improve single thread sequential reads?
>>
>> Hi Nick,
>>
>> On Thu, Aug 13, 2015 at 4:37 PM, Nick Fisk <nick@xxxxxxxxxx> wrote:
>> >> -----Original Message-----
>> >> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf
>> >> Of Nick Fisk
>> >> Sent: 13 August 2015 18:04
>> >> To: ceph-users@xxxxxxxxxxxxxx
>> >> Subject: How to improve single thread sequential reads?
>> >>
>> >> Hi,
>> >>
>> >> I'm trying to use an RBD as a staging area for some data before
>> >> pushing it down to some LTO6 tapes. As I cannot use striping with the
>> >> kernel client, I tend to max out at around 80MB/s reads when testing
>> >> with dd. Has anyone got any clever suggestions for giving this a bit
>> >> of a boost? I think I need to get it up to around 200MB/s to make sure
>> >> there is always a steady flow of data to the tape drive.
>> >
>> > I've just tried the testing kernel with the blk-mq fixes in it for
>> > full-size IOs; this, combined with bumping readahead up to 4MB, is now
>> > getting me 150MB/s to 200MB/s on average, so this might suffice.
>> >
>> > Out of personal interest, I would still like to know if anyone has
>> > ideas on how to really push much higher bandwidth through an RBD.
>>
>> Some settings in our ceph.conf that may help:
>>
>> osd_op_threads = 20
>> osd_mount_options_xfs = rw,noatime,inode64,logbsize=256k
>> filestore_queue_max_ops = 90000
>> filestore_flusher = false
>> filestore_max_sync_interval = 10
>> filestore_sync_flush = false
>>
>> Regards,
>> Alex
>>
>> >> Rbd-fuse seems to top out at 12MB/s, so there goes that option.
>> >>
>> >> I'm thinking that mapping multiple RBDs and then combining them into
>> >> an mdadm RAID0 stripe might work, but it seems a bit messy.
>> >>
>> >> Any suggestions?
>> >>
>> >> Thanks,
>> >> Nick

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
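For reference, the client-side knobs discussed in this thread amount to roughly the following; device names, the pool of RBDs and the chunk size are only examples, not a tested recipe:

  # bump readahead on the mapped RBD (4096 KB = 4MB, i.e. one default-sized object);
  # the same knob exists on the OSD hosts for the underlying data disks
  echo 4096 > /sys/block/rbd0/queue/read_ahead_kb

  # the mdadm RAID0 idea: stripe several mapped RBDs into one block device
  mdadm --create /dev/md0 --level=0 --raid-devices=4 --chunk=4096 /dev/rbd0 /dev/rbd1 /dev/rbd2 /dev/rbd3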