Re: expected I/O / rand 4k iops

On 04/11/2013 02:27 AM, Stefan Priebe - Profihost AG wrote:
> Hello list,
>
> is there any calculation of expected I/O available?
>
> I have a test system running 6 hosts with 4 OSDs each, all using SSDs - I
> get 20,000 to 40,000 IOPS, not as much as I expected but OK for now.

Hi!

How are you running your benchmarks, Stefan?

Doing reads from the page cache with RADOS bench, I can do up to about 22,000 IOPS from a single host.
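For reference, the kind of invocation I mean looks roughly like this (a sketch only - the pool name, runtime, and thread count are placeholder values, and the exact flags vary a bit between releases):

    # Write 4k objects into a test pool, then read them back.
    # Depending on your version you may need --no-cleanup so the read
    # pass still has objects left to read.
    rados -p rbd bench 60 write -b 4096 -t 16 --no-cleanup
    rados -p rbd bench 60 seq -t 16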

See the page cache section of our bobtail vs argonaut article by scrolling down a bit from here:

http://ceph.com/uncategorized/argonaut-vs-bobtail-performance-preview/#4kbradoswrite

I haven't tested putting OSDs directly on RAM disks recently, but it'd probably be a good idea to try again at some point.
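If anyone wants to give it a shot, what I have in mind is roughly the following (just a sketch - the device name, size, and mount point are made up, and it assumes the brd ramdisk module with XFS on top):

    # Create a 4GB ram block device and put a filesystem on it.
    modprobe brd rd_nr=1 rd_size=4194304
    mkfs.xfs /dev/ram0
    mount /dev/ram0 /var/lib/ceph/osd/ceph-0
    # Then initialize and start the OSD against that data path as usual.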


> If I replace the SSDs on one host with spinning disks but still use a
> dedicated journal on SSD (20GB per disk/OSD), I'm not able to get more
> than 300 to 400 IOPS, which seems pretty low.

That's probably about right. The journals really only absorb a small portion of the incoming writes for free, and then you end up having to wait on the disks behind the OSDs. If you have 4 spinning disks in the system, each one is only really capable of around 150-200 IOPS assuming typical 7200 RPM units. 300-400 IOPS for 4 disks isn't great, but it's probably not totally unrealistic either.
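Just as a rough sanity check (the 2x overhead figure below is a hand-wavy assumption covering replication and filestore double-writes, not a measured number):

    4 disks * ~150-200 IOPS each       ~= 600-800 raw write IOPS
    replication + filestore overhead   ~= cuts that roughly in half
    ---------------------------------------------------------------
    ~300-400 client-visible IOPS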

So what happens with 4 spinning disks in 1 node but everything else on SSD? All of your outstanding operations end up backing up on the disks in the slow node while everything else sits mostly idle. That's because there's a (configurable) maximum number of outstanding operations you can have in flight at once, and under a sustained workload, no matter how high you set it, eventually *all* of those operations will be queued on the slow node. You can mitigate this by weighting the slow OSDs to hold less data than the others, but that's not really an ideal solution. Ceph will work on heterogeneous clusters, but it really likes well-balanced systems.
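For what it's worth, lowering the weights would look something like this (the OSD numbers and the weight value are made up for illustration; pick something proportional to what the spinners can actually sustain):

    # Reduce how much data (and therefore how many ops) CRUSH sends to
    # the spinner-backed OSDs relative to the SSD-backed ones.
    ceph osd crush reweight osd.20 0.5
    ceph osd crush reweight osd.21 0.5
    # ...and so on for the other OSDs in that host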

As far as SSDs go, some folks seem to be having luck with bcache and flashcache to improve the performance of spinning-disk-backed OSDs. I admit I haven't had time to play with them yet, but it's definitely on my list!
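In case it's useful, the basic bcache setup goes roughly like this (a sketch only - I haven't actually run this under OSDs, and the device names are placeholders):

    # /dev/sdb = spinning disk (backing device), /dev/sdc = SSD (cache device)
    make-bcache -B /dev/sdb
    make-bcache -C /dev/sdc
    # Register both devices, then attach the cache set to the backing device;
    # the cset UUID comes from bcache-super-show /dev/sdc.
    echo /dev/sdb > /sys/fs/bcache/register
    echo /dev/sdc > /sys/fs/bcache/register
    echo <cset-uuid> > /sys/block/bcache0/bcache/attach
    # Put the OSD's filesystem on /dev/bcache0 instead of /dev/sdb.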


> Everything tested using 0.56.4 and Qemu RBD.

Out of curiosity, do you have RBD cache enabled? I noticed on my test setup that with 64G VM images it provides quite a bit of benefit even for small random writes.
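If it isn't enabled yet, something like this on the client side is what I'm thinking of (the sizes are just example values matching the defaults, not tuned recommendations):

    [client]
        rbd cache = true
        rbd cache size = 33554432        # 32MB per image, example value
        rbd cache max dirty = 25165824   # example value

    # QEMU's own disk cache mode matters too, e.g. cache=writeback on the
    # drive: -drive format=rbd,file=rbd:rbd/vm-image,cache=writeback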


> Greets,
> Stefan





