Re: Predict performance

Christian,

Thank you so much for your answer.

You're right: when I say "performance", I actually mean the classic fio test.

Regarding the CPU, you meant 2GHz per OSD, i.e. per CPU core, right?

One last question: with a total of 18 OSDs (2TB per OSD) and a replication factor of 2, is it really that risky? This won't be a critical cluster, but it isn't just a lab/test cluster either.

Thanks again.
J

> Date: Fri, 2 Oct 2015 17:16:21 +0900
> From: chibi@xxxxxxx
> To: ceph-users@xxxxxxxxxxxxxx
> CC: magicboiz@xxxxxxxxxxx
> Subject: Re: Predict performance
>
>
> Hello,
>
> More line breaks and formatting, please.
> A wall of text makes people less likely to read things.
>
> On Fri, 2 Oct 2015 07:08:29 +0000 Javier C.A. wrote:
>
> > Hello
> > Before posting this message, I had been reading older posts on the
> > mailing list, but I didn't find a clear answer.
>
> Define performance.
> Many people seem to be fascinated by the speed of (more or less) sequential
> writes and reads, while their use case would actually be better served by
> improved small-IOPS performance.
>
> > I happen to have
> > three servers available to test Ceph, and I would like to know if there
> > is any kind of "performance prediction formula".
>
> If there is such a thing (that actually works with less than a 10% error
> margin), I'm sure RedHat would like to charge you for it. ^_-
>
> > My OSD servers are:
> > - 1 x Intel E5-2603v3 1.6GHz (6 cores)
> Slightly underpowered, especially when it comes to small write IOPS.
> My personal formula is at least 2GHz per OSD with SSD journal.
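> Back-of-the-envelope for your nodes: 6 OSDs x 2GHz = 12GHz wanted, versus
> 6 cores x 1.6GHz = 9.6GHz in that E5-2603v3.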
>
> > - 32GB DDR4 RAM
> OK, more is better (for read performance, see below).
>
> > - 10Gb Ethernet network, jumbo frames enabled
>
> Slight overkill given the rest of your setup. I guess you saw all the fun
> people keep having with jumbo frames in the ML archives.
>
> > - OS: 2 x 500GB (RAID 1)
> > - OSDs (6 per node): 2TB 7200rpm SATA 6Gbps each
> > - 1 x Intel DC S3700 200GB SSD for journaling of all 6 OSDs
> This means that the most throughput you'll ever be able to write to those
> nodes is the speed of that SSD, 365MB/s; let's make that 350MB/s.
> Thus the slight overkill comment earlier.
> OTOH the HDDs get to keep most of their IOPS (after discounting FS
> journals, overhead, the OSD leveldb, etc.).
> So let's say slightly less than 100 IOPS per OSD.
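> In other words, roughly 6 x 100 = 600 write IOPS of raw HDD capacity per
> node, or about 1800 IOPS raw across your three nodes, before replication.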
>
> > - Replication factor = 2
> See below.
>
> > - XFS
> I find Ext4 faster, but that's me.
>
> > - MON nodes
> > will be running on other servers. With this OSD setup, how could I
> > predict the Ceph cluster performance (IOPS, R/W BW, latency...)?
>
> Of these, latency is the trickiest one, as so many things factor into it
> aside from the network.
> A test case where you're hitting basically just one OSD will look a lot
> worse than an evenly spread-out test (more threads over a sufficiently
> large data set) would.
>
> Userspace (librbd) results can/will vastly differ from kernel RBD clients.
>
> IOPS is a totally worthless data point without clearly defining what you're
> measuring and how.
> Let's assume the "standard" of 4KB blocks and 32 threads, random writes.
> Also let's assume a replication factor of 3, see below.
>
> Sustained sync'ed (direct=1 option in fio) IOPS with your setup will be in
> the 500 to 600 range (given a quiescent cluster).
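> (Roughly: 18 OSDs x ~100 IOPS / 3 replicas = 600, minus various overheads,
> which lands you in that range.)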
> This of course can change dramatically with non-direct writes and caching
> (kernel page cache and/or RBD client caches).
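>
> For reference, a fio run matching that "standard" would look something like
> this (an illustrative sketch only: the mapped kernel RBD device /dev/rbd0
> and the 60s runtime are assumptions, and it writes straight to the device,
> so only point it at a scratch image):
>
> # 4KB random writes, 32 outstanding I/Os, direct I/O
> fio --name=4k-randwrite --filename=/dev/rbd0 --ioengine=libaio \
>     --direct=1 --rw=randwrite --bs=4k --iodepth=32 --numjobs=1 \
>     --runtime=60 --time_based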
>
> The same is true for reads: if your data set fits into the page caches of
> your storage nodes it will be fast; if everything needs to be read from
> the HDDs, you're back to what these devices can do (~100 IOPS per HDD).
>
> To give you a concrete example, on my test cluster I have 5 nodes, 4
> HDDs/OSDs each and no journal SSDs.
> So that's in theory 100 IOPS per HDD, divided by 2 for the on-disk journal,
> divided by 3 for replication:
> 20*100/2/3=333
> Which, amazingly, is what I get with rados bench and 4K blocks; fio from a
> kernel client with direct I/O is around 200.
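>
> A rados bench run of that sort looks something like this (the pool name
> "rbd" and the 60 second duration are placeholders):
>
> # 4KB writes, 32 concurrent ops
> rados bench -p rbd 60 write -b 4096 -t 32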
>
> BW, as in throughput, is easier: about 350MB/s max for sustained sequential
> writes (the limit of the journal SSD) and let's say 750MB/s for sustained
> reads.
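> (Same replica-3 accounting as above: 3 nodes x ~350MB/s of journal SSD,
> divided by 3 copies, lands back at ~350MB/s of client-visible writes.)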
> Again, if you're reading just 8GB in your tests and that fits nicely in
> the page caches of the OSDs, it will be wire speed.
>
> > Should I configure a replication factor of 3?
> >
> If you value your data, which you will on a production server, then yes.
> This will of course cost you 1/3 of your performance compared to replica 2.
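> (With your 18 OSDs and SSD journals: 18 x 100 / 2 = 900 theoretical write
> IOPS at replica 2 vs 18 x 100 / 3 = 600 at replica 3, i.e. two thirds.)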
>
> Regards,
>
> Christian
> --
> Christian Balzer Network/Systems Engineer
> chibi@xxxxxxx Global OnLine Japan/Fusion Communications
> http://www.gol.com/
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
