On Fri, 2 Oct 2015 08:57:44 +0000 Javier C.A. wrote:

> Christian
> thank you so much for your answer.
> You're right, when I say Performance, I actually mean the "classic FIO
> test"..... Regarding the CPU, you meant 2GHz per OSD and per CPU core,
> right?

Yes.
Given a mixed, typical load your CPU will be sufficient, but at 100% small
IOPS it will become a bottleneck.

> One last question: with a total number of 18 OSDs (2TB/OSD) and a
> replica factor of 2, is it really risky? This won't be a critical
> cluster, but neither is it a lab/test cluster, you know.... Thanks
> again. J
>
See the next mail.

> > Date: Fri, 2 Oct 2015 17:16:21 +0900
> > From: chibi@xxxxxxx
> > To: ceph-users@xxxxxxxxxxxxxx
> > CC: magicboiz@xxxxxxxxxxx
> > Subject: Re: Predict performance
> >
> > Hello,
> >
> > More line breaks, formatting.
> > A wall of text makes people less likely to read things.
> >
> > On Fri, 2 Oct 2015 07:08:29 +0000 Javier C.A. wrote:
> >
> > > Hello
> > > Before posting this message, I've been reading older posts in the
> > > mailing list, but I didn't get any clear answer.....
> >
> > Define performance.
> > Many people seem to be fascinated by the speed of sequential (more or
> > less) writes and reads, while their use case would actually be better
> > served by increased small-IOPS performance.
> >
> > > I happen to have three servers available to test Ceph, and I would
> > > like to know if there is any kind of "performance prediction
> > > formula".
> >
> > If there is such a thing (that actually works with less than a 10%
> > error margin), I'm sure RedHat would like to charge you for it. ^_-
> >
> > > My OSD servers are:
> > > - 1 x Intel E5-2603 v3 1.6GHz (6 cores)
> >
> > Slightly underpowered, especially when it comes to small write IOPS.
> > My personal formula is at least 2GHz per OSD with SSD journal.
> >
> > > - 32GB DDR4 RAM
> >
> > OK, more is better (for read performance, see below).
> >
> > > - 10Gb Ethernet network, jumbo frames enabled
> >
> > Slight overkill given the rest of your setup; I guess you saw all the
> > fun people keep having with jumbo frames in the ML archives.
> >
> > > - OS: 2 x 500GB RAID 1
> > > - OSD (6 OSDs): 2TB 7200rpm SATA 6Gbps
> > > - 1 x Intel DC S3700 200GB SSD for journaling of all 6 OSDs.
> >
> > This means that the most throughput you'll ever be able to write to
> > those nodes is the speed of that SSD, 365MB/s; let's make that 350MB/s.
> > Thus the slight overkill comment earlier.
> > OTOH the HDDs get to use most of the IOPS (after discounting FS
> > journals, overhead, the OSD leveldb, etc).
> > So let's say slightly less than 100 IOPS per OSD.
> >
> > > Replication factor = 2.
> >
> > See below.
> >
> > > - XFS
> >
> > I find Ext4 faster, but that's me.
> >
> > > - MON nodes will be running on other servers.
> > > With this OSD setup, how could I predict the Ceph cluster
> > > performance (IOPS, R/W BW, latency...)?
> >
> > Of these, latency is the trickiest one, as so many things factor into
> > it aside from the network.
> > A test case where you're hitting basically just one OSD will look a lot
> > worse than an evenly spread out (more threads over a sufficiently
> > large data set) test would.
> >
> > Userspace (librbd) results can/will vastly differ from kernel RBD
> > clients.
> >
> > IOPS is a totally worthless data point w/o clearly defining what you're
> > measuring and how.
> > Let's assume the "standard" of 4KB blocks and 32 threads, random writes.
> > Also let's assume a replication factor of 3, see below.
> >
> > Sustained sync'ed (direct=1 option in fio) IOPS with your setup will
> > be in the 500 to 600 range (given a quiescent cluster).
> > This of course can change dramatically with non-direct writes and
> > caching (kernel page cache and/or RBD client caches).
> >
> > The same is true for reads: if your data set fits into the page caches
> > of your storage nodes, it will be fast; if everything needs to be read
> > from the HDDs, you're back to what these devices can do (~100 IOPS per
> > HDD).
> >
> > To give you a concrete example, on my test cluster I have 5 nodes, 4
> > HDDs/OSDs each, and no journal SSDs.
> > So that's in theory 100 IOPS per HDD, divided by 2 for the on-disk
> > journal, divided by 3 for replication:
> > 20*100/2/3 = 333
> > Which, amazingly, is what I get with rados bench and 4K blocks; fio
> > from a kernel client with direct I/O is around 200.
> >
> > BW, as in throughput, is easier: about 350MB/s max for sustained
> > sequential writes (the limit of the journal SSD) and let's say 750MB/s
> > for sustained reads.
> > Again, if you're reading just 8GB in your tests and that fits nicely in
> > the page caches of the OSDs, it will be wire speed.
> >
> > > Should I configure a replica factor of 3?
> >
> > If you value your data, which you will on a production server, then
> > yes. This will of course cost you 1/3 of your performance compared to
> > replica 2.
> >
> > Regards,
> >
> > Christian
> > --
> > Christian Balzer        Network/Systems Engineer
> > chibi@xxxxxxx           Global OnLine Japan/Fusion Communications
> > http://www.gol.com/
>

--
Christian Balzer        Network/Systems Engineer
chibi@xxxxxxx           Global OnLine Japan/Fusion Communications
http://www.gol.com/
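
The back-of-envelope arithmetic in the reply is easy to rerun for other
layouts. Below is a minimal Python sketch (not part of the original thread)
that reproduces the same estimate; the function names and structure are
purely illustrative, while the constants (~100 IOPS per 7200rpm HDD,
~350MB/s per journal SSD, the 2x on-disk journal penalty when there is no
SSD journal) are taken from Christian's numbers above.

# Back-of-the-envelope Ceph (filestore-era) estimates, following the
# arithmetic in the reply, e.g. 20 * 100 / 2 / 3 = 333 for Christian's
# 5-node test cluster. All constants are rough rules of thumb from the
# thread, not measurements; function names are made up for illustration.

def sustained_write_iops(num_osds, hdd_iops=100, replication=3,
                         ssd_journal=True):
    """Rough sustained small-random-write IOPS (4KB-style fio workload).

    Without an SSD journal every write hits the HDD twice (journal + data),
    halving per-OSD IOPS; replication then divides the cluster total again.
    """
    journal_penalty = 1 if ssd_journal else 2
    return num_osds * hdd_iops / journal_penalty / replication

def sustained_write_bw(num_nodes, journal_ssd_mbps=350, replication=3):
    """Rough sustained sequential write throughput in MB/s.

    With one journal SSD per node, each node can absorb at most that SSD's
    sequential write speed; replication multiplies the data written.
    """
    return num_nodes * journal_ssd_mbps / replication

if __name__ == "__main__":
    # Christian's test cluster: 5 nodes x 4 OSDs, no journal SSD, replica 3.
    print(sustained_write_iops(20, ssd_journal=False, replication=3))  # ~333

    # The proposed cluster: 3 nodes x 6 OSDs, one 200GB DC S3700 journal
    # SSD per node, compared at replica 2 and replica 3.
    for rep in (2, 3):
        iops = sustained_write_iops(18, ssd_journal=True, replication=rep)
        bw = sustained_write_bw(3, journal_ssd_mbps=350, replication=rep)
        print("replica %d: ~%.0f write IOPS, ~%.0f MB/s" % (rep, iops, bw))

For the proposed 3-node, 18-OSD setup this gives roughly 900 sustained
write IOPS at replica 2 and 600 at replica 3, and about 350MB/s of client
write throughput at replica 3, which lines up with the 500-600 IOPS and
350MB/s figures quoted in the reply.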