Hello,

On Mon, 8 Jun 2015 18:01:28 +1200 Cameron.Scrace@xxxxxxxxxxxx wrote:

> Just used the method in the link you sent me to test one of the EVO
> 850s, with one job it reached a speed of around 2.5MB/s but it didn't
> max out until around 32 jobs at 24MB/s:
>
I'm not the author of that page, nor did I verify that they used a
uniform methodology/environment, nor do I think that this test is a
particularly close approximation of what I see Ceph doing in reality
(it seems to write much larger chunks than 4KB).

I'd suggest keeping numjobs at 1 and ramping up the block size to 4MB,
then see where you max out with that (an example command line is further
down in this mail).
I can reach the theoretical max speed of my SSDs (350MB/s) at 4MB blocks,
and it's already at 90% with 1MB.

That test does however produce interesting numbers that seem to be
consistent by themselves and match what people have reported here.

I can get 22MB/s with just one fio job at the above settings (alas on a
filesystem, no spare raw partition right now) on a DC S3700 200GB SSD,
directly connected to an onboard Intel SATA-3 port.

Now think about what that means: a fio numjob is equivalent to an OSD
daemon, so in this worst-case 4KB scenario my journal, and thus my OSD,
would be 10 times faster than yours.
Food for thought.

> sudo fio --filename=/dev/sdh --direct=1 --sync=1 --rw=write --bs=4k
> --numjobs=32 --iodepth=1 --runtime=60 --time_based --group_reporting
> --name=journal-test
> write: io=1507.4MB, bw=25723KB/s, iops=6430, runt= 60007msec
>
> Also tested a Micron 550 we had sitting around and it maxed out at
> 2.5MB/s, both results conflict with the chart.
>
Note that they disabled both the on-SSD and the controller caches; the
former of course messes things up where this isn't needed.

I'd suggest you go and do a test install of Ceph with your HW and test
that, paying close attention to your SSD utilization with atop or
iostat, etc.

Christian

> Regards,
>
> Cameron Scrace
> Infrastructure Engineer
>
> Mobile +64 22 610 4629
> Phone +64 4 462 5085
> Email cameron.scrace@xxxxxxxxxxxx
> Solnet Solutions Limited
> Level 12, Solnet House
> 70 The Terrace, Wellington 6011
> PO Box 397, Wellington 6140
>
> www.solnet.co.nz
>
>
>
> From: Christian Balzer <chibi@xxxxxxx>
> To: "ceph-users@xxxxxxxx" <ceph-users@xxxxxxxx>
> Cc: Cameron.Scrace@xxxxxxxxxxxx
> Date: 08/06/2015 02:40 p.m.
> Subject: Re: Multiple journals and an OSD on one SSD doable?
>
>
>
> On Mon, 8 Jun 2015 14:30:17 +1200 Cameron.Scrace@xxxxxxxxxxxx wrote:
>
> > Thanks for all the feedback.
> >
> > What makes the EVOs unusable? They should have plenty of speed but
> > your link has them at 1.9MB/s, is it just the way they handle O_DIRECT
> > and D_SYNC?
> >
> Precisely.
> Read that ML thread for details.
>
> And once more, they also are not very endurable.
> So depending on your usage pattern and the write amplification of Ceph
> (Ceph itself and the underlying FS), their TBW/$ will be horrible,
> costing you more in the end than more expensive, but an order of
> magnitude more endurable, DC SSDs.
>
> > Not sure if we will be able to spend any more, we may just have to
> > take the performance hit until we can get more money for the project.
> >
> You could cheap out with 200GB DC S3700s (half the price), but they will
> definitely become the bottleneck at a combined max speed of about
> 700MB/s, as opposed to the 400GB ones at 900MB/s combined.
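
(For the record, the 4MB test I suggested at the top of this mail would
look something like the line below. Untested as typed, so treat it as a
sketch, double-check the device path, and be aware that it will happily
overwrite whatever is on that device:

  fio --filename=/dev/sdX --direct=1 --sync=1 --rw=write --bs=4M \
      --numjobs=1 --iodepth=1 --runtime=60 --time_based --name=journal-bw-test

Likewise, "iostat -x 2" or atop with a short interval is plenty for
watching the SSD utilization during a test install.)
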
>
> Christian
>
> > Thanks,
> >
> > Cameron Scrace
> > Infrastructure Engineer
> >
> > Mobile +64 22 610 4629
> > Phone +64 4 462 5085
> > Email cameron.scrace@xxxxxxxxxxxx
> > Solnet Solutions Limited
> > Level 12, Solnet House
> > 70 The Terrace, Wellington 6011
> > PO Box 397, Wellington 6140
> >
> > www.solnet.co.nz
> >
> >
> >
> > From: Christian Balzer <chibi@xxxxxxx>
> > To: "ceph-users@xxxxxxxx" <ceph-users@xxxxxxxx>
> > Cc: Cameron.Scrace@xxxxxxxxxxxx
> > Date: 08/06/2015 02:00 p.m.
> > Subject: Re: Multiple journals and an OSD on one SSD doable?
> >
> >
> >
> > Cameron,
> >
> > To offer at least some constructive advice here instead of just all
> > doom and gloom, here's what I'd do:
> >
> > Replace the OS SSDs with 2 400GB Intel DC S3700s (or S3710s).
> > They have enough BW to nearly saturate your network.
> >
> > Put all your journals on them (3 SSD OSD and 3 HDD OSD journals per
> > drive).
> > While that's a bad move from a failure domain perspective, your budget
> > probably won't allow for anything better and those are VERY reliable
> > and, just as importantly, durable SSDs.
> >
> > This will give you the speed your current setup is capable of, probably
> > limited by the CPU when it comes to SSD pool operations.
> >
> > Christian
> >
> > On Mon, 8 Jun 2015 10:44:06 +0900 Christian Balzer wrote:
> >
> > >
> > > Hello Cameron,
> > >
> > > On Mon, 8 Jun 2015 13:13:33 +1200 Cameron.Scrace@xxxxxxxxxxxx wrote:
> > >
> > > > Hi Christian,
> > > >
> > > > Yes, we have purchased all our hardware; it was very hard to
> > > > convince management/finance to approve it, so some of the stuff we
> > > > have is a bit cheap.
> > > >
> > > Unfortunate. Both the done deal and the cheapness.
> > >
> > > > We have four storage nodes, each with 6 x 6TB Western Digital Red
> > > > SATA drives (WD60EFRX-68M), 6 x 1TB Samsung EVO 850 SSDs and
> > > > 2 x 250GB Samsung EVO 850s (for OS RAID).
> > > > CPUs are Intel Atom C2750 @ 2.40GHz (8 cores) with 32 GB of RAM.
> > > > We have a 10Gig network.
> > > >
> > > I wish there was a nice way to say this, but it unfortunately boils
> > > down to a "You're fooked".
> > >
> > > There have been many discussions about which SSDs are usable with
> > > Ceph, very recently as well.
> > > Samsung EVOs (the non-DC type for sure) are basically unusable for
> > > journals. See the recent thread "Possible improvements for a slow
> > > write speed (excluding independent SSD journals)" and:
> > >
> > > http://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/
> > >
> > > for reference.
> > >
> > > I presume your intention for the 1TB SSDs is an SSD backed pool?
> > > Note that the EVOs have a pretty low (guaranteed) endurance, so aside
> > > from needing journal SSDs that actually can do the job, you're
> > > looking at wearing them out rather quickly (depending on your use
> > > case of course).
> > >
> > > Now with SSD based OSDs, or even HDD based OSDs with SSD journals,
> > > that CPU looks a bit anemic.
> > >
> > > More below:
> > > > The two options we are considering are:
> > > >
> > > > 1) Use two of the 1TB SSDs for the spinning disk journals (3 each)
> > > > and then use the remaining 900+GB of each drive as an OSD to be
> > > > part of the cache pool.
> > > >
> > > > 2) Put the spinning disk journals on the OS SSDs and use the 2 1TB
> > > > SSDs for the cache pool.
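
(If you do end up carving the 1TB drives up as in option 1 above, the
partitioning itself is the easy part; a rough sketch, with sizes and
device names purely illustrative, and the typecode being what ceph-disk
expects for a journal partition if I remember it correctly:

  sgdisk --new=1:0:+10G --change-name=1:"ceph journal" \
         --typecode=1:45b0969e-9b03-4f30-b4c6-b4b80ceff106 /dev/sdX

repeated per journal partition, with the remainder going to the OSD
partition. None of which changes anything about the suitability of the
EVOs for the job, of course.)
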
> > > >
> > > Cache pools aren't all that speedy currently (research the ML
> > > archives), even less so with the SSDs you have.
> > >
> > > Christian
> > >
> > > > In both cases the other 4 1TB SSDs will be part of their own tier.
> > > >
> > > > Thanks a lot!
> > > >
> > > > Cameron Scrace
> > > > Infrastructure Engineer
> > > >
> > > > Mobile +64 22 610 4629
> > > > Phone +64 4 462 5085
> > > > Email cameron.scrace@xxxxxxxxxxxx
> > > > Solnet Solutions Limited
> > > > Level 12, Solnet House
> > > > 70 The Terrace, Wellington 6011
> > > > PO Box 397, Wellington 6140
> > > >
> > > > www.solnet.co.nz
> > > >
> > > >
> > > >
> > > > From: Christian Balzer <chibi@xxxxxxx>
> > > > To: "ceph-users@xxxxxxxx" <ceph-users@xxxxxxxx>
> > > > Cc: Cameron.Scrace@xxxxxxxxxxxx
> > > > Date: 08/06/2015 12:18 p.m.
> > > > Subject: Re: Multiple journals and an OSD on one SSD doable?
> > > >
> > > >
> > > >
> > > > Hello,
> > > >
> > > > On Mon, 8 Jun 2015 09:55:56 +1200 Cameron.Scrace@xxxxxxxxxxxx
> > > > wrote:
> > > >
> > > > > The other option we were considering was putting the journals on
> > > > > the OS SSDs, they are only 250GB and the rest would be for the
> > > > > OS. Is that a decent option?
> > > > >
> > > > You'll be getting a LOT better advice if you tell us more details.
> > > >
> > > > For starters, have you bought the hardware yet?
> > > > Tell us about your design: how many initial storage nodes, how many
> > > > HDDs/SSDs per node, what CPUs/RAM/network?
> > > >
> > > > What SSDs are we talking about, exact models please.
> > > > (Both the sizes you mentioned do not ring a bell for DC level SSDs
> > > > I'm aware of.)
> > > >
> > > > That said, I'm using Intel DC S3700s for mixed OS and journal use
> > > > with good results.
> > > > In your average Ceph storage node, normal OS (mostly logging)
> > > > activity is a minute drop in the bucket for any decent SSD, so
> > > > nearly all of its resources are available to journals.
> > > >
> > > > You want to match the number of journals per SSD to the
> > > > capabilities of your SSDs, HDDs and network.
> > > >
> > > > For example, 8 HDD OSDs with 2 200GB DC S3700s and a 10Gb/s network
> > > > is a decent match.
> > > > The two SSDs at 900MB/s would appear to be the bottleneck, but in
> > > > reality I'd expect the HDDs to be it.
> > > > Never mind that you'd be more likely to be IOPS than bandwidth
> > > > bound.
> > > >
> > > > Regards,
> > > >
> > > > Christian
> > > >
> > > > > Thanks!
> > > > >
> > > > > Cameron Scrace
> > > > > Infrastructure Engineer
> > > > >
> > > > > Mobile +64 22 610 4629
> > > > > Phone +64 4 462 5085
> > > > > Email cameron.scrace@xxxxxxxxxxxx
> > > > > Solnet Solutions Limited
> > > > > Level 12, Solnet House
> > > > > 70 The Terrace, Wellington 6011
> > > > > PO Box 397, Wellington 6140
> > > > >
> > > > > www.solnet.co.nz
> > > > >
> > > > >
> > > > >
> > > > > From: Somnath Roy <Somnath.Roy@xxxxxxxxxxx>
> > > > > To: "Cameron.Scrace@xxxxxxxxxxxx" <Cameron.Scrace@xxxxxxxxxxxx>,
> > > > > "ceph-users@xxxxxxxx" <ceph-users@xxxxxxxx>
> > > > > Date: 08/06/2015 09:34 a.m.
> > > > > Subject: RE: Multiple journals and an OSD on one SSD doable?
> > > > >
> > > > >
> > > > >
> > > > > Cameron,
> > > > > Generally, it's not a good idea.
> > > > > You want to protect your SSDs used as journals. If any problem
> > > > > occurs on that disk, you will be losing all of your dependent
> > > > > OSDs.
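
(A quick way to keep track of exactly that dependency is to look at the
journal symlinks on a node, e.g. "ls -l /var/lib/ceph/osd/ceph-*/journal",
or at the output of "ceph-disk list", which shows which journal partition
each OSD is using. Both as I remember them on Hammer, so verify on your
version.)
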
> > > > > I don't think a bigger journal will gain you much performance,
> > > > > so the default 5 GB journal size should be good enough. If you
> > > > > want to reduce the fault domain and want to put 3 journals on an
> > > > > SSD, go for minimum size and high endurance SSDs for that.
> > > > > Now, if you want to use the rest of the space on the 1 TB SSDs,
> > > > > creating just OSDs will not gain you much (rather, you may get
> > > > > some burst performance). You may want to consider the following.
> > > > >
> > > > > 1. If your spindle OSD size is much bigger than 900 GB and you
> > > > > don't want to make all OSDs of similar sizes, a cache pool could
> > > > > be one of your options. But remember, a cache pool can wear out
> > > > > your SSDs faster, as presently I guess it is not optimizing the
> > > > > extra writes. Sorry, I don't have exact data as I am yet to test
> > > > > that out.
> > > > >
> > > > > 2. If you want to make all the OSDs of similar sizes and you will
> > > > > be able to create a substantial number of OSDs with your unused
> > > > > SSDs (depends on how big the cluster is), you may want to put all
> > > > > of your primary OSDs on SSD and gain a significant performance
> > > > > boost for reads. Also, in this case, I don't think you will be
> > > > > getting any burst performance.
> > > > >
> > > > > Thanks & Regards
> > > > > Somnath
> > > > >
> > > > > From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On
> > > > > Behalf Of Cameron.Scrace@xxxxxxxxxxxx
> > > > > Sent: Sunday, June 07, 2015 1:49 PM
> > > > > To: ceph-users@xxxxxxxx
> > > > > Subject: Multiple journals and an OSD on one SSD doable?
> > > > >
> > > > > Setting up a Ceph cluster and we want the journals for our
> > > > > spinning disks to be on SSDs, but all of our SSDs are 1TB. We
> > > > > were planning on putting 3 journals on each SSD, but that leaves
> > > > > 900+GB unused on the drive; is it possible to use the leftover
> > > > > space as another OSD or will it affect performance too much?
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Cameron Scrace
> > > > > Infrastructure Engineer
> > > > >
> > > > > Mobile +64 22 610 4629
> > > > > Phone +64 4 462 5085
> > > > > Email cameron.scrace@xxxxxxxxxxxx
> > > > > Solnet Solutions Limited
> > > > > Level 12, Solnet House
> > > > > 70 The Terrace, Wellington 6011
> > > > > PO Box 397, Wellington 6140
> > > > >
> > > > > www.solnet.co.nz
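
(Two footnotes on Somnath's points above, since these questions come up
regularly: the 5 GB default he mentions corresponds to "osd journal size
= 5120" in ceph.conf, and the "primaries on SSD" idea is usually done
either with a custom CRUSH rule or by lowering the primary affinity of
the spinner OSDs, roughly along the lines of

  ceph tell mon.\* injectargs '--mon_osd_allow_primary_affinity=true'
  ceph osd primary-affinity osd.<id-of-an-hdd-osd> 0

Treat both as sketches from memory rather than a recipe, and check them
against the documentation for your release.)
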

--
Christian Balzer        Network/Systems Engineer
chibi@xxxxxxx           Global OnLine Japan/Fusion Communications
http://www.gol.com/
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com