Re: Multiple journals and an OSD on one SSD doable?

Hello,

On Mon, 8 Jun 2015 18:01:28 +1200 Cameron.Scrace@xxxxxxxxxxxx wrote:

> Just used the method in the link you sent me to test one of the EVO
> 850s, with one job it reached a speed of around 2.5MB/s but it didn't
> max out until around 32 jobs at 24MB/s: 
> 
I'm not the author of that page, nor did I verify that they used a
uniform methodology/environment, nor do I think that this test is a
particularly close approximation of what I see Ceph doing in reality (it
seems to write much larger chunks than 4KB).
I'd suggest keeping numjobs at 1 and ramping up the block size to 4MB,
and see where you max out with that. 
I can reach the theoretical max speed of my SSDs (350MB/s) at 4MB blocks,
but it's already at 90% with 1MB.
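
That would be essentially the same invocation as your test below, just
with one job and 4MB blocks; something like this (I'm assuming /dev/sdh
is still the EVO you want to test, double-check before writing to it raw):

sudo fio --filename=/dev/sdh --direct=1 --sync=1 --rw=write --bs=4M \
  --numjobs=1 --iodepth=1 --runtime=60 --time_based --group_reporting \
  --name=journal-test-4m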

That test does, however, produce interesting numbers that seem to be
consistent with each other and match what people have reported here.

I can get 22MB/s with just one fio job using the above settings (alas on
a filesystem, no spare raw partition right now) on a DC S3700 200GB SSD,
directly connected to an onboard Intel SATA-3 port.

Now think about what that means: a fio job is roughly equivalent to an
OSD daemon writing its journal, so in this worst-case 4KB scenario my
journal, and thus my OSD, would be about 10 times faster than yours.
Food for thought.
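
(In IOPS terms: 22MB/s of 4KB sync writes is roughly 22*1024/4, about
5600 IOPS from a single writer, against roughly 640 IOPS for the 2.5MB/s
the EVO managed -- call it 9-10x.)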

> sudo fio --filename=/dev/sdh --direct=1 --sync=1 --rw=write --bs=4k 
> --numjobs=32 --iodepth=1 --runtime=60 --time_based --group_reporting 
> --name=journal-test
> write: io=1507.4MB, bw=25723KB/s, iops=6430, runt= 60007msec
> 
> Also tested a Micron 550 we had sitting around and it maxed out at 
> 2.5MB/s; both results conflict with the chart.
> 
Note that they disabled both the on-SSD and the controller caches;
disabling the on-SSD cache of course skews the results on drives where
this isn't needed.
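
If you want to see what the drive's volatile write cache does for your
EVOs, hdparm can toggle it (device name assumed, and -W0 only lasts
until the next power cycle):

sudo hdparm -W /dev/sdh     # show the current write-cache setting
sudo hdparm -W0 /dev/sdh    # disable the volatile write cache
sudo hdparm -W1 /dev/sdh    # re-enable it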

I'd suggest you do a test install of Ceph on your HW and benchmark that,
paying close attention to your SSD utilization with atop or iostat, etc.
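
For instance, something like

iostat -x 2

in one terminal while you run your fio test or a "rados bench" against a
pool will show per-device %util, queue size and await, which tells you
quickly whether the SSDs or the HDDs are the ones saturating.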

Christian

> Regards,
> 
> Cameron Scrace
> Infrastructure Engineer
> 
> Mobile +64 22 610 4629
> Phone  +64 4 462 5085 
> Email  cameron.scrace@xxxxxxxxxxxx
> Solnet Solutions Limited
> Level 12, Solnet House
> 70 The Terrace, Wellington 6011
> PO Box 397, Wellington 6140
> 
> www.solnet.co.nz
> 
> 
> 
> From:   Christian Balzer <chibi@xxxxxxx>
> To:     "ceph-users@xxxxxxxx" <ceph-users@xxxxxxxx>
> Cc:     Cameron.Scrace@xxxxxxxxxxxx
> Date:   08/06/2015 02:40 p.m.
> Subject:        Re:  Multiple journals and an OSD on one SSD 
> doable?
> 
> 
> 
> On Mon, 8 Jun 2015 14:30:17 +1200 Cameron.Scrace@xxxxxxxxxxxx wrote:
> 
> > Thanks for all the feedback. 
> > 
> > What makes the EVOs unusable? They should have plenty of speed but
> > your link has them at 1.9MB/s, is it just the way they handle O_DIRECT
> > and D_SYNC? 
> > 
> Precisely. 
> Read that ML thread for details.
> 
> And once more, they also do not have much endurance.
> So depending on your usage pattern and the write amplification from
> Ceph (Ceph itself and the underlying FS), their TBW/$ will be horrible,
> costing you more in the end than more expensive DC SSDs with an order
> of magnitude more endurance. 
> 
> > Not sure if we will be able to spend anymore, we may just have to take
> > the performance hit until we can get more money for the project.
> >
> You could cheap out with 200GB DC S3700s (half the price), but they will
> definitely become the bottleneck at a combined max speed of about
> 700MB/s, as opposed to the 400GB ones at 900MB/s combined.
>  
> Christian
> 
> > Thanks,
> > 
> > Cameron Scrace
> > Infrastructure Engineer
> > 
> > Mobile +64 22 610 4629
> > Phone  +64 4 462 5085 
> > Email  cameron.scrace@xxxxxxxxxxxx
> > Solnet Solutions Limited
> > Level 12, Solnet House
> > 70 The Terrace, Wellington 6011
> > PO Box 397, Wellington 6140
> > 
> > www.solnet.co.nz
> > 
> > 
> > 
> > From:   Christian Balzer <chibi@xxxxxxx>
> > To:     "ceph-users@xxxxxxxx" <ceph-users@xxxxxxxx>
> > Cc:     Cameron.Scrace@xxxxxxxxxxxx
> > Date:   08/06/2015 02:00 p.m.
> > Subject:        Re:  Multiple journals and an OSD on one SSD
> > doable?
> > 
> > 
> > 
> > 
> > Cameron,
> > 
> > To offer at least some constructive advice here instead of just all
> > doom and gloom, here's what I'd do:
> > 
> > Replace the OS SSDs with 2 400GB Intel DC S3700s (or S3710s).
> > They have enough BW to nearly saturate your network.
> > 
> > Put all your journals on them (3 SSD OSDs and 3 HDD OSDs per SSD). 
> > While that's a bad move from a failure domain perspective, your budget
> > probably won't allow for anything better, and those are VERY reliable
> > and, just as importantly, durable SSDs. 
> > 
> > This will give you the speed your current setup is capable of, probably
> > limited by the CPU when it comes to SSD pool operations.
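
For reference, with ceph-disk the journal placement is simply the second
device argument; a rough, untested sketch with purely illustrative device
names (sdc = HDD for the OSD, sda = DC S3700 holding the journals):

sudo ceph-disk prepare /dev/sdc /dev/sda
# ^ data on the HDD (sdc), journal partition carved out of the SSD (sda)

If I remember right, "osd journal size" in ceph.conf decides how big each
journal partition on the SSD ends up.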
> > 
> > Christian
> > 
> > On Mon, 8 Jun 2015 10:44:06 +0900 Christian Balzer wrote:
> > 
> > > 
> > > Hello Cameron,
> > > 
> > > On Mon, 8 Jun 2015 13:13:33 +1200 Cameron.Scrace@xxxxxxxxxxxx wrote:
> > > 
> > > > Hi Christian,
> > > > 
> > > > Yes we have purchased all our hardware, was very hard to convince 
> > > > management/finance to approve it, so some of the stuff we have is a
> > > > bit cheap.
> > > > 
> > > Unfortunate. Both the done deal and the cheapness. 
> > > 
> > > > We have four storage nodes each with 6 x 6TB Western Digital Red
> > > > SATA Drives (WD60EFRX-68M) and 6 x 1TB Samsung EVO 850s SSDs and
> > > > 2x250GB Samsung EVO 850s (for OS raid).
> > > > CPUs are Intel Atom C2750  @ 2.40GHz (8 Cores) with 32 GB of RAM. 
> > > > We have a 10Gig Network.
> > > >
> > > I wish there was a nice way to say this, but it unfortunately boils
> > > down to a "You're fooked".
> > > 
> > > There have been many discussions about which SSDs are usable with
> > > Ceph, very recently as well.
> > > Samsung EVOs (the non DC type for sure) are basically unusable for
> > > journals. See the recent thread:
> > >  Possible improvements for a slow write speed (excluding independent
> > > SSD journals) and:
> > > 
> > > http://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/
> > > 
> > > for reference.
> > > 
> > > I presume your intention for the 1TB SSDs is an SSD-backed pool? 
> > > Note that the EVOs have a pretty low (guaranteed) endurance, so aside
> > > from needing journal SSDs that actually can do the job, you're looking
> > > at wearing them out rather quickly (depending on your use case of
> > > course).
> > > 
> > > Now with SSD based OSDs or even HDD based OSDs with SSD journals
> > > that CPU looks a bit anemic.
> > > 
> > > More below:
> > > > The two options we are considering are:
> > > > 
> > > > 1) Use two of the 1TB SSDs for the spinning disk journals (3 each)
> > > > and then use the remaining 900+GB of each drive as an OSD to be
> > > > part of the cache pool.
> > > > 
> > > > 2) Put the spinning disk journals on the OS SSDs and use the 2 1TB
> > > > SSDs for the cache pool.
> > > > 
> > > Cache pools aren't all that speedy currently (research the ML
> > > archives), even less so with the SSDs you have.
> > > 
> > > Christian
> > > 
> > > > In both cases the other 4 1TB SSDs will be part of their own tier.
> > > > 
> > > > Thanks a lot!
> > > > 
> > > > Cameron Scrace
> > > > Infrastructure Engineer
> > > > 
> > > > Mobile +64 22 610 4629
> > > > Phone  +64 4 462 5085 
> > > > Email  cameron.scrace@xxxxxxxxxxxx
> > > > Solnet Solutions Limited
> > > > Level 12, Solnet House
> > > > 70 The Terrace, Wellington 6011
> > > > PO Box 397, Wellington 6140
> > > > 
> > > > www.solnet.co.nz
> > > > 
> > > > 
> > > > 
> > > > From:   Christian Balzer <chibi@xxxxxxx>
> > > > To:     "ceph-users@xxxxxxxx" <ceph-users@xxxxxxxx>
> > > > Cc:     Cameron.Scrace@xxxxxxxxxxxx
> > > > Date:   08/06/2015 12:18 p.m.
> > > > Subject:        Re:  Multiple journals and an OSD on
> > > > one SSD doable?
> > > > 
> > > > 
> > > > 
> > > > 
> > > > Hello,
> > > > 
> > > > 
> > > > On Mon, 8 Jun 2015 09:55:56 +1200 Cameron.Scrace@xxxxxxxxxxxx
> > > > wrote:
> > > > 
> > > > > The other option we were considering was putting the journals on
> > > > > the OS SSDs, they are only 250GB and the rest would be for the
> > > > > OS. Is that a decent option?
> > > > >
> > > > You'll be getting a LOT better advice if you're telling us more
> > > > details.
> > > > 
> > > > For starters, have you bought the hardware yet?
> > > > Tell us about your design, how many initial storage nodes, how many
> > > > HDDs/SSDs per node, what CPUs/RAM/network?
> > > > 
> > > > What SSDs are we talking about, exact models please.
> > > > (Neither of the sizes you mentioned rings a bell for any DC-level
> > > > SSDs I'm aware of.)
> > > > 
> > > > That said, I'm using Intel DC S3700s for mixed OS and journal use
> > > > with good results. 
> > > > In your average Ceph storage node, normal OS (logging mostly)
> > > > activity is a minute drop in the bucket for any decent SSD, so
> > > > nearly all of its resources are available to journals.
> > > > 
> > > > You want to match the number of journals per SSD according to the
> > > > capabilities of your SSD, HDDs and network.
> > > > 
> > > > For example, 8 HDD OSDs with two 200GB DC S3700s and a 10Gb/s
> > > > network is a decent match. 
> > > > The two SSDs at 900MB/s would appear to be the bottleneck, but in
> > > > reality I'd expect the HDDs to be it.
> > > > Never mind that you'd be more likely to be IOPS than bandwidth
> > > > bound.
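
A quick way to sanity-check a planned journals-per-SSD ratio is to rerun
the 4KB sync test with numjobs set to the number of journals that SSD
would host, e.g. for 3 journals (device name assumed):

sudo fio --filename=/dev/sdX --direct=1 --sync=1 --rw=write --bs=4k \
  --numjobs=3 --iodepth=1 --runtime=60 --time_based --group_reporting \
  --name=journal-ratio-test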
> > > > 
> > > > Regards,
> > > > 
> > > > Christian
> > > > 
> > > > > Thanks!
> > > > > 
> > > > > Cameron Scrace
> > > > > Infrastructure Engineer
> > > > > 
> > > > > Mobile +64 22 610 4629
> > > > > Phone  +64 4 462 5085 
> > > > > Email  cameron.scrace@xxxxxxxxxxxx
> > > > > Solnet Solutions Limited
> > > > > Level 12, Solnet House
> > > > > 70 The Terrace, Wellington 6011
> > > > > PO Box 397, Wellington 6140
> > > > > 
> > > > > www.solnet.co.nz
> > > > > 
> > > > > 
> > > > > 
> > > > > From:   Somnath Roy <Somnath.Roy@xxxxxxxxxxx>
> > > > > To:     "Cameron.Scrace@xxxxxxxxxxxx" <Cameron.Scrace@xxxxxxxxxxxx>,
> > > > > "ceph-users@xxxxxxxx" <ceph-users@xxxxxxxx>
> > > > > Date:   08/06/2015 09:34 a.m.
> > > > > Subject:        RE:  Multiple journals and an OSD on one SSD
> > > > > doable?
> > > > > 
> > > > > 
> > > > > 
> > > > > Cameron,
> > > > > Generally, it’s not a good idea. 
> > > > > You want to protect your SSDs used as journals. If there is any
> > > > > problem on that disk, you will be losing all of the OSDs that
> > > > > depend on it.
> > > > > I don't think a bigger journal will gain you much performance, so
> > > > > the default 5 GB journal size should be good enough. If you want
> > > > > to reduce the fault domain and want to put 3 journals on an SSD,
> > > > > go for minimum size and high endurance SSDs for that.
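
For reference, that is the "osd journal size" option, specified in MB;
a minimal ceph.conf sketch would be:

[osd]
osd journal size = 5120    # 5 GB per journal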
> > > > > Now, if you want to use the rest of the space on the 1 TB SSDs,
> > > > > creating just OSDs will not gain you much (rather, you may get
> > > > > some burst performance). You may want to consider the following.
> > > > > 
> > > > > 1. If your spindle OSD size is much bigger than 900 GB and you
> > > > > don't want to make all OSDs of similar sizes, a cache pool could
> > > > > be one of your options. But remember, a cache pool can wear out
> > > > > your SSDs faster, as presently, I guess, it is not optimizing the
> > > > > extra writes. Sorry, I don't have exact data as I am yet to test
> > > > > that out.
> > > > > 
> > > > > 2. If you want to make all the OSDs of similar sizes and you will
> > > > > be able to create a substantial number of OSDs with your unused
> > > > > SSDs (it depends on how big the cluster is), you may want to put
> > > > > all of your primary OSDs on SSD and gain a significant read
> > > > > performance boost. Also, in this case, I don't think you will be
> > > > > getting any burst performance. 
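
The usual knob for that is primary affinity; a rough sketch (the OSD ids
are made up, and as far as I recall this also needs "mon osd allow
primary affinity = true" in releases of that era):

ceph osd primary-affinity osd.12 1.0   # SSD-backed OSD, prefer as primary
ceph osd primary-affinity osd.3  0     # HDD-backed OSD, avoid as primary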
> > > > > Thanks & Regards
> > > > > Somnath
> > > > > 
> > > > > From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On
> > > > > Behalf Of Cameron.Scrace@xxxxxxxxxxxx
> > > > > Sent: Sunday, June 07, 2015 1:49 PM
> > > > > To: ceph-users@xxxxxxxx
> > > > > Subject:  Multiple journals and an OSD on one SSD doable?
> > > > > 
> > > > > Setting up a Ceph cluster and we want the journals for our spinning
> > > > > disks to be on SSDs but all of our SSDs are 1TB. We were planning
> > > > > on putting 3 journals on each SSD, but that leaves 900+GB unused
> > > > > on the drive, is it possible to use the leftover space as another
> > > > > OSD or will it affect performance too much? 
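
Mechanically that is easy enough; a purely illustrative sgdisk sketch for
one 1TB drive (sizes and device name made up, untested):

sudo sgdisk -n 1:0:+10G -c 1:journal-0 /dev/sdX
sudo sgdisk -n 2:0:+10G -c 2:journal-1 /dev/sdX
sudo sgdisk -n 3:0:+10G -c 3:journal-2 /dev/sdX
sudo sgdisk -n 4:0:0    -c 4:osd-data  /dev/sdX   # remainder as OSD data

Whether doing so is wise is the real question, as discussed above.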
> > > > > 
> > > > > Thanks, 
> > > > > 
> > > > > Cameron Scrace
> > > > > Infrastructure Engineer
> > > > > 
> > > > > Mobile +64 22 610 4629
> > > > > Phone  +64 4 462 5085 
> > > > > Email  cameron.scrace@xxxxxxxxxxxx
> > > > > Solnet Solutions Limited
> > > > > Level 12, Solnet House
> > > > > 70 The Terrace, Wellington 6011
> > > > > PO Box 397, Wellington 6140
> > > > > 
> > > > > www.solnet.co.nz
> > > > 
> > > > 
> > > 
> > > 
> > 
> > 
> 
> 


-- 
Christian Balzer        Network/Systems Engineer                
chibi@xxxxxxx   	Global OnLine Japan/Fusion Communications
http://www.gol.com/
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




