Re: Multiple journals and an OSD on one SSD doable?

Cameron,

To offer at least some constructive advice here instead of just all doom
and gloom, here's what I'd do:

Replace the OS SSDs with two 400GB Intel DC S3700s (or S3710s).
They have enough bandwidth to nearly saturate your network.

Put all your journals on them (3 SSD OSD and 3 HDD OSD journals per SSD).
While that's a bad move from a failure domain perspective, your budget
probably won't allow for anything better, and those are VERY reliable and,
just as importantly, durable SSDs.

This will give you the speed your current setup is capable of, probably
limited by the CPU when it comes to SSD pool operations.
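
A quick back-of-the-envelope check in Python (the per-SSD write figure is
a rough datasheet number from memory, not something measured on your
hardware; the rest comes from this thread):

network_mb_s = 10000 / 8     # 10Gb/s link, roughly 1250 MB/s
per_ssd_mb_s = 460           # approx. sequential write of a 400GB DC S3700
journal_ssds = 2
journals_per_ssd = 6         # 3 SSD OSDs + 3 HDD OSDs each

total = journal_ssds * per_ssd_mb_s
print(f"journal bandwidth ~{total} MB/s vs network ~{network_mb_s:.0f} MB/s")
print(f"~{per_ssd_mb_s / journals_per_ssd:.0f} MB/s of journal per OSD on average")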

Christian

On Mon, 8 Jun 2015 10:44:06 +0900 Christian Balzer wrote:

> 
> Hello Cameron,
> 
> On Mon, 8 Jun 2015 13:13:33 +1200 Cameron.Scrace@xxxxxxxxxxxx wrote:
> 
> > Hi Christian,
> > 
> > Yes we have purchased all our hardware, was very hard to convince 
> > management/finance to approve it, so some of the stuff we have is a
> > bit cheap.
> > 
> Unfortunate. Both the done deal and the cheapness. 
> 
> > We have four storage nodes each with 6 x 6TB Western Digital Red SATA 
> > Drives (WD60EFRX-68M) and 6 x 1TB Samsung EVO 850s SSDs and 2x250GB 
> > Samsung EVO 850s (for OS raid).
> > CPUs are Intel Atom C2750  @ 2.40GHz (8 Cores) with 32 GB of RAM. 
> > We have a 10Gig Network.
> >
> I wish there was a nice way to say this, but it unfortunately boils down
> to a "You're fooked".
> 
> There have been many discussions about which SSDs are usable with Ceph,
> very recently as well.
> Samsung EVOs (the non DC type for sure) are basically unusable for
> journals. See the recent thread:
>  Possible improvements for a slow write speed (excluding independent SSD
> journals) and:
> http://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/
> for reference.
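> 
> The test there boils down to sustained small synchronous (dsync) writes,
> which is the journal's write pattern. If you want to eyeball the idea
> without fio, here is a rough Python sketch (not the article's command;
> the device path and runtime are placeholders, and it WILL overwrite
> whatever you point it at):
> 
> import os, time
> 
> DEV = "/dev/sdX"            # placeholder: an unused SSD/partition (needs root)
> BLOCK = b"\0" * 4096        # 4 KiB writes, journal-style
> SECONDS = 10
> 
> fd = os.open(DEV, os.O_WRONLY | os.O_DSYNC)  # every write hits the media
> start = time.time()
> count = 0
> while time.time() - start < SECONDS:
>     os.write(fd, BLOCK)
>     count += 1
> os.close(fd)
> elapsed = time.time() - start
> print(f"{count / elapsed:.0f} synced 4k writes/s, "
>       f"{count * 4096 / elapsed / 1e6:.1f} MB/s")
> 
> A journal-worthy SSD should sustain this at thousands of writes per
> second; the consumer EVOs reportedly do far worse, which is the problem.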
> 
> I presume your intention for the 1TB SSDs is an SSD-backed pool?
> Note that the EVOs have a pretty low (guaranteed) endurance, so aside
> from needing journal SSDs that actually can do the job, you're looking at
> wearing them out rather quickly (depending on your use case of course).
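> 
> To put the endurance point into numbers (the TBW rating is from memory
> and the write load is a pure assumption, so treat this as illustrative
> only):
> 
> tbw_rating_tb = 150     # rough rated endurance of a 1TB 850 EVO
> write_mb_s = 50         # assumed sustained average write load per SSD
> days = tbw_rating_tb * 1e6 / (write_mb_s * 86400)
> print(f"~{days:.0f} days to chew through the rated endurance")
> 
> Even if the real load is a fraction of that, it adds up quickly for
> drives meant to last years.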
> 
> Now, with SSD-based OSDs, or even HDD-based OSDs with SSD journals, that
> CPU looks a bit anemic.
> 
> More below:
> > The two options we are considering are:
> > 
> > 1) Use two of the 1TB SSDs for the spinning disk journals (3 each) and 
> > then use the remaining 900+GB of each drive as an OSD to be part of
> > the cache pool.
> > 
> > 2) Put the spinning disk journals on the OS SSDs and use the 2 1TB
> > SSDs for the cache pool.
> > 
> Cache pools aren't all that speedy currently (research the ML archives),
> even less so with the SSDs you have.
> 
> Christian
> 
> > In both cases the other 4 1TB SSDs will be part of their own tier.
> > 
> > Thanks a lot!
> > 
> > Cameron Scrace
> > Infrastructure Engineer
> > 
> > Mobile +64 22 610 4629
> > Phone  +64 4 462 5085 
> > Email  cameron.scrace@xxxxxxxxxxxx
> > Solnet Solutions Limited
> > Level 12, Solnet House
> > 70 The Terrace, Wellington 6011
> > PO Box 397, Wellington 6140
> > 
> > www.solnet.co.nz
> > 
> > 
> > 
> > From:   Christian Balzer <chibi@xxxxxxx>
> > To:     "ceph-users@xxxxxxxx" <ceph-users@xxxxxxxx>
> > Cc:     Cameron.Scrace@xxxxxxxxxxxx
> > Date:   08/06/2015 12:18 p.m.
> > Subject:        Re:  Multiple journals and an OSD on one
> > SSD doable?
> > 
> > 
> > 
> > 
> > Hello,
> > 
> > 
> > On Mon, 8 Jun 2015 09:55:56 +1200 Cameron.Scrace@xxxxxxxxxxxx wrote:
> > 
> > > The other option we were considering was putting the journals on the
> > > OS SSDs, they are only 250GB and the rest would be for the OS. Is
> > > that a decent option?
> > >
> > You'll get a LOT better advice if you tell us more details.
> > 
> > For starters, have you bought the hardware yet?
> > Tell us about your design, how many initial storage nodes, how many
> > HDDs/SSDs per node, what CPUs/RAM/network?
> > 
> > What SSDs are we talking about? Exact models, please.
> > (Neither of the sizes you mentioned rings a bell for any DC-level SSDs
> > I'm aware of.)
> > 
> > That said, I'm using Intel DC S3700s for mixed OS and journal use with
> > good results.
> > In your average Ceph storage node, normal OS activity (mostly logging)
> > is a minute drop in the bucket for any decent SSD, so nearly all of its
> > resources are available to the journals.
> > 
> > You want to match the number of journals per SSD according to the
> > capabilities of your SSD, HDDs and network.
> > 
> > For example, 8 HDD OSDs with two 200GB DC S3700s and a 10Gb/s network is
> > a decent match.
> > The two SSDs at 900MB/s would appear to be the bottleneck, but in
> > reality I'd expect the HDDs to be it.
> > Never mind that you'd be more likely to be IOPS-bound than bandwidth-bound.
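> > 
> > As a rough illustration of that matching exercise (the HDD figure is an
> > assumption for a typical spinner, the other numbers are as above):
> > 
> > network_mb_s = 10000 / 8   # 10Gb/s link, ~1250 MB/s
> > ssd_journal_mb_s = 900     # two 200GB DC S3700s combined, as above
> > hdd_mb_s = 8 * 110         # eight HDD OSDs at ~110 MB/s streaming writes
> > 
> > bottleneck = min(network_mb_s, ssd_journal_mb_s, hdd_mb_s)
> > print(f"network {network_mb_s:.0f}, journals {ssd_journal_mb_s}, "
> >       f"hdds {hdd_mb_s} -> bottleneck ~{bottleneck} MB/s")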
> >  
> > Regards,
> > 
> > Christian
> > 
> > > Thanks!
> > > 
> > > Cameron Scrace
> > > Infrastructure Engineer
> > > 
> > > Mobile +64 22 610 4629
> > > Phone  +64 4 462 5085 
> > > Email  cameron.scrace@xxxxxxxxxxxx
> > > Solnet Solutions Limited
> > > Level 12, Solnet House
> > > 70 The Terrace, Wellington 6011
> > > PO Box 397, Wellington 6140
> > > 
> > > www.solnet.co.nz
> > > 
> > > 
> > > 
> > > From:   Somnath Roy <Somnath.Roy@xxxxxxxxxxx>
> > > To:     "Cameron.Scrace@xxxxxxxxxxxx" <Cameron.Scrace@xxxxxxxxxxxx>, 
> > > "ceph-users@xxxxxxxx" <ceph-users@xxxxxxxx>
> > > Date:   08/06/2015 09:34 a.m.
> > > Subject:        RE:  Multiple journals and an OSD on one
> > > SSD 
> > 
> > > doable?
> > > 
> > > 
> > > 
> > > Cameron,
> > > Generally, it’s not a good idea.
> > > You want to protect the SSDs used as journals. If anything goes wrong
> > > with that disk, you will lose all of the OSDs that depend on it.
> > > I don’t think a bigger journal will gain you much performance, so the
> > > default 5 GB journal size should be good enough (see the rough sizing
> > > check after the list below). If you want to reduce the fault domain
> > > and still put 3 journals on an SSD, go for minimum-size, high-endurance
> > > SSDs.
> > > Now, if you want to use the rest of the space on the 1 TB SSDs, just
> > > creating OSDs there will not gain you much (you may rather get some
> > > burst performance). You may want to consider the following.
> > > 
> > > 1. If your spindle OSDs are much bigger than 900 GB and you don’t
> > > want to make all OSDs of similar sizes, a cache pool could be one of
> > > your options. But remember, a cache pool can wear out your SSDs faster,
> > > as I believe it presently does not optimize away the extra writes.
> > > Sorry, I don’t have exact data, as I am yet to test that out.
> > > 
> > > 2. If you want to make all the OSDs of similar sizes and you will be
> > > able to create a substantial number of OSDs with your unused SSDs
> > > (depending on how big the cluster is), you may want to put all of your
> > > primary OSDs on SSD and gain a significant performance boost for reads.
> > > Also, in this case, I don’t think you will be getting any burst
> > > performance.
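> > > 
> > > (The rough sizing check mentioned above: the usual rule of thumb is
> > > journal size ~ 2 x expected throughput x filestore max sync interval.
> > > The throughput figure below is an assumption for one of the spinners.)
> > > 
> > > hdd_mb_s = 150        # assumed streaming write speed of one HDD OSD
> > > sync_interval_s = 5   # filestore max sync interval default
> > > journal_mb = 2 * hdd_mb_s * sync_interval_s
> > > print(f"~{journal_mb} MB needed, so the default 5 GB is plenty")
> > > 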
> > > Thanks & Regards
> > > Somnath
> > > 
> > > From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf
> > > Of 
> > 
> > > Cameron.Scrace@xxxxxxxxxxxx
> > > Sent: Sunday, June 07, 2015 1:49 PM
> > > To: ceph-users@xxxxxxxx
> > > Subject:  Multiple journals and an OSD on one SSD doable?
> > > 
> > > We are setting up a Ceph cluster and want the journals for our spinning
> > > disks to be on SSDs, but all of our SSDs are 1TB. We were planning on
> > > putting 3 journals on each SSD, but that leaves 900+GB unused on the
> > > drive. Is it possible to use the leftover space as another OSD, or
> > > will it affect performance too much?
> > > 
> > > Thanks, 
> > > 
> > > Cameron Scrace
> > > Infrastructure Engineer
> > > 
> > > Mobile +64 22 610 4629
> > > Phone  +64 4 462 5085 
> > > Email  cameron.scrace@xxxxxxxxxxxx
> > > Solnet Solutions Limited
> > > Level 12, Solnet House
> > > 70 The Terrace, Wellington 6011
> > > PO Box 397, Wellington 6140
> > > 
> > > www.solnet.co.nz
> > > 
> > > 
> > > 
> > > 
> > > 
> > > Attention:
> > > This email may contain information intended for the sole use of
> > > the original recipient. Please respect this when sharing or
> > > disclosing this email's contents with any third party. If you
> > > believe you have received this email in error, please delete it
> > > and notify the sender or postmaster@xxxxxxxxxxxxxxxxxxxxx as
> > > soon as possible. The content of this email does not necessarily
> > > reflect the views of Solnet Solutions Ltd.
> > > 
> > 
> > 
> 
> 


-- 
Christian Balzer        Network/Systems Engineer                
chibi@xxxxxxx   	Global OnLine Japan/Fusion Communications
http://www.gol.com/
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




