Re: SSD Journal

Hello,

On Thu, 14 Jul 2016 13:37:54 +0200 Steffen Weißgerber wrote:

> 
> 
> >>> Christian Balzer <chibi@xxxxxxx> wrote on Thursday, 14 July 2016 at 05:05:
> 
> Hello,
> 
> > Hello,
> > 
> > On Wed, 13 Jul 2016 09:34:35 +0000 Ashley Merrick wrote:
> > 
> >> Hello,
> >> 
> >> Looking at using 2 x 960GB SSDs (SM863)
> >>
> > Massive overkill.
> >  
> >> The reason for going larger is that I was thinking I'd be better off with
> >> them in RAID 1, so there's enough space for the OS and all the journals.
> >>
> > As I pointed out several times in this ML, Ceph journal usage rarely
> > exceeds hundreds of MB, let alone several GB with default parameters.
> > So 10GB per journal is plenty, unless you're doing something very special
> > (and you aren't with normal HDDs as OSDs).
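To put a rough number on that: the stock filestore rule of thumb is
journal size >= 2 * expected throughput * filestore max sync interval.
A quick back-of-the-envelope sketch (the per-HDD throughput is an assumed
example figure, not a measurement):

# Filestore journal sizing rule of thumb (sketch, assumed example numbers).
hdd_throughput_mb_s = 150          # assumed sequential write ceiling of one HDD OSD
filestore_max_sync_interval_s = 5  # default sync interval is 5 seconds

min_journal_mb = 2 * hdd_throughput_mb_s * filestore_max_sync_interval_s
print(f"minimum journal size: ~{min_journal_mb} MB")  # ~1500 MB, so 10GB has lots of headroom

And that's generous, since an HDD-backed OSD rarely sustains anything close
to its sequential ceiling.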
> >  
> >> Or am I better off using 2 x 200GB S3700s instead, with 5 disks per SSD?
> >>
> > S3700s are unfortunately EOL'd; the 200GB ones were great at 375MB/s.
> > 200GB S3710s at 300MB/s are about on par with 5 HDDs, but if you can afford
> > it and have a 10Gb/s network, the 400GB ones at 470MB/s would be optimal.
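The matching there is just bandwidth arithmetic; roughly (the per-HDD write
rate is an assumed figure, adjust it to your drives):

# Journal SSD vs. HDD bandwidth sanity check (sketch, assumed example numbers).
hdds_per_ssd = 5
hdd_write_mb_s = 60        # assumed sustained write rate per HDD OSD
ssd_journal_mb_s = 300     # roughly a 200GB S3710-class journal SSD

hdd_aggregate = hdds_per_ssd * hdd_write_mb_s
print(f"HDDs can sink ~{hdd_aggregate} MB/s, journal SSD offers ~{ssd_journal_mb_s} MB/s")
# A 10Gb/s link is ~1250 MB/s per node, which is why two ~470 MB/s SSDs
# get you much closer to not being journal-bound.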
> > 
> > As for sharing the SSDs with the OS, I do that all the time; the minute
> > amount of logging a storage node does really has next to no impact.
> > 
> > I prefer this over using DoMs for two reasons:
> > 1. Redundancy
> > 2. Hot-swappability
> > 
> > If you go the DoM route, make sure its size AND endurance match
> > what you need.
> > This is especially important if you were to run a MON on those machines as
> > well.
> > 
> 
> Because we had to replace some DoMs due to heavy MON logging: how do you
> configure MON logging? On those redundant SSDs, or remotely?
>  

What maker/model of DoM were those?

Anyway, everything that runs a MON in my clusters has SSDs with sufficient
endurance for the OS.
Heck, even 180GB Intel 530s (i.e. consumer SSDs) holding the OS for a
dedicated MON in a busy (but standard-level logging) cluster are only 2%
worn after a year (their wear-out indicator still reads 98).
Though that's a HW RAID1 and the controller has 512MB cache, so writes do
get nicely coalesced. 
All my other MONs (shared with OSDs on storage nodes) are on S37x0 SSDs.
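For what it's worth, that 2% per year on the 530s extrapolates to a very
comfortable margin, if you assume (simplistically) that the write pattern
stays the same:

# Naive linear projection from the SSD wear-out indicator (sketch).
wear_used_pct = 2.0    # indicator went from 100 to 98 in one year
years_elapsed = 1.0

projected_years = years_elapsed * 100.0 / wear_used_pct
print(f"projected media lifetime: ~{projected_years:.0f} years")  # ~50 years at this rate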

OTOH, the Supermicro DoMs look nice enough on paper with 1 DWPD:
https://www.supermicro.com/products/nfo/SATADOM.cfm

The 64GB model should do the trick in most scenarios.
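A DWPD rating is easy to turn into a daily/total write budget to compare
against what your MON and OS actually write (the warranty period below is
an assumption, check the actual datasheet):

# Translate a DWPD rating into a write budget (back-of-the-envelope sketch).
capacity_gb = 64
dwpd = 1.0
warranty_years = 5      # assumed warranty period; check the datasheet

daily_budget_gb = capacity_gb * dwpd
total_tb_written = daily_budget_gb * 365 * warranty_years / 1000.0
print(f"~{daily_budget_gb:.0f} GB/day, ~{total_tb_written:.0f} TB over {warranty_years} years")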

Christian

> Steffen
> 
> > Christian
> > 
> >> Thanks,
> >> Ashley
> >> 
> >> -----Original Message-----
> >> From: Christian Balzer [mailto:chibi@xxxxxxx] 
> >> Sent: 13 July 2016 01:12
> >> To: ceph-users@xxxxxxxxxxxxxx 
> >> Cc: Wido den Hollander <wido@xxxxxxxx>; Ashley Merrick <ashley@xxxxxxxxxxxxxx>
> >> Subject: Re:  SSD Journal
> >> 
> >> 
> >> Hello,
> >> 
> >> On Tue, 12 Jul 2016 19:14:14 +0200 (CEST) Wido den Hollander wrote:
> >> 
> >> > 
> >> > > On 12 July 2016 at 15:31, Ashley Merrick <ashley@xxxxxxxxxxxxxx> wrote:
> >> > > 
> >> > > 
> >> > > Hello,
> >> > > 
> >> > > Looking at the final stages of planning/setup for a Ceph cluster.
> >> > > 
> >> > > Per storage node, looking at:
> >> > > 
> >> > > 2 x SSD OS / Journal
> >> > > 10 x SATA Disk
> >> > > 
> >> > > Will have a small RAID 1 partition for the OS; however, I'm not sure if
> >> > > it's best to do:
> >> > > 
> >> > > 5 x Journals per SSD
> >> > 
> >> > Best solution. Will give you the most performance for the OSDs. RAID-1 will
> >> > just burn through cycles on the SSDs.
> >> > 
> >> > SSDs don't fail that often.
> >> >
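Side note on the cycle burning: with the journals mirrored, every journal
write lands on both SSDs, so each SSD eats the write load of all 10 OSDs
instead of 5. Roughly (the per-OSD rate is an assumed figure, only the
ratio matters):

# Per-SSD journal write load: RAID-1 mirror vs. a 5+5 split (sketch).
osds = 10
per_osd_journal_mb_s = 30              # assumed average journal write rate per OSD

raid1_per_ssd = osds * per_osd_journal_mb_s         # mirror: each SSD sees all writes
split_per_ssd = (osds // 2) * per_osd_journal_mb_s  # split: each SSD sees half
print(f"RAID-1: {raid1_per_ssd} MB/s per SSD vs. split: {split_per_ssd} MB/s per SSD")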
> >> What Wido wrote, but let us know what SSDs you're planning to use.
> >> 
> >> Because the detailed version of that sentence should read:
> >> "Well-known and tested DC-level SSDs whose size/endurance levels are matched
> >> to the workload rarely fail, and especially rarely fail unexpectedly."
> >>  
> >> > Wido
> >> > 
> >> > > 10 x Journals on a RAID 1 of two SSDs
> >> > > 
> >> > > Is the "performance" increase from splitting 5 journals onto each SSD worth
> >> > > the "issue" caused when one SSD goes down?
> >> > > 
> >> As always, assume at least a whole node as the failure domain you need to be
> >> able to handle.
> >> 
> >> Christian
> >> 
> >> > > Thanks,
> >> > > Ashley


-- 
Christian Balzer        Network/Systems Engineer                
chibi@xxxxxxx   	Global OnLine Japan/Rakuten Communications
http://www.gol.com/
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



