Ceph cluster expansion

Hi Christian,

Most of our backups are rsync or robocopy (Windows), so they are
incremental, file-based backups.
There will be a high level of parallelism as the backups run mostly
overnight with similar start times.
So far I've seen high iowait on the Samba head we are using, but low OSD
resource usage, suggesting that the bottleneck is within Samba (AD lookups,
most likely).
I'll be able to test it better when I've built some iSCSI heads.
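
For reference, the comparison I'm making is roughly the one in this quick
psutil sketch (untested as posted; the ceph-osd name match and the 5 second
window are just assumptions) - run it on the Samba head (where the OSD part
will simply be zero) and on a storage node, then compare the two numbers:

import psutil

# Find the OSD daemons and prime their per-process CPU counters.
osds = []
for p in psutil.process_iter():
    try:
        if "ceph-osd" in p.name():
            osds.append(p)
            p.cpu_percent(None)
    except (psutil.NoSuchProcess, psutil.AccessDenied):
        pass

# System-wide CPU breakdown over a 5 second window (blocks for 5 s).
sys_cpu = psutil.cpu_times_percent(interval=5)

# Total ceph-osd CPU over the same window, in % of a single core.
osd_cpu = 0.0
for p in osds:
    try:
        osd_cpu += p.cpu_percent(None)
    except psutil.NoSuchProcess:
        pass

print("iowait %.1f%%, ceph-osd total %.1f%% of one core"
      % (sys_cpu.iowait, osd_cpu))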

The hardware configuration guide suggests roughly 1 GHz per OSD, so maybe
we would be okay if we had two hex-core procs (with trusty hyperthreading)
per storage server.
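
Rough numbers behind that, in case my arithmetic is off (the 1 GHz per OSD
figure is just my reading of the guide, and the hyperthreading credit is a
guess):

# Back-of-envelope CPU sizing for one fully populated storage server.
osds_per_server = 48
ghz_per_osd = 1.0            # assumed rule of thumb from the guide
sockets, cores, base_ghz = 2, 6, 2.5
ht_credit = 1.3              # rough ~30% benefit from hyperthreading

available = sockets * cores * base_ghz * ht_credit
required = osds_per_server * ghz_per_osd
print("available ~%.0f GHz vs required ~%.0f GHz" % (available, required))
# ~39 GHz vs 48 GHz, so even dual hex-cores look marginal for 48 OSDs,
# which is part of why RAID6-backed OSDs are tempting.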

I think we are not completely averse to using different hardware, but we
really don't want to waste the 4 chassis we already have, as none of this
kit was throwaway money.
Plus I don't like the idea of mixing different hardware across OSD nodes.

J

On 13 August 2014 14:06, Christian Balzer <chibi at gol.com> wrote:

> On Wed, 13 Aug 2014 12:47:22 +0100 James Eckersall wrote:
>
> > Hi Christian,
> >
> > We're actually using the following chassis:
> > http://rnt.de/en/bf_xxlarge.html
> >
> Ah yes, one of the Backblaze-pod heritage.
> But rather better designed and thought through than most of them.
>
> Using the motherboard SATA3 controller for the journal SSDs may be
> advantageous, something to try out with a new/spare machine.
>
> > So yes, there are SAS expanders.  There are 4 expanders: one is used for
> > the SSDs and the other three are for the SATA drives.
> > The 4 SSDs for the OSDs are mounted at the back of the chassis, along
> > with the OS SSDs.
> > We're currently planning to buy a couple more servers with 3700s, but
> > now we're debating whether these chassis are actually right for us.
> > The density is pretty nice with 48x3.5" in 4U, but I think the CPU spec
> > falls short.
> When going with classic Ceph, yes.
>
> > We can spec them up to dual CPUs, but I'm not sure even that would be
> > enough for 48 OSDs.
> When looking at:
>
> https://objects.dreamhost.com/inktankweb/Inktank_Hardware_Configuration_Guide.pdf
>
> it is possible, especially if the writes are long and sequential.
>
> > So far I haven't really been taxing these storage servers - the space is
> > being presented via Samba and I think there is a big bottleneck there, so
> > we're planning to move to iSCSI instead.
> > We have over 200 servers backing up mostly web content (millions of small
> > files).
> >
> So you're doing more of an rsync/copy operation than using actual
> backup software like Bacula?
> Having 200 servers scribble individual files, potentially with high
> levels of parallelism, is another story altogether compared to a few
> Bacula streams.
>
> Christian
>
>
> > J
> >
> >
> > On 13 August 2014 10:28, Christian Balzer <chibi at gol.com> wrote:
> >
> > >
> > > Hello,
> > >
> > > On Wed, 13 Aug 2014 09:15:34 +0100 James Eckersall wrote:
> > >
> > > > Hi,
> > > >
> > > > I'm looking for some advice on my Ceph cluster.
> > > >
> > > > The current setup is as follows:
> > > >
> > > > 3 mon servers
> > > >
> > > > 4 storage servers with the following spec:
> > > >
> > > > 1x Intel Xeon E5-2640 @2.50GHz 6 core (12 with hyperthreading).
> > > > 64GB DDR3 RAM
> > > > 2x SSDSC2BB080G4 for OS
> > > >
> > > > LSI MegaRAID 9260-16i with the following drives:
> > > 24 drives on a 16-port controller?
> > > I suppose your chassis backplanes are using port expanders then?
> > > How is this all connected up?
> > > It would be very beneficial if the journal SSDs had their own
> > > controller, or at least full-bandwidth paths.
> > >
> > > > 4 x SSDSC2CW240A3 SSD for OSD journals (5 OSD journals per SSD)
> > > People here will comment on the fact that the Intel 520s are not
> > > power-failure safe.
> > > I'll add that, depending on the amount of data you're going to write
> > > to that cluster during its lifetime, they might not be cheaper than
> > > DC S3700s either.
> > > You will definitely want to keep an eye on the SMART output of those;
> > > when the Media_Wearout_Indicator reaches 0 they will supposedly brick
> > > themselves completely, whereas the DC models will "just" go into
> > > read-only mode.
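> > >
> > > A quick way to watch that (untested sketch; the device list and the
> > > 10% threshold are placeholders, and it assumes smartmontools is
> > > installed) would be something along these lines:
> > >
> > > import subprocess
> > >
> > > # Warn when the normalized Media_Wearout_Indicator (VALUE column of
> > > # `smartctl -A`) gets close to 0 on the journal SSDs.
> > > DEVICES = ["/dev/sdb", "/dev/sdc", "/dev/sdd", "/dev/sde"]
> > >
> > > for dev in DEVICES:
> > >     out = subprocess.check_output(["smartctl", "-A", dev]).decode()
> > >     for line in out.splitlines():
> > >         if "Media_Wearout_Indicator" in line:
> > >             value = int(line.split()[3])   # normalized value, 100 -> 0
> > >             if value < 10:
> > >                 print("%s: wearout at %d, replace soon" % (dev, value))
> > >             else:
> > >                 print("%s: wearout at %d" % (dev, value))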
> > >
> > > > 20 x Seagate ST4000NM0023 (3.5" 4TB SATA)
> > > >
> > > >
> > > > The storage servers are 4U with 48 x 3.5" drive bays, which currently
> > > > only contain 20 drives.
> > > >
> > > Where are the journal SSDs then? The OS drives I can see being
> > > internal (or next to the PSU as with some Supermicro cases).
> > >
> > > > I'm looking for the best way to populate these chassis more.
> > > >
> > > > From what I've read about Ceph requirements, I might not have the CPU
> > > > power to add another 24 OSDs to each chassis, so I've been
> > > > considering whether to RAID6 the OSD drives instead.
> > > >
> > > You would want to add just 20 OSDs and 4 more journal SSDs. ^o^
> > > And yes, depending on your workload, you would already be pushing the
> > > envelope with your current configuration at times.
> > > For example, with lots of small (4KB) write IOs (fio or rados bench)
> > > I can push my latest storage node to nearly exhausting its CPU
> > > resources (and yes, that's actual CPU cycles for the OSD processes,
> > > not waiting for IO).
> > >
> > > That node consists of:
> > > 1x Opteron 4386 (3.1GHz, 8 cores)
> > > 32GB RAM
> > > 4x Intel DC S3700 (100GB) on local SATA for journal and OS
> > > 8x TOSHIBA DT01ACA300 (3TB) for OSD filestore
> > >
> > > Of course, when writing large blobs, like with the default 4MB of
> > > rados bench or things like bonnie++, the load is considerably less.
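> > >
> > > In case you want to reproduce that comparison on a test pool, the two
> > > rados bench runs I mean look roughly like this (wrapped in Python only
> > > for convenience; the pool name "bench" and the thread count are
> > > placeholders):
> > >
> > > import subprocess
> > >
> > > def bench(block_size):
> > >     # 60 second write test; -b is the write size, -t the concurrency.
> > >     subprocess.check_call([
> > >         "rados", "-p", "bench", "bench", "60", "write",
> > >         "-b", str(block_size), "-t", "32",
> > >     ])
> > >
> > > bench(4 * 1024)          # lots of small 4KB writes, heavy on OSD CPU
> > > bench(4 * 1024 * 1024)   # default-sized 4MB writes, much lighter
> > >
> > > Watch the ceph-osd processes in top/htop while each one runs.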
> > >
> > > > Does anyone have any experience they can share with running OSDs on
> > > > RAID6?
> > > Look at recent threads like "Optimal OSD Configuration for 45 drives?"
> > > and "anti-cephalopod question" or scour older threads by me.
> > >
> > > > Or can anyone comment on whether the CPU I have will cope with ~48
> > > > OSDs?
> > > >
> > > Even with "normal" load I'd be worried about putting 40 OSDs on that
> > > poor CPU. When OSDs can't keep up with heartbeats from the MONs and
> > > other OSDs, things go to hell in a handbasket very quickly.
> > >
> > > > This Ceph cluster is being used for backups (Windows and Linux
> > > > servers), so I'm not looking for "out of this world" speed, but
> > > > obviously I don't want a snail either.
> > > >
> > > Well, read the above threads, but your use case looks very well suited
> > > for RAID6-backed OSDs.
> > >
> > > Something like 4 RAID6 arrays of 10 HDDs each, plus 4 global hot
> > > spares, if I understand your chassis correctly. One journal SSD per OSD.
> > >
> > > You won't be doing more than 800 write IOPS per OSD, but backups mean
> > > long sequential writes in my book, and for those it will be just fine.
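> > >
> > > Back of the envelope for that figure (my assumptions: ~100 IOPS per
> > > 7200 RPM spindle, 10-disk RAID6 so 8 data spindles, and a write
> > > penalty of 6 for small random writes):
> > >
> > > spindle_iops = 100           # assumed per 7200 RPM SATA drive
> > > disks, parity = 10, 2
> > > data_disks = disks - parity
> > >
> > > # Full-stripe (long sequential) writes vs small random writes.
> > > sequential_ish = data_disks * spindle_iops    # ~800
> > > random_small = disks * spindle_iops // 6      # ~166
> > >
> > > print("per-OSD write IOPS: ~%d sequential-ish, ~%d small random"
> > >       % (sequential_ish, random_small))
> > >
> > > Which is why long sequential backup streams are fine and lots of small
> > > random writes would not be.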
> > >
> > > Regards,
> > >
> > > Christian
> > > --
> > > Christian Balzer        Network/Systems Engineer
> > > chibi at gol.com           Global OnLine Japan/Fusion Communications
> > > http://www.gol.com/
> > >
>
>
> --
> Christian Balzer        Network/Systems Engineer
> chibi at gol.com           Global OnLine Japan/Fusion Communications
> http://www.gol.com/
>