ceph cluster expansion

Hi Christian,

We're actually using the following chassis:
http://rnt.de/en/bf_xxlarge.html

So yes, there are SAS expanders.  There are four expanders: one is used for
the SSDs and the other three are for the SATA drives.
The four SSDs used for the OSD journals are mounted at the back of the
chassis, along with the OS SSDs.
We're currently planning to buy a couple more servers with DC S3700s, but now
we're debating whether these chassis are actually right for us.
The density is pretty nice with 48x 3.5" bays in 4U, but I think the CPU spec
falls short.
We can spec them up to dual CPUs, but I'm not sure even that would be
enough for 48 OSDs.
So far I haven't really been taxing these storage servers - the space is
being presented via Samba and I think there is a big bottleneck there, so
we're planning to move to iSCSI instead.
We have over 200 servers backing up mostly web content (millions of small
files).

J


On 13 August 2014 10:28, Christian Balzer <chibi at gol.com> wrote:

>
> Hello,
>
> On Wed, 13 Aug 2014 09:15:34 +0100 James Eckersall wrote:
>
> > Hi,
> >
> > I'm looking for some advice on my ceph cluster.
> >
> > The current setup is as follows:
> >
> > 3 mon servers
> >
> > 4 storage servers with the following spec:
> >
> > 1x Intel Xeon E5-2640 @2.50GHz 6 core (12 with hyperthreading).
> > 64GB DDR3 RAM
> > 2x SSDSC2BB080G4 for OS
> >
> > LSI MegaRAID 9260-16i with the following drives:
> 24 drives on a 16 port controller?
> I suppose your chassis backplanes are using port expanders then?
> How is this all connected up?
> It would be very beneficial if the journal SSDs had their own controller
> or at least full bandwidth paths.
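
A back-of-the-envelope sketch of why the shared path matters: everything
behind one expander shares that expander's uplink to the controller. The
x4 6Gb/s uplink width and ~85% protocol efficiency below are assumptions
for illustration, not measured values:

    # rough math: drives behind one SAS expander share its uplink bandwidth
    lanes = 4                  # assumed x4 wide uplink to the HBA
    gbit_per_lane = 6.0        # SAS2 / 6Gb/s SATA signalling
    efficiency = 0.85          # assumed protocol overhead

    uplink_mb_s = lanes * gbit_per_lane * 1000 / 8 * efficiency   # ~2550 MB/s

    for drives in (16, 20, 24):
        print("%2d drives sharing one expander -> ~%d MB/s each when all stream"
              % (drives, uplink_mb_s / drives))

HDDs live comfortably within that, but journal SSDs on the same expander
would be the first thing to feel the cap.
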
>
> > 4 x SSDSC2CW240A3 SSD for OSD journals (5 OSD journals per SSD)
> People here will comment on the fact that Intel 520s are not
> power-failure safe.
> I'll add that, depending on the amount of data you're going to
> write to that cluster during its lifetime, they might not be cheaper than
> DC S3700s either.
> You will definitely want to keep an eye on the SMART output of those: when
> the Media_Wearout_Indicator reaches 0 they will supposedly brick
> themselves completely, whereas the DC models will "just" go into R/O mode.
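
Something like this rough Python sketch could keep an eye on that attribute
across the journal SSDs (assumes smartctl from smartmontools is installed;
the device names and warning threshold are placeholders only):

    #!/usr/bin/env python
    # Warn when an Intel SSD's Media_Wearout_Indicator gets low.
    # Device list and threshold are examples, not our real layout.
    import subprocess

    JOURNAL_SSDS = ["/dev/sda", "/dev/sdb", "/dev/sdc", "/dev/sdd"]
    WARN_AT = 10   # the normalised value counts down from 100 towards 0

    for dev in JOURNAL_SSDS:
        out = subprocess.check_output(["smartctl", "-A", dev])
        for line in out.decode("utf-8", "replace").splitlines():
            if "Media_Wearout_Indicator" in line:
                value = int(line.split()[3])   # 4th column is the VALUE field
                print("%s wearout: %d" % (dev, value))
                if value <= WARN_AT:
                    print("WARNING: %s is approaching end of life" % dev)
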
>
> > 20 x Seagate ST4000NM0023 (3.5" 4TB SATA)
> >
> >
> > The storage servers are 4U with 48 x 3.5" drive bays, which currently
> > only contain 20 drives.
> >
> Where are the journal SSDs then? I can see the OS drives being internal
> (or next to the PSU, as in some Supermicro cases).
>
> > I'm looking for the best way to populate these chassis more.
> >
> > From what I've read about ceph requirements, I might not have the CPU
> > power to add another 24 OSDs to each chassis, so I've been considering
> > whether to RAID6 the OSD drives instead.
> >
> You would want to add just 20 OSDs and 4 more journal SSDs. ^o^
> And yes, depending on your workload you would be pushing the envelope with
> your current configuration at times already.
> For example, with lots of small (4KB) write IOs (fio or rados bench) I can
> push my latest storage node to the point of nearly exhausting its CPU
> resources (and yes, that's actual CPU cycles for the OSD processes, not
> waiting for IO).
>
> That node consists of:
> 1x Opteron 4386 (3.1GHz, 8 cores)
> 32GB RAM
> 4x Intel DC S3700 (100GB) on local SATA for journal and OS
> 8x TOSHIBA DT01ACA300 (3TB) for OSD filestore
>
> Of course, when writing large blobs, like the default 4MB objects of rados
> bench or things like bonnie++, the load is considerably less.
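
If you want to reproduce that kind of load on one of your nodes, a minimal
sketch along these lines should do it - the pool name, runtime and
concurrency are placeholders, and watching top/atop during the run tells you
more than the snapshot at the end:

    #!/usr/bin/env python
    # Hammer a throwaway pool with 4KB writes, then show OSD CPU usage.
    import subprocess

    POOL = "bench"     # assumed test pool, e.g. "ceph osd pool create bench 256"
    SECONDS = "60"
    THREADS = "32"

    # small-write benchmark; --no-cleanup keeps the objects so a read pass
    # ("rados bench -p bench 60 seq") can follow
    subprocess.check_call(["rados", "bench", "-p", POOL, SECONDS, "write",
                           "-b", "4096", "-t", THREADS, "--no-cleanup"])

    # quick look at how much CPU the OSD processes are using afterwards
    print(subprocess.check_output(
        ["ps", "-C", "ceph-osd", "-o", "pid,pcpu,comm"]).decode())
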
>
> > Does anyone have any experience they can share with running OSDs on
> > RAID6?
> Look at recent threads like "Optimal OSD Configuration for 45 drives?" and
> "anti-cephalopod question" or scour older threads by me.
>
> > Or can anyone comment on whether the CPU I have will cope with ~48 OSDs?
> >
> Even with "normal" load I'd be worried putting 40 OSDs on that poor CPU.
> When OSDs can't keep up with heartbeats from the MONs and other OSDs,
> things go to hell in a handbasket very quickly.
>
> > This ceph cluster is being used for backups (Windows and Linux servers),
> > so I'm not looking for "out of this world" speed, but obviously I don't
> > want a snail either.
> >
> Well, read the above threads, but your use case looks very well suited for
> RAID6-backed OSDs.
>
> Something like 4 RAID6 arrays of 10 HDDs each plus 4 global hot spares, if
> I understand your chassis correctly. One journal SSD per OSD.
>
> You won't be doing more than 800 write IOPS per OSD, but backups mean
> long sequential writes in my book, and for those it will be just fine.
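
The rough arithmetic for that layout, just as a sketch (the replica count
below is an assumption; adjust it to whatever your pools actually use):

    # 4x RAID6 of 10x 4TB disks plus 4 global hot spares in a 48-bay chassis
    disks_per_array = 10
    arrays = 4
    hot_spares = 4
    disk_tb = 4.0
    replica_size = 3          # assumed pool size

    data_disks = disks_per_array - 2            # RAID6 spends two disks on parity
    bays_used = arrays * disks_per_array + hot_spares
    usable_tb = arrays * data_disks * disk_tb   # per node, before replication

    print("front bays used: %d of 48" % bays_used)                 # 44 of 48
    print("raw usable per node: %.0f TB" % usable_tb)              # 128 TB
    print("net after %dx replication: %.1f TB per node" %
          (replica_size, usable_tb / replica_size))                # ~42.7 TB
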
>
> Regards,
>
> Christian
> --
> Christian Balzer        Network/Systems Engineer
> chibi at gol.com           Global OnLine Japan/Fusion Communications
> http://www.gol.com/
>