ceph cluster expansion

Hello,

On Wed, 13 Aug 2014 09:15:34 +0100 James Eckersall wrote:

> Hi,
> 
> I'm looking for some advice on my ceph cluster.
> 
> The current setup is as follows:
> 
> 3 mon servers
> 
> 4 storage servers with the following spec:
> 
> 1x Intel Xeon E5-2640 @2.50GHz 6 core (12 with hyperthreading).
> 64GB DDR3 RAM
> 2x SSDSC2BB080G4 for OS
> 
> LSI MegaRAID 9260-16i with the following drives:
24 drives on a 16-port controller?
I suppose your chassis backplanes are using port expanders then?
How is this all connected up?
It would be very beneficial if the journal SSDs had their own controller,
or at least full-bandwidth paths.
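For illustration, a quick bandwidth budget shows why a single expander
uplink gets tight; the per-device figures below are assumed round numbers,
not measurements:

# Rough bandwidth budget for drives sharing one expander uplink.
# All per-device figures are assumptions for illustration only.

SAS_LANES = 4                 # one SFF-8087 wide port to the expander
GBPS_PER_LANE = 6             # SAS 2.0 line rate
USABLE_FRACTION = 0.8         # roughly accounts for 8b/10b encoding

uplink_mb_s = SAS_LANES * GBPS_PER_LANE * 1000 / 8 * USABLE_FRACTION

HDD_SEQ_MB_S = 130            # assumed sequential rate of a 4TB SATA drive
SSD_SEQ_WRITE_MB_S = 250      # assumed sustained write rate of a journal SSD

hdds, ssds = 20, 4
demand_mb_s = hdds * HDD_SEQ_MB_S + ssds * SSD_SEQ_WRITE_MB_S

print(f"uplink ~{uplink_mb_s:.0f} MB/s, worst-case demand ~{demand_mb_s} MB/s")
# With every journal write also hitting the filestore disks behind the same
# uplink, the expander link saturates long before the drives do.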

> 4 x SSDSC2CW240A3 SSD for OSD journals (5 OSD journals per SSD)
People here will comment on the fact that Intel 520s are not power-failure
safe.
I'll add that, depending on the amount of data you're going to write to
that cluster during its lifetime, they might not be cheaper than DC S3700s
either.
You will definitely want to keep an eye on the SMART output of those;
when the Media_Wearout_Indicator reaches 0 they will supposedly brick
themselves completely, whereas the DC models will "just" go into R/O mode.
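A small watchdog along these lines is enough to catch that early. It needs
smartmontools installed and root to run; the device names and the alert
threshold are only examples:

# Minimal sketch: warn before the journal SSDs wear out.
import subprocess

DEVICES = ["/dev/sdc", "/dev/sdd", "/dev/sde", "/dev/sdf"]  # example journal SSDs
THRESHOLD = 10  # complain well before the indicator hits 0

for dev in DEVICES:
    out = subprocess.check_output(["smartctl", "-A", dev]).decode()
    for line in out.splitlines():
        if "Media_Wearout_Indicator" in line:
            value = int(line.split()[3])  # normalized VALUE column, 100 = new
            if value <= THRESHOLD:
                print(f"WARNING: {dev} wearout indicator at {value}")
            else:
                print(f"{dev}: wearout indicator {value}")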

> 20 x Seagate ST4000NM0023 (3.5" 4TB SATA)
> 
> 
> The storage servers are 4U with 48 x 3.5" drive bays, which currently
> only contain 20 drives.
> 
Where are the journal SSDs then? I can see the OS drives being internal
(or next to the PSU, as with some Supermicro cases).

> I'm looking for the best way to populate these chassis more.
> 
> From what I've read about ceph requirements, I might not have the CPU
> power to add another 24 OSD's to each chassis, so I've been considering
> whether to RAID6 the OSD drives instead.
> 
You would want to add just 20 OSDs and 4 more journal SSDs. ^o^
And yes, depending on your workload you would be pushing the envelope with
your current configuration at times already.
For example, with lots of small (4KB) write IOs (fio or rados bench) I
can push my latest storage node to nearly exhaust its CPU resources (and
yes, that's actual CPU cycles for the OSD processes, not waiting for IO).

That node consists of:
1x Opteron 4386 (3.1GHz, 8 cores)
32GB RAM
4x Intel DC S3700 (100GB) on local SATA for journal and OS
8x TOSHIBA DT01ACA300 (3TB) for OSD filestore

Of course, when writing large blobs, as with the default 4MB objects of
rados bench or things like bonnie++, the load is considerably less.
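If you want to reproduce that comparison yourself, something like this
will do; the pool name, runtime and thread count are placeholders, and
rados bench removes its benchmark objects by default when a run finishes:

# Run rados bench with small (4KB) writes and then the 4MB default,
# watching OSD CPU usage (e.g. in top) while each pass is going.
import subprocess

POOL = "bench"       # a scratch pool, not a production one
SECONDS = "60"
THREADS = "16"

for block_size in ("4096", None):          # 4KB writes, then the 4MB default
    cmd = ["rados", "bench", "-p", POOL, SECONDS, "write", "-t", THREADS]
    if block_size:
        cmd += ["-b", block_size]
    print("running:", " ".join(cmd))
    subprocess.run(cmd, check=True)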
 
> Does anyone have any experience they can share with running OSD's on
> RAID6? 
Look at recent threads like "Optimal OSD Configuration for 45 drives?" and
"anti-cephalopod question" or scour older threads by me.

> Or can anyone comment on whether the CPU I have will cope with ~48 OSD's?
>
Even with "normal" load I'd be worried putting 40 OSDs on that poor CPU.
When OSDs can't keep up with hearbeats from the MONs and other OSDs
things go to hell in a hand basket very quickly.
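A back-of-envelope using the commonly quoted ~1GHz of CPU per HDD-backed
OSD (a rule of thumb, not a hard number) makes the point:

# Back-of-envelope CPU budget per storage node.
CORES = 6            # E5-2640, ignoring hyperthreading for headroom
GHZ_PER_CORE = 2.5
GHZ_PER_OSD = 1.0    # assumed budget per spinning OSD

budget = CORES * GHZ_PER_CORE

for osds in (20, 40, 48):
    need = osds * GHZ_PER_OSD
    status = "ok" if need <= budget else "oversubscribed"
    print(f"{osds} OSDs need ~{need:.0f} GHz of ~{budget:.0f} GHz available: {status}")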
 
> This ceph cluster is being used for backups (windows and linux servers),
> so I'm not looking for "out of this world" speed, but obviously I don't
> want a snail either.
> 
Well, read the above threads, but your use case looks very well suited to
RAID6-backed OSDs.

Something like 4 RAID6 arrays of 10 HDDs each, plus 4 global hot spares,
if I understand your chassis correctly. One journal SSD per OSD.

You won't be doing more than 800 write IOPS per OSD, but backups mean
long sequential writes in my book, and for those it will be just fine.
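For the curious, a rough sketch of what one 10-drive RAID6 OSD can do,
again with assumed per-drive numbers:

# Rough numbers for one 10-drive RAID6 OSD, showing why sequential
# backup traffic is the comfortable case. Per-drive figures are assumptions.
DRIVES = 10
PARITY_DRIVES = 2                 # RAID6 parity overhead
IOPS_PER_DRIVE = 100              # assumed random IOPS of a 7200rpm SATA drive
SEQ_MB_S_PER_DRIVE = 130          # assumed sequential rate
RAID6_WRITE_PENALTY = 6           # read-modify-write cost of a small write

random_write_iops = DRIVES * IOPS_PER_DRIVE / RAID6_WRITE_PENALTY
seq_write_mb_s = (DRIVES - PARITY_DRIVES) * SEQ_MB_S_PER_DRIVE  # full-stripe writes

print(f"random writes: ~{random_write_iops:.0f} IOPS")
print(f"sequential writes: ~{seq_write_mb_s:.0f} MB/s (controller permitting)")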

Regards,

Christian
-- 
Christian Balzer        Network/Systems Engineer                
chibi at gol.com   	Global OnLine Japan/Fusion Communications
http://www.gol.com/

