Re: Planning a home ceph cluster

On 02/11/2014 08:03 PM, Ethan Levine wrote:
Hey all,

Hi!


I've been planning building myself a server cluster as a sort of hobby
project, and I've decided to use Ceph for its storage system. I have a
few questions, though.

My plan is to build 3 relatively dense servers (20 drive bays each) and
fill each one with relatively consumer-grade equipment (AMD 8-core FX
processor, 24+ GB ECC RAM, and a decent SAS card that can provide a
channel to each drive).  For drives, I was planning on using 3 TB or 4
TB WD Red drives (fairly cheap but should be reliable).  I'm only
budgeting ~$7500 for it, so I'll only populate 5 drives per node from
the get-go, but I can just fill them up as my storage requirements grow.

I'd suggest considering smaller servers with fewer drives each to spread things out a bit more, and growing by adding servers rather than by adding more disks to those 3 nodes. I'm not sure whether you are going for rackmount or not; there are plenty of rackmount case options from Supermicro and others, but if you want a more consumer-oriented case, these look interesting:

http://www.u-nas.com/xcart/product.php?productid=17617&cat=0&featured=Y



There's a catch though: I also want to run some VMs on this cluster
(KVM/libvirt managed by Pacemaker, with RBD as block devices of
course).  I don't plan on running anything particularly heavy (a voice
server here, a web server there, maybe a game server or two), and the
workload on the cluster won't be heavy (maybe 3-5 users max, likely idle
most of the time with bursts up to 1 Gbps reads if the cluster can
provide it).

You can do it, but we tend not to recommend co-locating VMs with OSDs on production clusters. For a fairly idle home cluster, it's probably fine.
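For reference, attaching an RBD image to a libvirt-managed guest looks roughly like the disk stanza below (pool name, image name, monitor addresses, and the secret UUID are all placeholders; you'd also need a libvirt secret holding the cephx key):

    <disk type='network' device='disk'>
      <driver name='qemu' type='raw' cache='writeback'/>
      <auth username='libvirt'>
        <secret type='ceph' uuid='REPLACE-WITH-SECRET-UUID'/>
      </auth>
      <source protocol='rbd' name='rbd/voice-server-disk'>
        <host name='192.168.1.11' port='6789'/>
        <host name='192.168.1.12' port='6789'/>
        <host name='192.168.1.13' port='6789'/>
      </source>
      <target dev='vda' bus='virtio'/>
    </disk>

qemu talks to the cluster directly through librbd, so Pacemaker only has to worry about starting and stopping the domains.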


I have 4 questions:

   * The docs mention aiming for 1 GB RAM per 1 TB storage.  However,
consumer equipment seems to max out around 32 GB - I couldn't find any
reputable consumer motherboards that supported more.  If the nodes are
fairly populated at ~50 TB each, and VMs are using ~4 GB RAM on each
node, that leaves me with just over 500 MB RAM per 1 TB storage.  For
smaller loads, will this suffice?  Are the nodes going to be choked when
a disk fails and Ceph migrates data?  Even if I migrate all the VMs to
separate nodes by the time I max out the Ceph nodes, that's still only
32 GB RAM for 60-80 TB storage.

My personal goal is to shoot for about 1.5-2 GB of RAM per OSD. 32 GB for 20 OSDs will probably work OK. This is another reason, though, why going with more OSD nodes that each have fewer OSDs might be preferable.
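To put rough numbers on your fully populated case (the ~2 GB allowance for the OS is just a guess on my part):

    32 GB total - 4 GB (VMs) - ~2 GB (OS) = ~26 GB left for OSDs
    26 GB / 20 OSDs = ~1.3 GB per OSD

That's a bit under my comfort zone but workable for a mostly idle cluster. Recovery after a disk failure is when OSD memory usage spikes, so that's where you'd feel it first.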


   * I'm planning on having either 3x or 5x 1 Gbps ethernet ports on each
node, with a decent managed switch.  I should be able to aggregate these
lines however I wish - say, either use just a single 5 Gbps connection
to the switch, or split it into a 2 Gbps front-end connection and 3 Gbps
back-end connection.  I would value any input on which configuration
would likely be best.  Both fiber and 10 Gbps copper are outside of my
price range.

You may find that link aggregation gets less efficient once you go beyond 2 links. How you set this up really depends on your goals and what kind of read/write workload you end up with. If you are write-heavy and run 2x replication or more, a separate back-end network can be nice, since every client write generates additional OSD-to-OSD replication traffic that would otherwise compete with client traffic.
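For what it's worth, splitting front-end and back-end traffic is just a couple of lines in ceph.conf once you've decided how to carve up your links (the subnets below are made up):

    [global]
        public network  = 192.168.1.0/24
        cluster network = 192.168.2.0/24

Clients and monitors use the public network, while OSD-to-OSD replication and recovery traffic goes over the cluster network.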


   * How stable is CephFS?  When I started planning this (months ago),
CephFS sounded pretty unstable, but I still wanted to be able to provide
a filesystem to clients.  I planned on doing this by allocating a very
large RBD image to a VM, having that VM format it as ext4 or xfs, and
then run Samba on the VM to "export" the filesystem.  It seems like
CephFS has matured since then, though, to the point where running an MDS
on each node (with only a single primary/master MDS) *should* run
smoothly, and significantly faster than the "wrap ext4 and Samba around
RBD" solution.  Again, this is a home cluster, so I won't lose my job if
the system dies - it's definitely not mission-critical, but I still
don't want to restore from backups every month.  [As a small side note:
Can a single MDS daemon manage multiple, independent filesystems?  I
couldn't find anything in the docs about it.]

We aren't supporting CephFS in production yet. Right now CephFS is most stable in a single active MDS configuration (with any additional MDS daemons acting as standbys). Whether it's stable enough for home use is in the eye of the beholder, but we do have people using it now, so you'll have to decide for yourself. It never hurts to have backups. :)
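If you do give it a try, mounting with the kernel client is simple enough (the monitor address and keyring path here are placeholders):

    mount -t ceph 192.168.1.11:6789:/ /mnt/cephfs \
        -o name=admin,secretfile=/etc/ceph/admin.secret

ceph-fuse is the alternative if you'd rather not depend on your distro's kernel.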


   * I'm planning on buying a single SSD for each node for the OS and
journals.  As I populated the nodes, I was going to buy a second SSD,
and split each SSD into two partitions - so I can have a RAID 1
partition for the OS and a larger RAID 0 partition for the journals.  Is
this unwise?  Will two SSDs be able to provide enough throughput and
IOPS for 20 journals, or do I need to plan for more?

I would strongly suggest not putting all 20 journals on a single SSD, and a RAID 0 across two SSDs has the same problem: if that device dies, you lose every OSD in the node at once. If you are doing a lot of writes, you may also really hammer those SSDs. As far as performance goes, it will limit your sequential write throughput, though with 20 drives per node your network is probably the tighter bottleneck. For small random writes it depends on the SSD used; I'd suggest looking at something like the Crucial M500.
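If you do share an SSD between a handful of journals, I'd give each OSD its own small partition and point the OSDs at them explicitly rather than building a RAID 0, along these lines in ceph.conf (the partition labels are just examples):

    [osd]
        osd journal size = 5120    ; 5 GB per journal
    [osd.0]
        osd journal = /dev/disk/by-partlabel/journal-osd0
    [osd.1]
        osd journal = /dev/disk/by-partlabel/journal-osd1

That way losing one SSD only takes down the OSDs journaling on it, not the whole node.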


I'm also grateful for any other comments or suggestions you can offer. I
probably won't order the parts for another 1-2 weeks, so there's plenty
of time for me to switch things around a bit based on advice from this ML.

Other than spreading your OSDs out over more nodes, not too much. It sounds like a fun project!


Thanks for your time,

- Ethan
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
