On 08/28/2015 05:37 PM, John Spray wrote:
> On Fri, Aug 28, 2015 at 3:53 PM, Tony Nelson <tnelson@xxxxxxxxxxxxx> wrote:
>> I recently built a 3 node Proxmox cluster for my office. I'd like to get
>> HA setup, and the Proxmox book recommends Ceph. I've been reading the
>> documentation and watching videos, and I think I have a grasp on the
>> basics, but I don't need anywhere near a petabyte of storage.
>>
>> I'm considering servers w/ 12 drive bays: 2 SSDs mirrored for the OS,
>> 2 SSDs for journals and the other 8 for OSDs. I was going to purchase
>> 3 identical servers, and use my 3 Proxmox servers as the monitors, with
>> of course gigabit networking in between. Obviously this is very vague,
>> but I'm just getting started on the research.
>>
>> My concern is that I won't have enough physical disks, and therefore
>> I'll end up with performance issues.
>
> That's impossible to know without knowing what kind of performance you need.
>

True. But personally I think Ceph doesn't perform well on small clusters of
fewer than 10 nodes.

>> I've seen many petabyte+ builds discussed, but not much on the smaller
>> side. Does anyone have any guides or reference material I may have missed?
>
> The practicalities of fault tolerance are very different in a
> minimum-size system (e.g. 3 servers configured for 3 replicas).
>
> * When one disk fails, the default rules mean that the only place Ceph
> can re-replicate the PGs from that disk is onto other disks in the same
> server where the failure occurred. One full disk's worth of data will
> have to flow into that server, preferably quite fast (to avoid the risk
> of a double failure). Recovering from a 2TB disk failure will take as
> long as it takes to stream that much data over your 1Gbps link, so your
> recovery time will be similar to conventional RAID unless you install a
> faster network.
>
> * When one server fails, you lose a full third of your bandwidth. That
> means your client workloads would have to be sized to use only about
> 2/3 of the theoretical bandwidth, or you would have to shut down some
> workloads when a server failed. In larger systems this isn't such a
> worry: losing 1 of 32 servers is only a ~3% throughput loss.
>

Yes, each failure domain should be as small a fraction of the cluster as
possible. I prefer that losing one machine removes less than 10% of the
cluster; with three nodes, each node is a 33.3% failure domain.

Wido

> You should compare the price of getting the same amount of disk + RAM
> but spread across twice as many servers. The other option is of course
> a traditional dual-ported RAID controller.
>
> John

--
Wido den Hollander
42on B.V.
Ceph trainer and consultant

Phone: +31 (0)20 700 9902
Skype: contact42on
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
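
To illustrate John's first point about recovery staying inside the failed
server: the stock replicated CRUSH rule that Ceph ships with places each
replica under a different host. A minimal sketch of that default rule, in
decompiled CRUSH map syntax (the rule and root names shown are the stock
defaults and may differ on any given cluster):

    rule replicated_ruleset {
            ruleset 0
            type replicated
            min_size 1
            max_size 10
            step take default                    # start from the default root
            step chooseleaf firstn 0 type host   # place one replica per host
            step emit
    }

With size=3 and exactly three hosts, every host already holds one copy of
every PG, so when an OSD dies the only CRUSH-legal destination for its copies
is another OSD inside the same host, which is why all the recovery traffic
converges on that one server.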
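
And a back-of-the-envelope sketch of the recovery-time claim, assuming
(purely for illustration) a full 2 TB to re-replicate and roughly 100 MB/s of
usable recovery bandwidth on a 1 Gbps link once protocol overhead and client
traffic are accounted for:

    # Rough single-OSD recovery estimate; the numbers are assumptions, not measurements.
    disk_tb = 2                    # capacity of the failed disk, TB (decimal units)
    usable_mb_s = 100              # assumed usable recovery bandwidth, MB/s

    data_mb = disk_tb * 1_000_000  # 2 TB ~= 2,000,000 MB
    hours = data_mb / usable_mb_s / 3600
    print(f"~{hours:.1f} hours to refill the disk")   # ~5.6 hours, best case

In other words, even in the best case you are looking at several hours during
which the affected placement groups run with reduced redundancy, which is the
argument for a faster cluster network even on a small deployment.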