On 08/28/2015 05:37 PM, John Spray wrote:
> On Fri, Aug 28, 2015 at 3:53 PM, Tony Nelson <tnelson@xxxxxxxxxxxxx> wrote:
>> I recently built a 3 node Proxmox cluster for my office. I'd like to get
>> HA setup, and the Proxmox book recommends Ceph. I've been reading the
>> documentation and watching videos, and I think I have a grasp on the
>> basics, but I don't need anywhere near a petabyte of storage.
>>
>> I'm considering servers w/ 12 drive bays: 2 SSDs mirrored for the OS,
>> 2 SSDs for journals and the other 8 for OSDs. I was going to purchase
>> 3 identical servers, and use my 3 Proxmox servers as the monitors, with
>> of course gigabit networking in between. Obviously this is very vague,
>> but I'm just getting started on the research.
>>
>> My concern is that I won't have enough physical disks, and therefore
>> I'll end up with performance issues.
>
> That's impossible to know without knowing what kind of performance you need.
>

True. But personally I think Ceph doesn't perform well on small clusters of
fewer than 10 nodes.

>> I've seen many petabyte+ builds discussed, but not much on the smaller
>> side. Does anyone have any guides or reference material I may have missed?
>
> The practicalities of fault tolerance are very different in a
> minimum-size system (e.g. 3 servers configured for 3 replicas).
>
> * When one disk fails, the default rules mean that the only place Ceph
> can re-replicate the PGs from that disk is onto other disks in the same
> server where the failure occurred. One full disk's worth of data will
> have to flow into that server, preferably quite fast (to avoid the risk
> of a double failure). Recovering from a 2TB disk failure will take as
> long as it takes to stream that much data over your 1Gbps link, so your
> recovery time will be similar to conventional RAID unless you install a
> faster network.
>
> * When one server fails, you lose a full third of your bandwidth. That
> means your client workloads would have to be sized to use only about
> 2/3 of the theoretical bandwidth, or you would have to shut down some
> workloads when a server failed. In larger systems this isn't such a
> worry: losing 1 of 32 servers is only a ~3% throughput loss.
>

Yes, each failure domain should be as small a fraction of the cluster as
possible. I prefer that losing one machine removes less than 10% of the
cluster; with three nodes, each node is a 33.3% failure domain.

Wido

> You should compare the price of getting the same amount of disk + RAM
> but spread across twice as many servers. The other option is of course
> a traditional dual-ported RAID controller.
>
> John

--
Wido den Hollander
42on B.V.
Ceph trainer and consultant

Phone: +31 (0)20 700 9902
Skype: contact42on
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
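
To illustrate John's first point about recovery staying inside the failed
server: the stock replicated CRUSH rule that Ceph ships with places each
replica under a different host. A minimal sketch of that default rule, in
decompiled CRUSH map syntax (the rule and root names shown are the stock
defaults and may differ on any given cluster):

    rule replicated_ruleset {
            ruleset 0
            type replicated
            min_size 1
            max_size 10
            step take default                    # start from the default root
            step chooseleaf firstn 0 type host   # place one replica per host
            step emit
    }

With size=3 and exactly three hosts, every host already holds one copy of
every PG, so when an OSD dies the only CRUSH-legal destination for its copies
is another OSD inside the same host, which is why all the recovery traffic
converges on that one server.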
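
And a back-of-the-envelope sketch of the recovery-time claim, assuming
(purely for illustration) a full 2 TB to re-replicate and roughly 100 MB/s of
usable recovery bandwidth on a 1 Gbps link once protocol overhead and client
traffic are accounted for:

    # Rough single-OSD recovery estimate; the numbers are assumptions, not measurements.
    disk_tb = 2                    # capacity of the failed disk, TB (decimal units)
    usable_mb_s = 100              # assumed usable recovery bandwidth, MB/s

    data_mb = disk_tb * 1_000_000  # 2 TB ~= 2,000,000 MB
    hours = data_mb / usable_mb_s / 3600
    print(f"~{hours:.1f} hours to refill the disk")   # ~5.6 hours, best case

In other words, even in the best case you are looking at several hours during
which the affected placement groups run with reduced redundancy, which is the
argument for a faster cluster network even on a small deployment.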