Hi Adrien. I can offer some feedback, and have a couple of questions myself:
1) If you're going to deploy 9x4TB OSDs per host, with 7 hosts and 4+2 EC, do you really want to put extra OSDs in "inner" drive bays when the target capacity is 100TB? My rough calculations put your baseline of 9 OSDs per host at roughly 150TB of usable capacity, which makes the use of non-hot-swap bays somewhat troublesome over time. That said: taking down a host is easy if the cluster isn't loaded, and problematic if you want to maintain write throughput.
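For what it's worth, the back-of-the-envelope math behind that figure looks like this (the 10% headroom factor at the end is just my own habit, not an official Ceph number):

# Rough usable-capacity estimate for 7 hosts x 9 OSDs x 4TB with 4+2 EC.
# The 10% "don't run it full" headroom is my own rule of thumb.
hosts = 7
osds_per_host = 9
osd_size_tb = 4
k, m = 4, 2                                   # EC profile 4+2

raw_tb = hosts * osds_per_host * osd_size_tb  # 252 TB raw
ec_usable_tb = raw_tb * k / (k + m)           # 168 TB after EC overhead
practical_tb = ec_usable_tb * 0.9             # ~151 TB with some headroom

print(raw_tb, ec_usable_tb, round(practical_tb))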
2) Regarding the 'fine tuning of how OSDs are marked out': many production clusters these days are tuned to minimize the impact of recovery and backfill operations by limiting the number of concurrent operations allowed, or are simply left in a 'noout' state to let an administrator make the decision about recovery. If you're faced with backfilling 4-16TB worth of data while under full write load, it will take a while. Running with 'noout' set might be best while your cluster remains small (<20 nodes).
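To make that concrete, this is roughly the kind of throttling I have in mind; the exact values are deliberately conservative examples, not a recommendation for your hardware:

import subprocess

# Throttle recovery/backfill so client writes keep priority, and set noout so
# that taking a host down for maintenance does not trigger automatic backfill.
# The values below are conservative examples; tune them for your own cluster.
commands = [
    ["ceph", "tell", "osd.*", "injectargs", "--osd-max-backfills 1"],
    ["ceph", "tell", "osd.*", "injectargs", "--osd-recovery-max-active 1"],
    ["ceph", "tell", "osd.*", "injectargs", "--osd-recovery-op-priority 1"],
    ["ceph", "osd", "set", "noout"],   # remember to 'ceph osd unset noout' afterwards
]

for cmd in commands:
    subprocess.check_call(cmd)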
Also: how are you going to access the Ceph cluster for your backups? Perhaps via a block device? If so, you’ll need a cache tier in front of the EC pool.
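If you do go the block-device route, setting up a cache tier looks roughly like this; the pool names, PG count and target size are placeholders you would size for your own SSDs, and I'm assuming the EC pool already exists:

import subprocess

# Minimal cache-tier-in-front-of-EC-pool sketch. Pool names, PG counts and
# target_max_bytes are placeholders; size the cache pool for your SSDs.
commands = [
    "ceph osd pool create backup_cache 512 512 replicated",
    "ceph osd tier add backup_ec backup_cache",
    "ceph osd tier cache-mode backup_cache writeback",
    "ceph osd tier set-overlay backup_ec backup_cache",
    "ceph osd pool set backup_cache hit_set_type bloom",
    "ceph osd pool set backup_cache target_max_bytes 1099511627776",  # ~1 TiB
]

for cmd in commands:
    subprocess.check_call(cmd.split())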
Lastly, regarding cluster throughput: EC seems to require a bit more CPU and memory than straight replication, which raises the question of how much RAM and CPU you are putting into each chassis. With proper amounts, you should be able to hit your throughput targets.
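For what it's worth, my own rough sizing habit (about 1GB of RAM per TB of OSD for recovery, about one core per OSD, with some extra CPU headroom for EC) works out to something like this per host; treat the factors as my assumptions, not official guidance:

# Back-of-the-envelope per-host sizing for 9x4TB OSDs with EC.
# The 1 GB/TB and 1 core/OSD factors are common rules of thumb, and the
# 1.5x EC CPU headroom is my own guess, not a measured number.
osds_per_host = 9
osd_size_tb = 4

ram_gb = osds_per_host * osd_size_tb * 1.0    # ~36 GB for OSDs under recovery
ram_gb += 16                                  # OS, caches, headroom
cores = osds_per_host * 1.0 * 1.5             # ~14 cores with EC overhead

print("~%d GB RAM, ~%d cores per host" % (ram_gb, cores))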
Hi everyone,
I am currently looking at Ceph to build a cluster to back up VMs. I am weighing the solution against alternatives like traditional SANs, and so far Ceph is economically more interesting and technically more challenging (not that this bothers me :) ).
The OSD hosts would be based on Dell R730xd hardware; I plan to put 3 SSDs (for journals) and 9 OSDs (4TB each) per host.
I need approximately 100TB and, in order to save some space while still getting the level of resiliency you can expect for backups, I am leaning towards EC (4+2) and 7 hosts.
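For reference, the erasure-code profile and pool I have in mind would be created more or less like this (the names and PG count are placeholders; depending on the Ceph release, the failure-domain key may be spelled ruleset-failure-domain or crush-failure-domain):

import subprocess

# EC profile and pool sketch. Names and PG count are placeholders.
commands = [
    "ceph osd erasure-code-profile set backup42 k=4 m=2 ruleset-failure-domain=host",
    "ceph osd pool create backup_ec 1024 1024 erasure backup42",
]

for cmd in commands:
    subprocess.check_call(cmd.split())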
I would like some input on the questions that still remain:
- I can put more OSDs directly inside the server (up to 4 additional disks), but that would require powering down the host to replace an "inner" OSD in case of failure. I was thinking I could add 3 internal disks to get 12 OSDs per node instead of 9 for higher density, at the cost of more complex maintenance and higher risk for the cluster, as there would be 4 OSD journals per SSD instead of 3. How manageable is bringing down a complete node to replace a disk? noout will surely come into play. How will the cluster behave when the host comes back online and has to resync its data?
- I also wonder about SSD failure, even though I intend to use Intel S3700s, or at least S3610s, precisely so I am not bothered by such issues :). In case of an SSD failure, the cluster would start backfilling / rebalancing the data of 3 to 4 OSDs. With proper monitoring and spare disks, one could replace the SSD within hours and avoid the impact of backfilling that much data, but this would require fine-tuning how OSDs are marked out (a rough sketch of the kind of monitoring script I have in mind is included after these questions). I know this bends the natural features and behaviours of Ceph a bit, but has anyone tested this approach, with custom monitoring scripts or otherwise? Do you think it can be considered, or is the only way to buy SSDs that can sustain the load? Also, same question as above: how does Ceph handle OSDs that come back up after being down for a while?
- My goal is to reach a bandwidth of several hundred MB/s of mostly sequential writes. Do you think a cluster of this type and size will be able to handle it? The only benchmarks I could find on the mailing list are Loic's on EC plugins and Roy's on a full-SSD EC backend.
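To illustrate the custom monitoring mentioned above, this is the kind of rough sketch I had in mind, nothing more than a starting point (the journal device paths are placeholders for my own layout):

import os
import subprocess
import time

# Rough idea: if a journal SSD disappears, immediately set noout so the 3-4
# OSDs using it are not marked out and backfilled before the SSD is swapped.
# The device paths are placeholders for wherever the journal SSDs show up.
JOURNAL_SSDS = ["/dev/sdb", "/dev/sdc", "/dev/sdd"]

def missing_ssds():
    return [dev for dev in JOURNAL_SSDS if not os.path.exists(dev)]

while True:
    missing = missing_ssds()
    if missing:
        print("journal SSD(s) missing: %s, setting noout" % ", ".join(missing))
        subprocess.check_call(["ceph", "osd", "set", "noout"])
        break
    time.sleep(10)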
Many thanks in advance,
Adrien