Re: Erasure Encoding Chunks

On 05/12/2014 17:41, Nick Fisk wrote:
> Hi Loic,
> 
> Thanks for your response.
> 
> The idea for this cluster will be for our VM Replica storage in our
> secondary site. Initially we are planning to have a 40 disk EC pool sitting
> behind a cache pool of around 1TB post replica size.
> 
> This storage will be presented as RBD's and then exported as a HA iSCSI
> target to ESX hosts. The VM's will be replicated from our primary site via a
> software product called Veeam.
> 
> I'm hoping that the 1TB cache layer should be big enough to hold most of the
> hot data meaning that the EC pool shouldn't see a large amount of IO, just
> the trickle of the cache layer flushing back to disk. We can switch back to
> a 3 way replica pool if the EC pool doesn't work out for us, but we are
> interested in testing out the EC technology.
> 
> I hope that provides an insight to what I am trying to achieve.

When the erasure coded object has to be promoted back to the replicated pool,
you want that to happen as fast as possible. With 7 data chunks the primary
OSD holds one of them and the read returns once the other 6 OSDs have
delivered theirs: the 6 reads happen in parallel and complete when the
slowest OSD returns. With 16 OSDs instead of 6 you increase the odds that one
of them is significantly slower than the others and holds up the whole read.
With only 40 OSDs you probably won't have a sophisticated monitoring system
that detects misbehaving hard drives, so a slow disk could go unnoticed and
degrade your performance significantly, because more than a third of the
objects use it (each object spans 20 OSDs in total, 17 of which hold data you
need to promote to the replicated pool). With over 1000 OSDs you would
probably monitor the hard drives accurately, detect slow OSDs sooner and move
them out of the cluster, and only a fraction of the objects would be impacted
by a slow OSD.
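
To make that concrete, here is a rough back-of-envelope sketch (my own
illustration, assuming chunks are spread uniformly over the cluster, which
is a simplification of what CRUSH actually does):

  # Fraction of objects whose promotion read is gated by one given
  # (possibly slow) OSD, for k data + m coding chunks on n OSDs.
  def fraction_gated(k, m, n):
      data_chunk = k / n         # OSD holds a data chunk needed for the read
      any_chunk = (k + m) / n    # OSD holds any chunk of the object
      return data_chunk, any_chunk

  for k, m, n in [(7, 3, 40), (17, 3, 40), (17, 3, 1000)]:
      data_chunk, any_chunk = fraction_gated(k, m, n)
      print(f"k={k} m={m} on {n} OSDs: reads gated for {data_chunk:.0%} "
            f"of objects, chunk stored for {any_chunk:.0%}")

With 17+3 on 40 OSDs a single slow disk sits on the read path of roughly 40%
of the objects; with 1000 OSDs that drops to about 2%.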

I would love to hear what an architect would advise.

Cheers


> 
> Thanks,
> Nick
> 
> -----Original Message-----
> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of
> Loic Dachary
> Sent: 05 December 2014 16:23
> To: Nick Fisk; 'Ceph Users'
> Subject: Re:  Erasure Encoding Chunks
> 
> 
> 
> On 05/12/2014 16:21, Nick Fisk wrote:
>> Hi All,
>>
>>  
>>
>> Does anybody have any input on what the best ratio + total numbers of Data
>> + Coding chunks you would choose?
>>
>>  
>>
>> For example I could create a pool with 7 data chunks and 3 coding chunks
>> and get an efficiency of 70%, or I could create a pool with 17 data chunks
>> and 3 coding chunks and get an efficiency of 85% with a similar probability
>> of protecting against OSD failure.
>>
>>  
>>
>> What’s the reason I would choose 10 total chunks over 20 chunks, is it
>> purely down to the overhead of having potentially double the number of
>> chunks per object?
> 
> Hi Nick,
> 
> Assuming you have a large number of OSDs (a thousand or more) with cold data,
> 20 is probably better. When you try to read the data it involves 20 OSDs
> instead of 10, but you probably don't care if reads are rare.
> 
> Disclaimer : I'm a developer not an architect ;-) It would help to know the
> target use case, the size of the data set and the expected read/write rate.
> 
> Cheers
> 
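
For reference, the 70% and 85% figures quoted above are simply k / (k + m);
a tiny sketch of the arithmetic (illustrative only):

  # Storage efficiency of an EC profile: data chunks over total chunks.
  for k, m in [(7, 3), (17, 3)]:
      print(f"k={k}, m={m}: efficiency = {k / (k + m):.0%}")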

-- 
Loïc Dachary, Artisan Logiciel Libre


