Re: Questions about CRUSH

Hi Wido!

On Thu, 7 Oct 2010, Wido den Hollander wrote:
> Hi,
> 
> I'm working on a crushmap where I have my hosts spread out over 3 racks
> (leafs).
> 
> I have 9 physical machines, each with one OSD, spread out over three
> racks.
> 
> The replication level I intend to use is 3, my goal with this crushmap
> is to prevent two replicas being stored in the same rack.
> 
> Now, this map seems fine to me, but what if one of the racks fails and
> the cluster starts to fix itself, then I would get two replicas in the
> same rack, wouldn't I?

Right.

> Is it better to have: leafs at root = (max replication level + 1) ?
> 
> So, if I have my replication level set to 3, I should have 4 racks with
> each 3 OSD's, then the cluster could restore from a complete rack
> failure, without compromising my data safety.
> 
> When a complete leaf (rack) fails, the other leafs should be able to
> store all the data, so if my replication level is set to 3, I should
> always have at least 1/3 of free space, otherwise a full recovery won't
> be possible, correct? (OSD's run out of disk space).
> 
> Am I missing something here or is this the right approach?

Yeah, I think this is the right approach. 
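To make the free-space arithmetic concrete, here's a quick sketch using the cluster sizes from the thread (this ignores placement imbalance and any full-ratio safety margin, so treat it as a lower bound on required free space):

```python
# Rough capacity check: after losing one rack, the surviving OSDs must
# absorb all of the data, so average utilization has to stay below
# (total_osds - osds_per_rack) / total_osds.

def max_safe_utilization(racks: int, osds_per_rack: int) -> float:
    total = racks * osds_per_rack
    surviving = total - osds_per_rack
    return surviving / total

# 3 racks of 3 OSDs: keep 1/3 free to survive a full rack failure
print(max_safe_utilization(3, 3))   # 0.666...
# 4 racks of 3 OSDs: only 1/4 needs to stay free
print(max_safe_utilization(4, 3))   # 0.75
```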

> And I'm not completely sure about:
> 
> rule placein3racks {
rule placeinNracks {
>         ruleset 0
>         type replicated
>         min_size 2
>         max_size 2
	min_size 2
	max_size 10
>         step take root
>         step chooseleaf firstn 0 type rack
>         step emit
> }
>
> Is that correct? Here I say that the first step should be to choose a
> rack where the replica should be saved. Should I also specify to choose
> a host afterwards?

The rule generalizes to N replicas, where N can be 2..10 (that's what the 
min/max size fields are for).  And the chooseleaf line is correct: it 
chooses N leaves/devices nested beneath N distinct racks, which is what 
you want!
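For reference, folding the inline corrections into the quoted rule gives 
something like:

	rule placeinNracks {
		ruleset 0
		type replicated
		min_size 2
		max_size 10
		step take root
		step chooseleaf firstn 0 type rack
		step emit
	}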

You could also do

	step take root
	step choose firstn 0 type rack
	step choose firstn 1 type device
	step emit

That would choose N racks, and then for each rack choose a nested device.  
The problem is that when one of the racks it chooses has no (or few) 
online devices beneath it, we fail to find a usable device and the result 
set ends up with <N devices.  Chooseleaf doesn't have that problem.
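The difference can be illustrated with a toy model (this is not the real CRUSH placement algorithm, just a simplified sketch of the retry behavior; the rack/OSD names are made up):

```python
# Toy model: "choose racks, then descend" can return fewer than N
# devices when a chosen rack has no online devices, while a
# chooseleaf-style selection retries other racks until it finds N.
import random

racks = {
    "rack1": ["osd0", "osd1", "osd2"],
    "rack2": [],                        # all of rack2's devices offline
    "rack3": ["osd6", "osd7", "osd8"],
    "rack4": ["osd9", "osd10", "osd11"],
}

def choose_then_descend(n, rng):
    """Pick N racks first, then one device per rack; empty racks lose a slot."""
    picked = rng.sample(sorted(racks), n)
    return [rng.choice(racks[r]) for r in picked if racks[r]]

def chooseleaf_style(n, rng):
    """Keep trying racks until N devices under N distinct racks are found."""
    result = []
    for r in rng.sample(sorted(racks), len(racks)):
        if racks[r]:
            result.append(rng.choice(racks[r]))
        if len(result) == n:
            break
    return result

rng = random.Random(1)
can_under_fill = any(len(choose_then_descend(3, rng)) < 3 for _ in range(100))
always_full = all(len(chooseleaf_style(3, rng)) == 3 for _ in range(100))
print(can_under_fill, always_full)   # True True
```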

sage
--

