Re: crush chooseleaf vs. choose

Sage Weil <sage@xxxxxxxxxxx> · Thu, 2 Jan 2014 09:24:24 -0800 (PST)

On Thu, 2 Jan 2014, Dietmar Maurer wrote:
> > > iirc, chooseleaf goes down the tree and descends into multiple leaves
> > > to find what you are looking for.
> > >
> > > choose goes into that leaf and tries to find what you are looking for
> > > without going into subtrees.
> > 
> > Right.  To a first approximation, these rules are equivalent.  The difference is
> > that the second won't handle the case where a host has many or all of its
> > OSDs marked out (data will get shifted to the surviving OSD, or you will get
> > no results), whereas the chooseleaf rule will handle things properly and maintain
> > a balanced distribution.
> 
> So if I have 3 hosts, each running 2 OSDs, with pool size=3,
> what happens if I shut down one host?
> 
> choose => no redistribution to remaining host
> chooseleaf => data is copied to remaining OSDs
> 
> Is that right?

In both cases, you only get 2 replicas on the remaining 2 hosts.

The difference shows up if you have 4 hosts with 2 OSDs each.  In the 
choose case, some fraction of the data will have chosen the down host in 
the first step (most of the attempts, actually!) and then can't find a 
usable OSD under it, leaving you with only 2 replicas.  With chooseleaf 
that doesn't happen, because a host with no usable OSD is rejected and 
another host is tried instead.
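
To put a rough number on "most of the attempts", here is a toy model in 
Python.  It is not the real CRUSH algorithm: it uses plain uniform random 
picks instead of CRUSH's hashing and weights, and the host/OSD names and 
the helper functions are made up for illustration.  It only mimics the 
one behaviour that matters here, namely whether a host with no usable OSD 
gets replaced by another host.

```python
import random

# Toy cluster (assumed layout): 4 hosts with 2 OSDs each.
HOSTS = {f"host{h}": [f"osd.{2 * h}", f"osd.{2 * h + 1}"] for h in range(4)}
OUT = set(HOSTS["host0"])  # host0 is down, so both of its OSDs are unusable


def place_choose(rng, replicas=3):
    """Two separate steps: pick the hosts, then pick an OSD inside each."""
    placed = []
    for host in rng.sample(sorted(HOSTS), replicas):
        usable = [o for o in HOSTS[host] if o not in OUT]
        if usable:  # a dead host yields nothing and is NOT replaced
            placed.append(rng.choice(usable))
    return placed


def place_chooseleaf(rng, replicas=3):
    """One combined step: a host with no usable OSD is skipped for another."""
    placed = []
    hosts = sorted(HOSTS)
    rng.shuffle(hosts)
    for host in hosts:
        usable = [o for o in HOSTS[host] if o not in OUT]
        if usable:
            placed.append(rng.choice(usable))
        if len(placed) == replicas:
            break
    return placed


def short_fraction(place, trials=10000, seed=1):
    """Fraction of simulated placements that end up with fewer than 3 replicas."""
    rng = random.Random(seed)
    short = sum(1 for _ in range(trials) if len(place(rng)) < 3)
    return short / trials


if __name__ == "__main__":
    print("choose:    ", short_fraction(place_choose))
    print("chooseleaf:", short_fraction(place_chooseleaf))
```

In this model the choose variant picks 3 of the 4 hosts, so the down host 
is among them about 3/4 of the time and those placements come up one 
replica short; the chooseleaf variant always finds 3.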

The other difference is when one of the two OSDs on a host is marked 
out.  In the choose case, the remaining OSD will get allocated 2x the 
data; in the chooseleaf case, usage will remain proportional with the rest 
of the cluster and the data from the out OSD will be distributed across 
other OSDs (at least when there are > 3 hosts!).
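
The same kind of toy model shows the 2x effect.  Again, this is only a 
sketch under assumptions and not CRUSH itself: uniform random picks, equal 
weights, made-up names and helpers, and a 10-host cluster chosen so that 
the chooseleaf case has enough other hosts to spread onto.

```python
import random
from collections import Counter

# Toy cluster (assumed layout): 10 hosts with 2 OSDs each; osd.0 is marked out.
HOSTS = {f"host{h}": [f"osd.{2 * h}", f"osd.{2 * h + 1}"] for h in range(10)}
OUT = {"osd.0"}


def place_choose(rng, replicas=3):
    """Pick hosts first; a retry for an out OSD stays inside the chosen host."""
    placed = []
    for host in rng.sample(sorted(HOSTS), replicas):
        usable = [o for o in HOSTS[host] if o not in OUT]
        placed.append(rng.choice(usable))
    return placed


def place_chooseleaf(rng, replicas=3):
    """Pick host and OSD together; an out OSD rejects the host as well."""
    placed, used = [], set()
    names = sorted(HOSTS)
    while len(placed) < replicas:
        host = rng.choice(names)
        if host in used:
            continue  # each host may hold at most one replica
        osd = rng.choice(HOSTS[host])
        if osd in OUT:
            continue  # reject the pair and draw a fresh host
        used.add(host)
        placed.append(osd)
    return placed


def survivor_ratio(place, trials=20000, seed=1):
    """Load on osd.1 (the out OSD's neighbour) relative to an average other OSD."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(trials):
        counts.update(place(rng))
    others = [counts[o] for osds in HOSTS.values() for o in osds
              if o not in ("osd.0", "osd.1")]
    return counts["osd.1"] / (sum(others) / len(others))


if __name__ == "__main__":
    print("choose:    ", round(survivor_ratio(place_choose), 2))
    print("chooseleaf:", round(survivor_ratio(place_chooseleaf), 2))
```

In this model the surviving OSD carries roughly twice the average load 
under choose, and stays close to the average under chooseleaf.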

sage