Re: How objects are reshuffled on addition of new OSD

Gregory Farnum <gfarnum@xxxxxxxxxx> · Tue, 8 Sep 2015 14:00:49 +0100

On Tue, Sep 1, 2015 at 2:31 AM, Shesha Sreenivasamurthy <shesha@xxxxxxxx> wrote:
> I had a question regarding how OSD locations are determined by CRUSH.
>
> From the CRUSH paper I gather that the replica locations of an object (A)
> are a vector (v) obtained from the function c(r,x) = (hash(x) + rp) mod m.

It is a hash function, but I don't think this is quite right. Objects
are hashed (quickly, using rjenkins or something) into a placement
group. The CRUSH function is then run on that placement group to
assign it to a vector of OSDs; this is pretty configurable and takes a
tree as input (with a choice of bucket types: straw, list, etc.).
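
To make the two steps concrete, here is a rough Python sketch. It is not
the real CRUSH code: stable_hash(), object_to_pg() and pg_to_osds() are
names made up for illustration, SHA-256 stands in for rjenkins, plain
modulo stands in for the real object-to-PG mapping, and a flat
"longest straw wins" pick stands in for walking the CRUSH tree.

```python
import hashlib


def stable_hash(*parts):
    """Deterministic 64-bit hash of the given parts (stand-in for rjenkins)."""
    data = "/".join(str(p) for p in parts).encode()
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


def object_to_pg(object_name, pg_num):
    """Step 1: hash the object name into one of pg_num placement groups."""
    return stable_hash("obj", object_name) % pg_num


def pg_to_osds(pg_id, osds, replicas=3):
    """Step 2: map a placement group to an ordered list of distinct OSDs.

    Every OSD draws a pseudo-random "straw" for this PG and the longest
    straws win. This is a flat simplification, with no hierarchy or weights.
    """
    ranked = sorted(osds, key=lambda osd: stable_hash("pg", pg_id, osd),
                    reverse=True)
    return ranked[:replicas]


# Hypothetical six-OSD cluster and object name, for illustration only.
osds = [f"osd.{i}" for i in range(6)]
pg = object_to_pg("my-object", pg_num=128)
print(pg, pg_to_osds(pg, osds))
```

The point is only the shape of the mapping: the object name picks a
placement group, and the placement group (not the object) is what gets
placed on OSDs.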

>
> Now when new OSDs are added, objects are shuffled to maintain uniform data
> distribution. What in the above equation changes so that only minimal
> movement is achieved? More specifically, if nothing in the above equation
> changes, then all the objects map to the same locations again. If p is
> changed, then many object locations can change. Therefore, how does
> CRUSH guarantee only minimal data movement?

Like I said, that's not the equation. It's more like you have three
doors to choose from at each of three levels, and when you add a new
door somewhere in the tree, you only move a little bit of the data
around.
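
A toy version of the "new door" effect, as a sketch rather than real
CRUSH: draw(), place() and moved_replicas() are hypothetical helpers,
the cluster is a single flat level of equally weighted OSDs, and each
PG simply keeps the OSDs with the highest hash draws. Under those
assumptions, adding one OSD can only move a replica onto the new OSD,
so roughly 1/(n+1) of the replicas move and everything else stays put.

```python
import hashlib


def draw(pg_id, osd):
    """Pseudo-random but repeatable draw for a (PG, OSD) pair."""
    data = f"{pg_id}/{osd}".encode()
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


def place(pg_id, osds, replicas=3):
    """Pick the OSDs with the highest draws for this PG."""
    return sorted(osds, key=lambda osd: draw(pg_id, osd),
                  reverse=True)[:replicas]


def moved_replicas(pg_num, old_osds, new_osds, replicas=3):
    """Count replicas that land on an OSD they were not on before."""
    moved = 0
    for pg_id in range(pg_num):
        before = set(place(pg_id, old_osds, replicas))
        after = set(place(pg_id, new_osds, replicas))
        moved += len(after - before)
    return moved


# Hypothetical cluster: nine OSDs, then a tenth is added.
old = [f"osd.{i}" for i in range(9)]
new = old + ["osd.9"]
PG_NUM, REPLICAS = 1024, 3

moved = moved_replicas(PG_NUM, old, new, REPLICAS)
total = PG_NUM * REPLICAS
print(f"{moved} of {total} replicas moved ({moved / total:.1%})")
```

Existing draws do not change when an OSD is added, which is why a PG
is only disturbed when the new OSD happens to win one of its slots.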

>
> A follow-up question: if there is ongoing IO to an object, the primary
> replica is the one that will be getting updated. Does the re-shuffling in
> that case not consider currently hot objects for movement?

It definitely does not consider heat. Everything is based on the
object names (locators, more specifically, but they're generally the
same). Responsibility for maintaining the IO lives in layers above
CRUSH.
-Greg