On Tue, Sep 1, 2015 at 2:31 AM, Shesha Sreenivasamurthy <shesha@xxxxxxxx> wrote:
> I had a question regarding how OSD locations are determined by CRUSH.
>
> From the CRUSH paper I gather that the replica locations of an object (A)
> are a vector (v) given by the function c(r,x) = (hash(x) + rp) mod m.

It is a hash function, but I don't think this is quite right. Objects are
hashed (quickly, using rjenkins or something) into a placement group. The
CRUSH function is then run on that placement group to assign it to a vector
of OSDs; this is pretty configurable and takes a tree as input (with a choice
of bucket types: straw, list, etc.).

> Now when new OSDs are added, objects are shuffled to maintain uniform data
> distribution. What in the above equation changes so that only minimal
> movement is achieved? More specifically, if nothing in the above equation
> changes then all the objects again map to the same locations. If p is
> changed, then lots of object locations can change. Therefore, how does
> CRUSH guarantee only minimal data movement?

Like I said, that's not the equation. It's more like you have three doors to
choose from at each of three levels, and when you add a new door somewhere in
the tree, you only move a little bit of the data around.

> Followup question: if there is ongoing IO to an object, the primary
> replica is the one that will be getting updated. Does the re-shuffling in
> that case not consider currently hot objects for movement?

It definitely does not consider heat. Everything is based on the object names
(locators, more specifically, but they're generally the same). Responsibility
for maintaining the IO lives in layers above CRUSH.
-Greg
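
As a rough illustration of the two steps described above (object name to placement group, then placement group to a vector of OSDs), here is a toy Python sketch. It is not Ceph's CRUSH implementation. It assumes a single flat set of OSDs instead of a tree, uses SHA-256 as a stand-in for rjenkins, and picks OSDs by a "longest straw wins" rule that is only in the spirit of a straw bucket. The names stable_hash, object_to_pg, pg_to_osds and moved_fraction are hypothetical helpers made up for this example.

```python
import hashlib


def stable_hash(*parts):
    """Deterministic 64-bit hash. SHA-256 is a stand-in here, not rjenkins."""
    data = ":".join(str(p) for p in parts).encode()
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


def object_to_pg(name, pg_num):
    """Step 1: hash the object name into one of pg_num placement groups."""
    return stable_hash("obj", name) % pg_num


def pg_to_osds(pg, osds, replicas=3):
    """Step 2 (toy): every OSD draws a pseudo-random straw for this PG,
    and the OSDs with the longest straws hold the replicas."""
    ranked = sorted(osds, key=lambda osd: stable_hash("pg", pg, osd), reverse=True)
    return ranked[:replicas]


def moved_fraction(pg_num, old_osds, new_osds, replicas=3):
    """Fraction of replica slots assigned to an OSD that did not hold that PG before."""
    moved = 0
    for pg in range(pg_num):
        before = set(pg_to_osds(pg, old_osds, replicas))
        after = set(pg_to_osds(pg, new_osds, replicas))
        moved += len(after - before)
    return moved / (pg_num * replicas)


if __name__ == "__main__":
    osds = [f"osd.{i}" for i in range(10)]
    pg = object_to_pg("my-object", 1024)
    print("my-object -> pg", pg, "->", pg_to_osds(pg, osds))
    print("moved after adding osd.10:", moved_fraction(1024, osds, osds + ["osd.10"]))
```

In this toy model a straw depends only on the (PG, OSD) pair, so adding an OSD leaves every existing straw unchanged. A PG's mapping changes only if the new OSD draws one of its longest straws, and the data that moves goes to the new OSD. Going from 10 to 11 OSDs, one would expect roughly one replica in eleven to move. Note also that nothing in the mapping looks at IO activity, only at names.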