On Thu, Nov 7, 2013 at 6:43 AM, Kenneth Waegeman <Kenneth.Waegeman@xxxxxxxx> wrote:
> Hi everyone,
>
> I just started to look at the documentation of Ceph and I've hit something I
> don't understand.
> It's about something on http://ceph.com/docs/master/architecture/
>
> """
> use the following steps to compute PG IDs.
>
> The client inputs the pool ID and the object ID. (e.g., pool = "liverpool"
> and object-id = "john")
> CRUSH takes the object ID and hashes it.
> --> CRUSH calculates the hash modulo the number of OSDs. (e.g., 0x58) to get
> a PG ID. <---
> CRUSH gets the pool ID given the pool name (e.g., "liverpool" = 4)
> CRUSH prepends the pool ID to the PG ID (e.g., 4.0x58).
> """
>
> Shouldn't this be 'CRUSH calculates the hash modulo the number of PGs to
> get a PG ID' ?

Yes!

> But then what happens if you add more PGs to the pool? Then most of the data
> will be reallocated to another PG?

Indeed. What actually happens is that as you increase the number of PGs,
the new ones are formed by splitting apart existing PGs. We basically do
this by picking the PG for an object based on the N least significant
bits of the hash value, and then if the number of PGs doubles we use the
N+1 least significant bits, so each PG is split exactly in half. (It's a
little more complicated than that to deal with non-power-of-two PG
counts, but that's the gist!)

IIRC one of those halves maintains the same hash placements as the
"parent" PG did, so it won't get moved; the other might or might not get
moved (I don't remember how often we expect that to happen).

-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
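
For anyone who wants to see the bit-masking idea from the reply above in
action, here is a minimal Python sketch. It is an illustration under
stated assumptions, not Ceph's actual code: zlib.crc32 is only a stand-in
for Ceph's own hash function, the helper names (object_hash,
pg_for_object, split_report) are made up for this example, and only
power-of-two PG counts are handled, so the non-power-of-two case
mentioned above is left out.

```python
import zlib


def object_hash(object_id: str) -> int:
    # Stand-in hash for illustration only; Ceph uses its own hash function.
    return zlib.crc32(object_id.encode("utf-8"))


def pg_for_object(object_id: str, pg_num: int) -> int:
    # Pick the PG from the N least significant bits of the hash,
    # where pg_num == 2**N. Non-power-of-two counts are not handled here.
    if pg_num <= 0 or pg_num & (pg_num - 1):
        raise ValueError("pg_num must be a power of two in this sketch")
    return object_hash(object_id) & (pg_num - 1)


def split_report(object_ids, old_pg_num):
    # Double the PG count and count how many objects keep their PG number
    # versus how many land in the newly created sibling PG.
    # This says nothing about which OSDs the PGs map to.
    new_pg_num = old_pg_num * 2
    kept_id, new_id = 0, 0
    for oid in object_ids:
        old_pg = pg_for_object(oid, old_pg_num)
        new_pg = pg_for_object(oid, new_pg_num)
        # Using one more bit can only keep the old number or add old_pg_num.
        assert new_pg in (old_pg, old_pg + old_pg_num)
        if new_pg == old_pg:
            kept_id += 1
        else:
            new_id += 1
    return kept_id, new_id


if __name__ == "__main__":
    names = [f"object-{i}" for i in range(1000)]
    kept_id, new_id = split_report(names, 8)
    print(f"kept PG number: {kept_id}, moved to sibling PG: {new_id}")
```

The point of the sketch is that going from 8 to 16 PGs never scatters an
object to an unrelated PG: it either keeps its old PG number or goes to
the one sibling at old number + 8, which is the "split in half" behaviour
described above.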