Re: computing PG IDs

Gregory Farnum <greg@xxxxxxxxxxx> · Thu, 7 Nov 2013 20:21:47 -0800

On Thu, Nov 7, 2013 at 6:43 AM, Kenneth Waegeman
<Kenneth.Waegeman@xxxxxxxx> wrote:
> Hi everyone,
>
> I just started to look at the documentation of Ceph and I've hit something I
> don't understand.
> It's about something on http://ceph.com/docs/master/architecture/
>
> """
> use the following steps to compute PG IDs.
>
> The client inputs the pool ID and the object ID. (e.g., pool = "liverpool"
> and object-id = "john")
> CRUSH takes the object ID and hashes it.
> --> CRUSH calculates the hash modulo the number of OSDs. (e.g., 0x58) to get
> a PG ID.  <---
> CRUSH gets the pool ID given the pool name (e.g., "liverpool" = 4)
> CRUSH prepends the pool ID to the PG ID (e.g., 4.0x58).
> """
>
> Shouldn't this be 'CRUSH calculates the hash modulo the number of PGs to
> get a PG ID'?

Yes!
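The corrected steps (hash the object name, take it modulo the pool's PG count, prepend the pool ID) can be sketched as below. This is a minimal illustration under stated assumptions: the pool table and its `pg_num` of 128 are hypothetical, and CRC32 is a stand-in hash (Ceph uses its own hash, rjenkins by default), so the PG IDs it prints will not match a real cluster. It also uses the plain modulo from the docs; the reply below explains how Ceph refines that step.

```python
import zlib

# Hypothetical pool table: pool name -> (pool ID, pg_num). Values are made up.
POOLS = {"liverpool": (4, 128)}

def object_hash(object_id: str) -> int:
    # Stand-in hash (CRC32 of the name). Ceph itself uses a different hash,
    # so real PG IDs will differ from the ones computed here.
    return zlib.crc32(object_id.encode("utf-8"))

def compute_pg_id(pool_name: str, object_id: str) -> str:
    pool_id, pg_num = POOLS[pool_name]          # look up the pool ID by name
    ps = object_hash(object_id) % pg_num        # hash modulo the number of PGs
    return f"{pool_id}.{ps:x}"                  # prepend pool ID; PG part in hex

print(compute_pg_id("liverpool", "john"))
```

The result has the same "pool.pg" shape as the "4.0x58" example, with the PG part written in hex.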

> But then what happens if you add more PGs to the pool? Then most of the data
> will be reallocated to another PG?

With a plain modulo, it would. What actually happens is that as you
increase the number of PGs, the new ones are formed by splitting apart
existing PGs. We basically do this by picking the PG for an object
based on the N least significant bits of the hash value; then, if the
number of PGs doubles, we use the N+1 least significant bits, so each
PG is split exactly in half. (It's a little more complicated than that
to deal with non-power-of-two PG counts, but that's the gist!) IIRC one
of those halves keeps the "parent" PG's ID, and with it the parent's
placement, so it won't get moved; the other half becomes a new PG whose
placement CRUSH computes separately, so it might or might not get moved
(I don't remember how often we expect that to happen).
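The bit-masking scheme described above can be sketched as follows. `stable_mod` is modeled on the `ceph_stable_mod` helper in Ceph's source as I understand it; treat it as a sketch rather than the exact implementation. The other names (`pg_mask`, `stable_pg`, `plain_pg`, `moved`) are hypothetical, and the full range of 16-bit values stands in for object hashes. It checks that doubling from 8 to 16 PGs splits each PG exactly in half, and compares how many objects change PG when going from 12 to 13 PGs with plain modulo versus stable mod. (For a pure doubling, plain modulo happens to split the same way; the difference shows up with non-power-of-two counts.)

```python
def pg_mask(pg_num: int) -> int:
    # Smallest all-ones bit mask covering pg_num - 1 (e.g. 12 -> 0b1111).
    return (1 << (pg_num - 1).bit_length()) - 1

def stable_mod(x: int, b: int, bmask: int) -> int:
    # Take the low bits selected by bmask; if that names a PG that does not
    # exist yet (>= b), drop one bit, which gives the not-yet-split parent PG.
    if (x & bmask) < b:
        return x & bmask
    return x & (bmask >> 1)

def stable_pg(h: int, pg_num: int) -> int:
    return stable_mod(h, pg_num, pg_mask(pg_num))

def plain_pg(h: int, pg_num: int) -> int:
    return h % pg_num

def moved(place, old: int, new: int, hashes) -> int:
    # Count hash values whose PG changes when pg_num goes from old to new.
    return sum(1 for h in hashes if place(h, old) != place(h, new))

HASHES = range(1 << 16)  # every 16-bit value: a uniform stand-in for object hashes

# Doubling 8 -> 16: every object stays in its PG p or moves to the child p + 8.
assert all(stable_pg(h, 16) in (stable_pg(h, 8), stable_pg(h, 8) + 8) for h in HASHES)

print(moved(stable_pg, 8, 16, HASHES))   # 32768: exactly half
print(moved(stable_pg, 12, 13, HASHES))  # 4096: only PG 4 splits (into 4 and 12)
print(moved(plain_pg, 12, 13, HASHES))   # 60484: roughly 92% change PG
```

With stable mod, adding one PG to a non-power-of-two count only splits a single existing PG, while a plain modulo reassigns most objects.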
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com