Re: [PATCH 31/33] libceph: add support for osd primary affinity

On 03/27/2014 01:18 PM, Ilya Dryomov wrote:
> Respond to non-default primary_affinity values accordingly.  (Primary
> affinity allows the admin to shift 'primary responsibility' away from
> specific osds, effectively shifting around the read side of the
> workload and whatever overhead is incurred by peering and writes by
> virtue of being the primary).

The code looks good, I presume it matches the algorithm.
I have a few questions below but nothing serious.

Reviewed-by: Alex Elder <elder@xxxxxxxxxx>

> 
> Signed-off-by: Ilya Dryomov <ilya.dryomov@xxxxxxxxxxx>
> ---
>  net/ceph/osdmap.c |   68 +++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 68 insertions(+)
> 
> diff --git a/net/ceph/osdmap.c b/net/ceph/osdmap.c
> index ed52b47d0ddb..8c596a13c60f 100644
> --- a/net/ceph/osdmap.c
> +++ b/net/ceph/osdmap.c
> @@ -1589,6 +1589,72 @@ static int raw_to_up_osds(struct ceph_osdmap *osdmap,
>  	return len;
>  }
>  
> +static void apply_primary_affinity(struct ceph_osdmap *osdmap, u32 pps,
> +				   struct ceph_pg_pool_info *pool,
> +				   int *osds, int len, int *primary)
> +{
> +	int i;
> +	int pos = -1;
> +
> +	/*
> +	 * Do we have any non-default primary_affinity values for these
> +	 * osds?
> +	 */
> +	if (!osdmap->osd_primary_affinity)
> +		return;
> +
> +	for (i = 0; i < len; i++) {
> +		if (osds[i] != CRUSH_ITEM_NONE &&
> +		    osdmap->osd_primary_affinity[osds[i]] !=
> +					CEPH_OSD_DEFAULT_PRIMARY_AFFINITY) {
> +			break;
> +		}
> +	}
> +	if (i == len)
> +		return;

So if they're all DEFAULT_AFFINITY then you don't bother.

I'm trying to understand what happens if at least one is
DEFAULT and at least one is not DEFAULT.
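For the mixed case, a minimal userspace sketch of the selection loop quoted further down may help. It is not the kernel code: `pick_primary_pos`, `SKETCH_ITEM_NONE` and `SKETCH_MAX_AFFINITY` are hypothetical names, the value 0x10000 for the maximum affinity is an assumption (the patch does not show the definition), and `draws[i]` stands in for `crush_hash32_2(CRUSH_HASH_RJENKINS1, pps, osds[i]) >> 16` so the outcome is easy to follow by hand.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_ITEM_NONE    0x7fffffff  /* stand-in for CRUSH_ITEM_NONE */
#define SKETCH_MAX_AFFINITY 0x10000u    /* assumed value, not shown in the patch */

/*
 * Return the position in osds[] of the chosen primary, or -1 if every
 * slot is SKETCH_ITEM_NONE.  draws[i] is a 16-bit value standing in for
 * the per-(pps, osd) hash the real code computes.
 */
static int pick_primary_pos(const int *osds, int len,
                            const uint32_t *affinity, const uint32_t *draws)
{
    int pos = -1;

    assert(len >= 0);
    for (int i = 0; i < len; i++) {
        int osd = osds[i];
        uint32_t aff;

        if (osd == SKETCH_ITEM_NONE)
            continue;

        aff = affinity[osd];
        if (aff < SKETCH_MAX_AFFINITY && draws[i] >= aff) {
            /* rejected: remember only the first rejected slot as fallback */
            if (pos < 0)
                pos = i;
        } else {
            /* accepted: a full-affinity osd always lands here */
            pos = i;
            break;
        }
    }
    return pos;
}
```

Under these assumptions, an osd with reduced affinity that loses its draw is skipped in favour of the next osd that is accepted, and an osd at full affinity is always accepted. If every candidate is rejected, the first rejected slot is used, so the pg still gets a primary.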

> +
> +	/*
> +	 * Pick the primary.  Feed both the seed (for the pg) and the
> +	 * osd into the hash/rng so that a proportional fraction of an
> +	 * osd's pgs get rejected as primary.
> +	 */
> +	for (i = 0; i < len; i++) {
> +		int o;
> +		u32 a;

Maybe "osd" and "aff" for osd number and affinity values?

> +
> +		o = osds[i];
> +		if (o == CRUSH_ITEM_NONE)
> +			continue;
> +
> +		a = osdmap->osd_primary_affinity[o];
> +		if (a < CEPH_OSD_MAX_PRIMARY_AFFINITY &&

So CEPH_OSD_MAX_PRIMARY_AFFINITY is actually one more than
the maximum allowed value, right?
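One way to check the threshold semantics is to count how many of the possible 16-bit draws pass the test. The sketch below assumes CEPH_OSD_MAX_PRIMARY_AFFINITY is 0x10000, which the patch does not show; `accepted_as_primary`, `count_accepted` and `SKETCH_MAX_AFF` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_MAX_AFF 0x10000u  /* assumed value, not shown in the patch */

/* Mirror of the quoted condition: reject when aff < max and draw >= aff. */
static int accepted_as_primary(uint32_t aff, uint32_t draw)
{
    assert(draw <= 0xffffu);  /* the hash is shifted right by 16 bits */
    return !(aff < SKETCH_MAX_AFF && draw >= aff);
}

/* Count how many of the 65536 possible draws accept this affinity. */
static uint32_t count_accepted(uint32_t aff)
{
    uint32_t n = 0;

    for (uint32_t draw = 0; draw <= 0xffffu; draw++)
        n += (uint32_t)accepted_as_primary(aff, draw);
    return n;
}
```

With that assumed value, exactly `aff` of the 0x10000 possible draws are accepted, so the acceptance fraction is aff / 0x10000. The maximum would then be a legal affinity meaning "always accept", sitting one above the largest 16-bit draw rather than one above the largest allowed affinity.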

> +		    (crush_hash32_2(CRUSH_HASH_RJENKINS1,
> +				    pps, o) >> 16) >= a) {
> +			/*
> +			 * We chose not to use this primary.  Note it
> +			 * anyway as a fallback in case we don't pick
> +			 * anyone else, but keep looking.
> +			 */
> +			if (pos < 0)
> +				pos = i;
> +		} else {
> +			pos = i;
> +			break;
> +		}
> +	}
> +	if (pos < 0)
> +		return;
> +
> +	*primary = osds[pos];
> +
> +	if (ceph_can_shift_osds(pool) && pos > 0) {
> +		/* move the new primary to the front */
> +		for (i = pos; i > 0; i--)
> +			osds[i] = osds[i - 1];
> +		osds[0] = *primary;
> +	}

So the first one *is* the primary, you just renumber them.
I see.
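As a small illustration of the shift, here is a standalone sketch of the move-to-front step. `move_to_front` is a hypothetical helper name; the loop body is copied from the quoted patch. Note that in the patch this step runs only when ceph_can_shift_osds(pool) is true. Otherwise the order is left alone and only *primary changes, so the primary is not necessarily osds[0].

```c
#include <assert.h>

/*
 * Move osds[pos] to the front, sliding the entries before it one slot
 * to the right.  Entries after pos keep their positions.
 */
static void move_to_front(int *osds, int pos)
{
    int primary;

    assert(pos >= 0);
    primary = osds[pos];
    for (int i = pos; i > 0; i--)
        osds[i] = osds[i - 1];
    osds[0] = primary;
}
```

For example, moving position 2 of {10, 11, 12, 13} to the front gives {12, 10, 11, 13}: the relative order of the other osds is preserved.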

> +}
> +
>  /*
>   * Given up set, apply pg_temp and primary_temp mappings.
>   *
> @@ -1691,6 +1757,8 @@ int ceph_calc_pg_acting(struct ceph_osdmap *osdmap, struct ceph_pg pgid,
>  
>  	len = raw_to_up_osds(osdmap, pool, osds, len, primary);
>  
> +	apply_primary_affinity(osdmap, pps, pool, osds, len, primary);
> +
>  	len = apply_temps(osdmap, pool, pgid, osds, len, primary);
>  
>  	return len;
> 




