Re: [RFCv5 2/2] drm/i915: Introduce private PAT management

"Wang, Zhi A" <zhi.a.wang@xxxxxxxxx> · Tue, 29 Aug 2017 17:54:51 +0000

Another finding during the re-factoring are:

a)It looks like that the PPAT_CACHE_INDEX on BDW/SKL is mapped to:
GEN8_PPAT_WB | GEN8_PPAT_LLCELLC | GEN8_PPAT_AGE(0);

But the PPAT_CACHE_INDEX on CNL is mapped to 
GEN8_PPAT_LLCELLC | GEN8_PPAT_AGE(0);

GEN8_PPAT_WB is missing here and by default the cache attribute is UC.

Is this set intentionally?

b) Looks like all the ages of PPAT in windows driver is AGE(3) because of some performance gains, is there any reason that i915 has to set it to AGE(0)?

Thanks,
Zhi.

-----Original Message-----
From: Wang, Zhi A 
Sent: Tuesday, August 29, 2017 2:19 PM
To: 'Joonas Lahtinen' <joonas.lahtinen@xxxxxxxxxxxxxxx>; intel-gfx@xxxxxxxxxxxxxxxxxxxxx; intel-gvt-dev@xxxxxxxxxxxxxxxxxxxxx
Cc: chris@xxxxxxxxxxxxxxxxxx; zhenyuw@xxxxxxxxxxxxxxx; Widawsky, Benjamin <benjamin.widawsky@xxxxxxxxx>; Vivi, Rodrigo <rodrigo.vivi@xxxxxxxxx>
Subject: RE: [RFCv5 2/2] drm/i915: Introduce private PAT management

Thanks Joonas! :)

-----Original Message-----
From: Joonas Lahtinen [mailto:joonas.lahtinen@xxxxxxxxxxxxxxx]
Sent: Tuesday, August 29, 2017 12:37 PM
To: Wang, Zhi A <zhi.a.wang@xxxxxxxxx>; intel-gfx@xxxxxxxxxxxxxxxxxxxxx; intel-gvt-dev@xxxxxxxxxxxxxxxxxxxxx
Cc: chris@xxxxxxxxxxxxxxxxxx; zhenyuw@xxxxxxxxxxxxxxx; Widawsky, Benjamin <benjamin.widawsky@xxxxxxxxx>; Vivi, Rodrigo <rodrigo.vivi@xxxxxxxxx>
Subject: Re: [RFCv5 2/2] drm/i915: Introduce private PAT management

On Tue, 2017-08-29 at 16:00 +0800, Zhi Wang wrote:
> The private PAT management is to support PPAT entry manipulation. Two 
> APIs are introduced for dynamically managing PPAT entries:
> intel_ppat_get and intel_ppat_put.
> 
> intel_ppat_get will search for an existing PPAT entry which perfectly 
> matches the required PPAT value. If not, it will try to allocate or 
> return a partially matched PPAT entry if there is any available PPAT 
> indexes or not.
> 
> intel_ppat_put will put back the PPAT entry which comes from 
> intel_ppat_get. If it's dynamically allocated, the reference count 
> will be decreased. If the reference count turns into zero, the PPAT 
> index is freed again.
> 
> Besides, another two callbacks are introduced to support the private 
> PAT management framework. One is ppat->update(), which writes the PPAT 
> configurations in ppat->entries into HW. Another one is ppat->match, 
> which will return a score to show how two PPAT values match with each other.
> 
> v5:
> 
> - Add check and warnnings for those platforms which don't have PPAT.
> 
> v3:
> 
> - Introduce dirty bitmap for PPAT registers. (Chris)
> - Change the name of the pointer "dev_priv" to "i915". (Chris)
> - intel_ppat_{get, put} returns/takes a const intel_ppat_entry *. 
> (Chris)
> 
> v2:
> 
> - API re-design. (Chris)
> 
> Cc: Ben Widawsky <benjamin.widawsky@xxxxxxxxx>
> Cc: Rodrigo Vivi <rodrigo.vivi@xxxxxxxxx>
> Cc: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
> Cc: Joonas Lahtinen <joonas.lahtinen@xxxxxxxxxxxxxxx>
> Signed-off-by: Zhi Wang <zhi.a.wang@xxxxxxxxx>

<SNIP>

> -static void cnl_setup_private_ppat(struct drm_i915_private *dev_priv)
> +static struct intel_ppat_entry *alloc_ppat_entry(struct intel_ppat *ppat,
> +						 unsigned int index,
> +						 u8 value)
>  {
> +	struct intel_ppat_entry *entry = &ppat->entries[index];
> +
> +	entry->ppat = ppat;
> +	entry->value = value;
> +	kref_init(&entry->ref_count);
> +	set_bit(index, ppat->used);
> +	set_bit(index, ppat->dirty);
> +
> +	return entry;
> +}
> +
> +static void free_ppat_entry(struct intel_ppat_entry *entry) {
> +	struct intel_ppat *ppat = entry->ppat;
> +	int index = entry - ppat->entries;
> +
> +	entry->value = ppat->dummy_value;
> +	clear_bit(index, ppat->used);
> +	set_bit(index, ppat->dirty);
> +}

Above functions should be __ prefixed as they do no checking if they override existing data. The suitable names might be __ppat_get and __ppat_put.

> +
> +/**
> + * intel_ppat_get - get a usable PPAT entry
> + * @i915: i915 device instance
> + * @value: the PPAT value required by the caller
> + *
> + * The function tries to search if there is an existing PPAT entry 
> +which
> + * matches with the required value. If perfectly matched, the 
> +existing PPAT
> + * entry will be used. If only partially matched, it will try to 
> +check if
> + * there is any available PPAT index. If yes, it will allocate a new 
> +PPAT
> + * index for the required entry and update the HW. If not, the 
> +partially
> + * matched entry will be used.
> + */
> +const struct intel_ppat_entry *intel_ppat_get(struct drm_i915_private 
> +*i915,

Maybe split the function type and name to avoid exceeding 80 chars on next line.

> +					      u8 value)
> +{
> +	struct intel_ppat *ppat = &i915->ppat;
> +	struct intel_ppat_entry *entry;
> +	int i, used;
> +	unsigned int score, best_score;
> +
> +	if (WARN_ON(!ppat->max_entries))
> +		return ERR_PTR(-ENODEV);

No need for extra check like this, this will just lead to ENOSPC.

> +
> +	score = best_score = 0;
> +	used = 0;

This variable behaves more like scanned.

> +
> +	/* First, find a suitable value from available entries */

The next two lines repeat this information, no need to document "what".

> +	for_each_set_bit(i, ppat->used, ppat->max_entries) {

		score could be declared in this scope.

> +		score = ppat->match(ppat->entries[i].value, value);
> +		/* Perfect match */

This comment is literally repeating what code says.

> +		if (score == INTEL_PPAT_PERFECT_MATCH) {
> +			entry = &ppat->entries[i];
> +			kref_get(&entry->ref_count);
> +			return entry;
> +		}
> +
> +		if (score > best_score) {
> +			entry = &ppat->entries[i];

Above could be simplified:

			if (score == INTEL_PPAT_PERFECT_MATCH) {
				kref_get(&entry->ref);
				return entry;
			}

> +			best_score = score;
> +		}
> +		used++;
> +	}
> +
> +	/* No matched entry and we can't allocate a new entry. */

DRM_ERROR replicates this comment's information.

> +	if (!best_score && used == ppat->max_entries) {
> +		DRM_ERROR("Fail to get PPAT entry\n");

DRM_DEBUG_DRIVER at most.

> +		return ERR_PTR(-ENOSPC);
> +	}
> +
> +	/*
> +	 * Found a matched entry which is not perfect,
> +	 * and we can't allocate a new entry.
> +	 */
> +	if (best_score && used == ppat->max_entries) {
> +		kref_get(&entry->ref_count);
> +		return entry;
> +	}

Above code could be combined:

	if (scanned == ppat->max_entries) {
		if(!best_score)
			return ERR_PTR(-ENOSPC);
		kref_get(&entry->ref);
		return entry;
	}

> +
> +	/* Allocate a new entry */

This comment is also just telling "what", which we can see from code.

> +	i = find_first_zero_bit(ppat->used, ppat->max_entries);
> +	entry = alloc_ppat_entry(ppat, i, value);
> +	ppat->update(i915);
> +	return entry;
> +}
> +
> +static void put_ppat(struct kref *kref)

ppat_release might cause less confusion, otherwise there will be 3 put functions.

> +{
> +	struct intel_ppat_entry *entry =
> +		container_of(kref, struct intel_ppat_entry, ref_count);
> +	struct drm_i915_private *i915 = entry->ppat->i915;
> +
> +	free_ppat_entry(entry);
> +	entry->ppat->update(i915);
> +}
> +
> +/**
> + * intel_ppat_put - put back the PPAT entry got from intel_ppat_get()
> + * @entry: an intel PPAT entry
> + *
> + * Put back the PPAT entry got from intel_ppat_get(). If the PPAT 
> +index of the
> + * entry is dynamically allocated, its reference count will be 
> +decreased. Once
> + * the reference count becomes into zero, the PPAT index becomes free again.
> + */
> +void intel_ppat_put(const struct intel_ppat_entry *entry) {
> +	struct intel_ppat *ppat = entry->ppat;
> +	int index = entry - ppat->entries;
> +
> +	if (WARN_ON(!ppat->max_entries))
> +		return;

This is clearly a kernel programmer error (and a serious one, so could be GEM_BUG_ON).

> +
> +	kref_put(&ppat->entries[index].ref_count, put_ppat); }
> +
> +static void cnl_private_pat_update(struct drm_i915_private *dev_priv) 
> +{
> +	struct intel_ppat *ppat = &dev_priv->ppat;
> +	int i;
> +
> +	for_each_set_bit(i, ppat->dirty, ppat->max_entries) {
> +		clear_bit(i, ppat->dirty);
> +		I915_WRITE(GEN10_PAT_INDEX(i), ppat->entries[i].value);

Clearing the bit after write is the logical thing to do.

> +	}
> +}
> +
> +static void bdw_private_pat_update(struct drm_i915_private *dev_priv) 
> +{
> +	struct intel_ppat *ppat = &dev_priv->ppat;
> +	u64 pat = 0;
> +	int i;
> +
> +	for (i = 0; i < ppat->max_entries; i++)
> +		pat |= GEN8_PPAT(i, ppat->entries[i].value);
> +
> +	bitmap_clear(ppat->dirty, 0, ppat->max_entries);
> +
> +	I915_WRITE(GEN8_PRIVATE_PAT_LO, pat);
> +	I915_WRITE(GEN8_PRIVATE_PAT_HI, pat >> 32);

While moving the code, lower_32_bits and upper_32_bits, it should really warn without them?

> +}
> +
> +#define gen8_pat_ca(v) ((v) & 0x3)
> +#define gen8_pat_tc(v) (((v) >> 2) & 0x3) #define gen8_pat_age(v)
> +(((v) >> 4) & 0x3)

Magic numbers. I'm also not 100% sure if KMD will want partial matches, or if GVT should pass in a scoring function.

> +
> +static unsigned int bdw_private_pat_match(u8 src, u8 dst) {
> +	unsigned int score = 0;
> +
> +	/* Cache attribute has to be matched. */
> +	if (gen8_pat_ca(src) != gen8_pat_ca(dst))
> +		return 0;
> +
> +	if (gen8_pat_age(src) == gen8_pat_age(dst))
> +		score += 1;
> +
> +	if (gen8_pat_tc(src) == gen8_pat_tc(dst))
> +		score += 2;

I'd lift this check to have them all in importance order.

> +
> +	if (score == 3)
> +		return INTEL_PPAT_PERFECT_MATCH;
> +
> +	return score;
> +}
> +
> +#define chv_get_snoop(v) (((v) >> 6) & 0x1)

Magic numbers, which need to be in i915_reg.h with #define. Should also be chv_pat_snoop();

> +
> +static unsigned int chv_private_pat_match(u8 src, u8 dst) {
> +	if (chv_get_snoop(src) == chv_get_snoop(dst))
> +		return INTEL_PPAT_PERFECT_MATCH;
> +
> +	return 0;

	return chv_pat_snoop(src) == chv_pat_snoop(dst) ?
	       INTEL_PPAT_PERFECT_MATCH : 0;

> +}
> +
> +static void cnl_setup_private_ppat(struct intel_ppat *ppat) {
> +	ppat->max_entries = 8;
> +	ppat->update = cnl_private_pat_update;
> +	ppat->match = bdw_private_pat_match;
> +	ppat->dummy_value = GEN8_PPAT_LLCELLC | GEN8_PPAT_AGE(3);

"dummy_value" is not a very descriptive name.

> +
>  	/* XXX: spec is unclear if this is still needed for CNL+ */
> -	if (!USES_PPGTT(dev_priv)) {
> -		I915_WRITE(GEN10_PAT_INDEX(0), GEN8_PPAT_UC);
> +	if (!USES_PPGTT(ppat->i915)) {
> +		alloc_ppat_entry(ppat, 0, GEN8_PPAT_UC);
>  		return;
>  	}
>  
> -	I915_WRITE(GEN10_PAT_INDEX(0), GEN8_PPAT_WB | GEN8_PPAT_LLC);
> -	I915_WRITE(GEN10_PAT_INDEX(1), GEN8_PPAT_WC | GEN8_PPAT_LLCELLC);
> -	I915_WRITE(GEN10_PAT_INDEX(2), GEN8_PPAT_WT | GEN8_PPAT_LLCELLC);
> -	I915_WRITE(GEN10_PAT_INDEX(3), GEN8_PPAT_UC);
> -	I915_WRITE(GEN10_PAT_INDEX(4), GEN8_PPAT_LLCELLC | GEN8_PPAT_AGE(0));
> -	I915_WRITE(GEN10_PAT_INDEX(5), GEN8_PPAT_LLCELLC | GEN8_PPAT_AGE(1));
> -	I915_WRITE(GEN10_PAT_INDEX(6), GEN8_PPAT_LLCELLC | GEN8_PPAT_AGE(2));
> -	I915_WRITE(GEN10_PAT_INDEX(7), GEN8_PPAT_LLCELLC | GEN8_PPAT_AGE(3));
> +	alloc_ppat_entry(ppat, 0, GEN8_PPAT_WB | GEN8_PPAT_LLC);
> +	alloc_ppat_entry(ppat, 2, GEN8_PPAT_WT | GEN8_PPAT_LLCELLC);
> +	alloc_ppat_entry(ppat, 3, GEN8_PPAT_UC);
> +	alloc_ppat_entry(ppat, 4, GEN8_PPAT_LLCELLC | GEN8_PPAT_AGE(0));

Maybe I'm not following at all, but this seems like quite a change to me? 0-3 used to be WB, WC, WT, UC, now 1 is unset, like is 5-7.

I'd leave all changes to registers to a later patch and leave this patch to do what the title says, "Introduce private PAT management".

>  static void setup_private_pat(struct drm_i915_private *dev_priv)  {
> +	struct intel_ppat *ppat = &dev_priv->ppat;
> +	int i;
> +
> +	ppat->i915 = dev_priv;
> +
> +	/* Load per-platform PPAT configurations */

This comment is again just taking space.

>  	if (INTEL_GEN(dev_priv) >= 10)
> -		cnl_setup_private_ppat(dev_priv);
> +		cnl_setup_private_ppat(ppat);
>  	else if (IS_CHERRYVIEW(dev_priv) || IS_GEN9_LP(dev_priv))
> -		chv_setup_private_ppat(dev_priv);
> +		chv_setup_private_ppat(ppat);
>  	else
> -		bdw_setup_private_ppat(dev_priv);
> +		bdw_setup_private_ppat(ppat);
> +
> +	GEM_BUG_ON(ppat->max_entries > INTEL_MAX_PPAT_ENTRIES);
> +
> +	if (!ppat->max_entries)
> +		return;

I'm not sure why we have "!max_entries" checks going, we should have one GEM_BUG_ON at this point and in the rest you can assume it to be nonzero.

> +
> +	/* Fill unused PPAT entries with dummy PPAT value */
> +	for_each_clear_bit(i, ppat->used, ppat->max_entries) {
> +		ppat->entries[i].value = ppat->dummy_value;
> +		ppat->entries[i].ppat = ppat;
> +		set_bit(i, ppat->dirty);
> +	}
> +
> +	/* Write the HW */

If the callback was named update_hw(), this comment would be unnecessary.

> +	ppat->update(dev_priv);
>  }

<SNIP>

> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
> @@ -536,6 +536,40 @@ i915_vm_to_ggtt(struct i915_address_space *vm)
>  	return container_of(vm, struct i915_ggtt, base);  }
>  
> +#define INTEL_MAX_PPAT_ENTRIES 8
> +#define INTEL_PPAT_PERFECT_MATCH (~0U)
> +
> +struct intel_ppat;
> +
> +struct intel_ppat_entry {
> +	struct intel_ppat *ppat;
> +	struct kref ref_count;

	Just "ref", like elsewhere in code.

> +	u8 value;
> +};
> +
> +struct intel_ppat {
> +	struct intel_ppat_entry entries[INTEL_MAX_PPAT_ENTRIES];
> +	DECLARE_BITMAP(used, INTEL_MAX_PPAT_ENTRIES);
> +	DECLARE_BITMAP(dirty, INTEL_MAX_PPAT_ENTRIES);
> +

This really goes with the previous group of variable. So no newline.

> +	unsigned int max_entries;
> +
> +	u8 dummy_value;

Better name, like "clear_value" and short description may be useful.

> +	/*
> +	 * Return a score to show how two PPAT values match,
> +	 * a INTEL_PPAT_PERFECT_MATCH indicates a perfect match
> +	 */
> +	unsigned int (*match)(u8 src, u8 dst);
> +	/* Write the PPAT configuration into HW. */
> +	void (*update)(struct drm_i915_private *i915);
> +
> +	struct drm_i915_private *i915;
> +};
> +
> +const struct intel_ppat_entry *intel_ppat_get(struct drm_i915_private 
> +*i915,

Same here with type vs name.

> +					      u8 value);
> +void intel_ppat_put(const struct intel_ppat_entry *entry);

Regards, joonas
--
Joonas Lahtinen
Open Source Technology Center
Intel Corporation
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx