Thanks Joonas! :) -----Original Message----- From: Joonas Lahtinen [mailto:joonas.lahtinen@xxxxxxxxxxxxxxx] Sent: Tuesday, August 29, 2017 12:37 PM To: Wang, Zhi A <zhi.a.wang@xxxxxxxxx>; intel-gfx@xxxxxxxxxxxxxxxxxxxxx; intel-gvt-dev@xxxxxxxxxxxxxxxxxxxxx Cc: chris@xxxxxxxxxxxxxxxxxx; zhenyuw@xxxxxxxxxxxxxxx; Widawsky, Benjamin <benjamin.widawsky@xxxxxxxxx>; Vivi, Rodrigo <rodrigo.vivi@xxxxxxxxx> Subject: Re: [RFCv5 2/2] drm/i915: Introduce private PAT management On Tue, 2017-08-29 at 16:00 +0800, Zhi Wang wrote: > The private PAT management is to support PPAT entry manipulation. Two > APIs are introduced for dynamically managing PPAT entries: > intel_ppat_get and intel_ppat_put. > > intel_ppat_get will search for an existing PPAT entry which perfectly > matches the required PPAT value. If not, it will try to allocate or > return a partially matched PPAT entry if there is any available PPAT > indexes or not. > > intel_ppat_put will put back the PPAT entry which comes from > intel_ppat_get. If it's dynamically allocated, the reference count > will be decreased. If the reference count turns into zero, the PPAT > index is freed again. > > Besides, another two callbacks are introduced to support the private > PAT management framework. One is ppat->update(), which writes the PPAT > configurations in ppat->entries into HW. Another one is ppat->match, > which will return a score to show how two PPAT values match with each other. > > v5: > > - Add check and warnnings for those platforms which don't have PPAT. > > v3: > > - Introduce dirty bitmap for PPAT registers. (Chris) > - Change the name of the pointer "dev_priv" to "i915". (Chris) > - intel_ppat_{get, put} returns/takes a const intel_ppat_entry *. > (Chris) > > v2: > > - API re-design. (Chris) > > Cc: Ben Widawsky <benjamin.widawsky@xxxxxxxxx> > Cc: Rodrigo Vivi <rodrigo.vivi@xxxxxxxxx> > Cc: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> > Cc: Joonas Lahtinen <joonas.lahtinen@xxxxxxxxxxxxxxx> > Signed-off-by: Zhi Wang <zhi.a.wang@xxxxxxxxx> <SNIP> > -static void cnl_setup_private_ppat(struct drm_i915_private *dev_priv) > +static struct intel_ppat_entry *alloc_ppat_entry(struct intel_ppat *ppat, > + unsigned int index, > + u8 value) > { > + struct intel_ppat_entry *entry = &ppat->entries[index]; > + > + entry->ppat = ppat; > + entry->value = value; > + kref_init(&entry->ref_count); > + set_bit(index, ppat->used); > + set_bit(index, ppat->dirty); > + > + return entry; > +} > + > +static void free_ppat_entry(struct intel_ppat_entry *entry) { > + struct intel_ppat *ppat = entry->ppat; > + int index = entry - ppat->entries; > + > + entry->value = ppat->dummy_value; > + clear_bit(index, ppat->used); > + set_bit(index, ppat->dirty); > +} Above functions should be __ prefixed as they do no checking if they override existing data. The suitable names might be __ppat_get and __ppat_put. > + > +/** > + * intel_ppat_get - get a usable PPAT entry > + * @i915: i915 device instance > + * @value: the PPAT value required by the caller > + * > + * The function tries to search if there is an existing PPAT entry > +which > + * matches with the required value. If perfectly matched, the > +existing PPAT > + * entry will be used. If only partially matched, it will try to > +check if > + * there is any available PPAT index. If yes, it will allocate a new > +PPAT > + * index for the required entry and update the HW. If not, the > +partially > + * matched entry will be used. > + */ > +const struct intel_ppat_entry *intel_ppat_get(struct drm_i915_private > +*i915, Maybe split the function type and name to avoid exceeding 80 chars on next line. > + u8 value) > +{ > + struct intel_ppat *ppat = &i915->ppat; > + struct intel_ppat_entry *entry; > + int i, used; > + unsigned int score, best_score; > + > + if (WARN_ON(!ppat->max_entries)) > + return ERR_PTR(-ENODEV); No need for extra check like this, this will just lead to ENOSPC. > + > + score = best_score = 0; > + used = 0; This variable behaves more like scanned. > + > + /* First, find a suitable value from available entries */ The next two lines repeat this information, no need to document "what". > + for_each_set_bit(i, ppat->used, ppat->max_entries) { score could be declared in this scope. > + score = ppat->match(ppat->entries[i].value, value); > + /* Perfect match */ This comment is literally repeating what code says. > + if (score == INTEL_PPAT_PERFECT_MATCH) { > + entry = &ppat->entries[i]; > + kref_get(&entry->ref_count); > + return entry; > + } > + > + if (score > best_score) { > + entry = &ppat->entries[i]; Above could be simplified: if (score == INTEL_PPAT_PERFECT_MATCH) { kref_get(&entry->ref); return entry; } > + best_score = score; > + } > + used++; > + } > + > + /* No matched entry and we can't allocate a new entry. */ DRM_ERROR replicates this comment's information. > + if (!best_score && used == ppat->max_entries) { > + DRM_ERROR("Fail to get PPAT entry\n"); DRM_DEBUG_DRIVER at most. > + return ERR_PTR(-ENOSPC); > + } > + > + /* > + * Found a matched entry which is not perfect, > + * and we can't allocate a new entry. > + */ > + if (best_score && used == ppat->max_entries) { > + kref_get(&entry->ref_count); > + return entry; > + } Above code could be combined: if (scanned == ppat->max_entries) { if(!best_score) return ERR_PTR(-ENOSPC); kref_get(&entry->ref); return entry; } > + > + /* Allocate a new entry */ This comment is also just telling "what", which we can see from code. > + i = find_first_zero_bit(ppat->used, ppat->max_entries); > + entry = alloc_ppat_entry(ppat, i, value); > + ppat->update(i915); > + return entry; > +} > + > +static void put_ppat(struct kref *kref) ppat_release might cause less confusion, otherwise there will be 3 put functions. > +{ > + struct intel_ppat_entry *entry = > + container_of(kref, struct intel_ppat_entry, ref_count); > + struct drm_i915_private *i915 = entry->ppat->i915; > + > + free_ppat_entry(entry); > + entry->ppat->update(i915); > +} > + > +/** > + * intel_ppat_put - put back the PPAT entry got from intel_ppat_get() > + * @entry: an intel PPAT entry > + * > + * Put back the PPAT entry got from intel_ppat_get(). If the PPAT > +index of the > + * entry is dynamically allocated, its reference count will be > +decreased. Once > + * the reference count becomes into zero, the PPAT index becomes free again. > + */ > +void intel_ppat_put(const struct intel_ppat_entry *entry) { > + struct intel_ppat *ppat = entry->ppat; > + int index = entry - ppat->entries; > + > + if (WARN_ON(!ppat->max_entries)) > + return; This is clearly a kernel programmer error (and a serious one, so could be GEM_BUG_ON). > + > + kref_put(&ppat->entries[index].ref_count, put_ppat); } > + > +static void cnl_private_pat_update(struct drm_i915_private *dev_priv) > +{ > + struct intel_ppat *ppat = &dev_priv->ppat; > + int i; > + > + for_each_set_bit(i, ppat->dirty, ppat->max_entries) { > + clear_bit(i, ppat->dirty); > + I915_WRITE(GEN10_PAT_INDEX(i), ppat->entries[i].value); Clearing the bit after write is the logical thing to do. > + } > +} > + > +static void bdw_private_pat_update(struct drm_i915_private *dev_priv) > +{ > + struct intel_ppat *ppat = &dev_priv->ppat; > + u64 pat = 0; > + int i; > + > + for (i = 0; i < ppat->max_entries; i++) > + pat |= GEN8_PPAT(i, ppat->entries[i].value); > + > + bitmap_clear(ppat->dirty, 0, ppat->max_entries); > + > + I915_WRITE(GEN8_PRIVATE_PAT_LO, pat); > + I915_WRITE(GEN8_PRIVATE_PAT_HI, pat >> 32); While moving the code, lower_32_bits and upper_32_bits, it should really warn without them? > +} > + > +#define gen8_pat_ca(v) ((v) & 0x3) > +#define gen8_pat_tc(v) (((v) >> 2) & 0x3) #define gen8_pat_age(v) > +(((v) >> 4) & 0x3) Magic numbers. I'm also not 100% sure if KMD will want partial matches, or if GVT should pass in a scoring function. > + > +static unsigned int bdw_private_pat_match(u8 src, u8 dst) { > + unsigned int score = 0; > + > + /* Cache attribute has to be matched. */ > + if (gen8_pat_ca(src) != gen8_pat_ca(dst)) > + return 0; > + > + if (gen8_pat_age(src) == gen8_pat_age(dst)) > + score += 1; > + > + if (gen8_pat_tc(src) == gen8_pat_tc(dst)) > + score += 2; I'd lift this check to have them all in importance order. > + > + if (score == 3) > + return INTEL_PPAT_PERFECT_MATCH; > + > + return score; > +} > + > +#define chv_get_snoop(v) (((v) >> 6) & 0x1) Magic numbers, which need to be in i915_reg.h with #define. Should also be chv_pat_snoop(); > + > +static unsigned int chv_private_pat_match(u8 src, u8 dst) { > + if (chv_get_snoop(src) == chv_get_snoop(dst)) > + return INTEL_PPAT_PERFECT_MATCH; > + > + return 0; return chv_pat_snoop(src) == chv_pat_snoop(dst) ? INTEL_PPAT_PERFECT_MATCH : 0; > +} > + > +static void cnl_setup_private_ppat(struct intel_ppat *ppat) { > + ppat->max_entries = 8; > + ppat->update = cnl_private_pat_update; > + ppat->match = bdw_private_pat_match; > + ppat->dummy_value = GEN8_PPAT_LLCELLC | GEN8_PPAT_AGE(3); "dummy_value" is not a very descriptive name. > + > /* XXX: spec is unclear if this is still needed for CNL+ */ > - if (!USES_PPGTT(dev_priv)) { > - I915_WRITE(GEN10_PAT_INDEX(0), GEN8_PPAT_UC); > + if (!USES_PPGTT(ppat->i915)) { > + alloc_ppat_entry(ppat, 0, GEN8_PPAT_UC); > return; > } > > - I915_WRITE(GEN10_PAT_INDEX(0), GEN8_PPAT_WB | GEN8_PPAT_LLC); > - I915_WRITE(GEN10_PAT_INDEX(1), GEN8_PPAT_WC | GEN8_PPAT_LLCELLC); > - I915_WRITE(GEN10_PAT_INDEX(2), GEN8_PPAT_WT | GEN8_PPAT_LLCELLC); > - I915_WRITE(GEN10_PAT_INDEX(3), GEN8_PPAT_UC); > - I915_WRITE(GEN10_PAT_INDEX(4), GEN8_PPAT_LLCELLC | GEN8_PPAT_AGE(0)); > - I915_WRITE(GEN10_PAT_INDEX(5), GEN8_PPAT_LLCELLC | GEN8_PPAT_AGE(1)); > - I915_WRITE(GEN10_PAT_INDEX(6), GEN8_PPAT_LLCELLC | GEN8_PPAT_AGE(2)); > - I915_WRITE(GEN10_PAT_INDEX(7), GEN8_PPAT_LLCELLC | GEN8_PPAT_AGE(3)); > + alloc_ppat_entry(ppat, 0, GEN8_PPAT_WB | GEN8_PPAT_LLC); > + alloc_ppat_entry(ppat, 2, GEN8_PPAT_WT | GEN8_PPAT_LLCELLC); > + alloc_ppat_entry(ppat, 3, GEN8_PPAT_UC); > + alloc_ppat_entry(ppat, 4, GEN8_PPAT_LLCELLC | GEN8_PPAT_AGE(0)); Maybe I'm not following at all, but this seems like quite a change to me? 0-3 used to be WB, WC, WT, UC, now 1 is unset, like is 5-7. I'd leave all changes to registers to a later patch and leave this patch to do what the title says, "Introduce private PAT management". > static void setup_private_pat(struct drm_i915_private *dev_priv) { > + struct intel_ppat *ppat = &dev_priv->ppat; > + int i; > + > + ppat->i915 = dev_priv; > + > + /* Load per-platform PPAT configurations */ This comment is again just taking space. > if (INTEL_GEN(dev_priv) >= 10) > - cnl_setup_private_ppat(dev_priv); > + cnl_setup_private_ppat(ppat); > else if (IS_CHERRYVIEW(dev_priv) || IS_GEN9_LP(dev_priv)) > - chv_setup_private_ppat(dev_priv); > + chv_setup_private_ppat(ppat); > else > - bdw_setup_private_ppat(dev_priv); > + bdw_setup_private_ppat(ppat); > + > + GEM_BUG_ON(ppat->max_entries > INTEL_MAX_PPAT_ENTRIES); > + > + if (!ppat->max_entries) > + return; I'm not sure why we have "!max_entries" checks going, we should have one GEM_BUG_ON at this point and in the rest you can assume it to be nonzero. > + > + /* Fill unused PPAT entries with dummy PPAT value */ > + for_each_clear_bit(i, ppat->used, ppat->max_entries) { > + ppat->entries[i].value = ppat->dummy_value; > + ppat->entries[i].ppat = ppat; > + set_bit(i, ppat->dirty); > + } > + > + /* Write the HW */ If the callback was named update_hw(), this comment would be unnecessary. > + ppat->update(dev_priv); > } <SNIP> > +++ b/drivers/gpu/drm/i915/i915_gem_gtt.h > @@ -536,6 +536,40 @@ i915_vm_to_ggtt(struct i915_address_space *vm) > return container_of(vm, struct i915_ggtt, base); } > > +#define INTEL_MAX_PPAT_ENTRIES 8 > +#define INTEL_PPAT_PERFECT_MATCH (~0U) > + > +struct intel_ppat; > + > +struct intel_ppat_entry { > + struct intel_ppat *ppat; > + struct kref ref_count; Just "ref", like elsewhere in code. > + u8 value; > +}; > + > +struct intel_ppat { > + struct intel_ppat_entry entries[INTEL_MAX_PPAT_ENTRIES]; > + DECLARE_BITMAP(used, INTEL_MAX_PPAT_ENTRIES); > + DECLARE_BITMAP(dirty, INTEL_MAX_PPAT_ENTRIES); > + This really goes with the previous group of variable. So no newline. > + unsigned int max_entries; > + > + u8 dummy_value; Better name, like "clear_value" and short description may be useful. > + /* > + * Return a score to show how two PPAT values match, > + * a INTEL_PPAT_PERFECT_MATCH indicates a perfect match > + */ > + unsigned int (*match)(u8 src, u8 dst); > + /* Write the PPAT configuration into HW. */ > + void (*update)(struct drm_i915_private *i915); > + > + struct drm_i915_private *i915; > +}; > + > +const struct intel_ppat_entry *intel_ppat_get(struct drm_i915_private > +*i915, Same here with type vs name. > + u8 value); > +void intel_ppat_put(const struct intel_ppat_entry *entry); Regards, joonas -- Joonas Lahtinen Open Source Technology Center Intel Corporation _______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/intel-gfx