Em Qui, 2016-10-13 às 16:28 +0530, Kumar, Mahesh escreveu: > This patch adds support to decode system memory bandwidth > which will be used for arbitrated display memory percentage > calculation in GEN9 based system. > > Changes from v1: > - Address comments from Paulo > - implement decode function for SKL/KBL also Again, my understanding of these registers is not good so I will ask some questions that will help me properly understand and review the code. I may also end up asking some questions that don't make sense. Please correct me in case any of my assumptions is wrong. Perhaps answers to my questions could become code comments since I suppose most of the gfx team is not super familiar with the memory regs, and we're going to have to maintain the code. > > Signed-off-by: "Kumar, Mahesh" <mahesh1.kumar@xxxxxxxxx> > --- > drivers/gpu/drm/i915/i915_drv.c | 170 > ++++++++++++++++++++++++++++++++++++++++ > drivers/gpu/drm/i915/i915_drv.h | 14 ++++ > drivers/gpu/drm/i915/i915_reg.h | 38 +++++++++ > 3 files changed, 222 insertions(+) > > diff --git a/drivers/gpu/drm/i915/i915_drv.c > b/drivers/gpu/drm/i915/i915_drv.c > index 89d3222..b5f601c 100644 > --- a/drivers/gpu/drm/i915/i915_drv.c > +++ b/drivers/gpu/drm/i915/i915_drv.c > @@ -979,6 +979,170 @@ static void intel_sanitize_options(struct > drm_i915_private *dev_priv) > DRM_DEBUG_DRIVER("use GPU sempahores? %s\n", > yesno(i915.semaphores)); > } > > +static inline void skl_memdev_info_fill(struct memdev_info *info, > uint32_t val) > +{ > + uint8_t channel_width, rank; > + > + info->rank_valid = true; > + channel_width = (val >> SKL_DRAM_CHANNEL_WIDTH_SHIFT) & > + SKL_DRAM_CHANNEL_WIDTH_MASK; Why are we looking at the L bits instead of the S bits? What's the difference between them? > + if (channel_width == SKL_DRAM_WIDTH_1) > + info->channel_width_bytes = 1; > + else if (channel_width == SKL_DRAM_WIDTH_2) > + info->channel_width_bytes = 2; > + else > + info->channel_width_bytes = 4; This is not checking for the invalid value of 0x3. Perhaps a switch() instead of an if() ladder would be better here. > + > + rank = (val >> SKL_DRAM_RANK_SHIFT) & SKL_DRAM_RANK_MASK; > + if (rank == SKL_DRAM_RANK_SINGLE) > + info->rank = DRAM_RANK_SINGLE; > + else > + info->rank = DRAM_RANK_DUAL; > +} > + > +static int > +skl_get_memdev_info(struct drm_i915_private *dev_priv) > +{ > + uint32_t val = 0; > + uint32_t mem_speed = 0; > + struct memdev_info *memdev_info = &dev_priv->memdev_info; > + > + val = I915_READ(SKL_MC_BIOS_DATA_0_0_0_MCHBAR_PCU); > + mem_speed = div_u64((uint64_t) (val & SKL_REQ_DATA_MASK) * > + SKL_MEMORY_FREQ_MULTIPLIER, 1000); But but but isn't it possible to have more than 2GHz in SKL? I think I have a big SKL machine with more than 2GHz here... I'll check it later. Also, don't we need to check FREQ_TYPE and ODD_RATIO? Some digging suggests that maybe we need to figure out somehow if we're using DCLK or QCLK and multiply accordingly (based on the BIOS_REQ register instead of BIOS_DATA). I'd really need some clarification on how all of this works. Also, it looks like there's no need for this to be a 64bit calculation due to the mask limit. > + > + if (mem_speed == 0) > + return -EINVAL; > + > + memdev_info->valid = true; > + memdev_info->mem_speed_khz = mem_speed; > + memdev_info->num_channels = 0; > + > + memdev_info->rank_valid = false; > + val = I915_READ(SKL_MAD_DIMM_CH0_0_0_0_MCHBAR_MCMAIN); > + > + if (val != 0xFFFFFFFF) { > + memdev_info->num_channels++; > + skl_memdev_info_fill(memdev_info, val); > + } > + > + val = I915_READ(SKL_MAD_DIMM_CH1_0_0_0_MCHBAR_MCMAIN); > + > + if (val != 0xFFFFFFFF) { > + memdev_info->num_channels++; > + if (!memdev_info->rank_valid) > + skl_memdev_info_fill(memdev_info, val); > + } A simple iteration over the two register addresses would have saved some code :). Also, isn't it possible to have channels with different width/rank? Shouldn't our code be sanity-checking this? Perhaps grab the width/rank for each channel and compare. When the specs are unclear or don't exist, it's probably better if our code is able to check/validate its own assumptions. > + > + if (memdev_info->num_channels == 0) > + memdev_info->valid = false; > + return 0; > +} > + > +static int > +bxt_get_memdev_info(struct drm_i915_private *dev_priv) > +{ > + struct memdev_info *memdev_info = &dev_priv->memdev_info; > + uint32_t dram_channel; > + uint32_t mem_speed, val; > + uint8_t num_channel, dram_type; > + int i; > + > + val = I915_READ(BXT_P_CR_MC_BIOS_REQ_0_0_0); > + mem_speed = div_u64((uint64_t) (val & BXT_REQ_DATA_MASK) * > + SKL_MEMORY_FREQ_MULTIPLIER, 1000); AFAIU, there's no need for 64 bit division since the maximum value is 8399790. > + > + if (mem_speed == 0) > + return -EINVAL; > + > + memdev_info->valid = true; > + memdev_info->mem_speed_khz = mem_speed; > + dram_type = (val >> BXT_DRAM_TYPE_SHIFT) & > BXT_DRAM_TYPE_MASK; > + dram_channel = (val >> BXT_DRAM_CHANNEL_SHIFT) & > BXT_DRAM_CHANNEL_MASK; > + num_channel = hweight32(dram_channel); > + > + /* > + * The lpddr3 and lpddr4 technologies can have 1-4 channels > and the > + * channels are 32bits wide; while ddr3l technologies can > have 1-2 > + * channels and the channels are 64 bits wide. In case of > single 64 bit > + * wide DDR3L dimm sets two channel-active bits in > + * P_CR_MC_BIOS_REQ_0_0_0 register and system with two DDR3L > 64bit dimm > + * will set all four channel-active bits in above register. > + * In order to get actual number of channels in DDR3L DRAM > we need to > + * device total channel-active bits set by 2. > + */ > + > + switch (dram_type) { > + case BXT_DRAM_TYPE_LPDDR3: > + case BXT_DRAM_TYPE_LPDDR4: > + memdev_info->channel_width_bytes = 4; > + memdev_info->num_channels = num_channel; > + break; > + case BXT_DRAM_TYPE_DDR3L: > + memdev_info->channel_width_bytes = 8; > + memdev_info->num_channels = num_channel / 2; > + break; > + default: > + DRM_DEBUG_KMS("Unknown DRAM type\n"); > + memdev_info->valid = false; > + return -EINVAL; > + } Shouldn't channel_width_bytes just be num_channels * 4 for LPDDR3/4 and num_channels * 8 for DDR3L? Or just multiply by 4 before we do the "num_channel / 2"... This would avoid even having to find out the DRAM type (in case it's not needed anywhere else). > + > + /* > + * Now read each DUNIT8/9/10/11 to check the rank of each > dimms. > + * all the dimms should have same rank as in first valid > Dimm > + */ I think we need to explain in the comment why we're iterating over the 4 dunits but just looking at the first one that's not 0xFFFFFFFF. Isn't there a way to discover which dunits to look at based on the number of channels and DRAM type? Also, does the valid dunit amount reflect the number of channels like in SKL? > + memdev_info->rank_valid = false; > + for (i = 0; i < BXT_D_CR_DRP0_DUNIT_MAX; i++) { > + val = I915_READ(BXT_D_CR_DRP0_DUNIT(i)); > + if (val != 0xFFFFFFFF) { > + uint8_t rank; > + > + memdev_info->rank_valid = true; > + rank = val & BXT_DRAM_RANK_MASK; > + if (rank == BXT_DRAM_RANK_SINGLE) > + memdev_info->rank = > DRAM_RANK_SINGLE; > + else if (rank == BXT_DRAM_RANK_DUAL) > + memdev_info->rank = DRAM_RANK_DUAL; > + else > + memdev_info->rank = DRAM_RANK_NONE; Probably DRM_DEBUG_KMS("something\n") here? Also maybe set rank_valid back to false? Return non-zero? > + > + break; > + } > + } > + return 0; > +} > + > +static void > +intel_get_memdev_info(struct drm_i915_private *dev_priv) > +{ > + struct memdev_info *memdev_info = &dev_priv->memdev_info; > + int ret; > + > + memdev_info->valid = false; > + if (!IS_GEN9(dev_priv)) > + return; > + > + if (IS_BROXTON(dev_priv)) > + ret = bxt_get_memdev_info(dev_priv); > + else > + ret = skl_get_memdev_info(dev_priv); > + if (ret) > + return; What I find a little hard to read is that we set memdev_info->valid both here and in the functions we call. Due to the early returns it's not super simple to see the possible value it may get, and things get even more complicated due to the fact that we may return 0 from skl_get_memdev_info() while still setting memdev_info->valid to false. Things would be much simpler and easier to read if we only set memdev_info->valid in this function, based on the return value of the get_memdev_info() functions. And then the get_memdev_info() functions would appropriately return zero/nonzero based on valid/invalid configuration. > + > + if (memdev_info->valid) { > + DRM_DEBUG_DRIVER("DRAM speed-%d Khz total-channels- > %d channel-width-%d bytes\n", > + memdev_info->mem_speed_khz, > + memdev_info->num_channels, > + memdev_info->channel_width_bytes); As I mentioned in my previous review, please use the appropriate specifiers here instead of %d for everybody. Also, we usually print "variable: value, other variable: value" instead of "variable-value other-variable-value". Please use the established convention. Reading "total-channels-1" is less clear than "total channels: 1" and doesn't raise ambiguity concerns with negative values. > + if (memdev_info->rank_valid) > + DRM_DEBUG_DRIVER("DRAM rank-%s\n", > + (memdev_info->rank == DRAM_RANK_DUAL) ? > + "dual" : "single"); But there's also DRAM_RANK_NONE. > + } > +} > + > + > /** > * i915_driver_init_hw - setup state requiring device access > * @dev_priv: device private > @@ -1082,6 +1246,12 @@ static int i915_driver_init_hw(struct > drm_i915_private *dev_priv) > DRM_DEBUG_DRIVER("can't enable MSI"); > } > > + /* > + * Fill the memdev structure to get the system raw bandwidth > + * This will be used by WM algorithm, to implement GEN9 > based WA > + */ > + intel_get_memdev_info(dev_priv); > + > return 0; > > out_ggtt: > diff --git a/drivers/gpu/drm/i915/i915_drv.h > b/drivers/gpu/drm/i915/i915_drv.h > index a219a35..adbd9aa 100644 > --- a/drivers/gpu/drm/i915/i915_drv.h > +++ b/drivers/gpu/drm/i915/i915_drv.h > @@ -2057,6 +2057,20 @@ struct drm_i915_private { > bool distrust_bios_wm; > } wm; > > + struct memdev_info { > + bool valid; > + uint32_t mem_speed_khz; > + uint8_t channel_width_bytes; > + uint8_t num_channels; > + bool rank_valid; > + enum { > + DRAM_RANK_NONE = 0, > + DRAM_RANK_SINGLE, > + DRAM_RANK_DUAL > + } rank; What's the meaning of DRAM_RANK_NONE? Does DRAM_RANK_NONE imply rank_valid=false? What's the meaning of rank=DRAM_RANK_NONE when rank_valid=true? Perhaps we could get rid of the rank_valid variable? Maybe rename DRAM_RANK_NONE to DRAM_RANK_INVALID? > + } memdev_info; > + > + > struct i915_runtime_pm pm; > > /* Abstract the submission mechanism (legacy ringbuffer or > execlists) away */ > diff --git a/drivers/gpu/drm/i915/i915_reg.h > b/drivers/gpu/drm/i915/i915_reg.h > index acc767a..a9c467c 100644 > --- a/drivers/gpu/drm/i915/i915_reg.h > +++ b/drivers/gpu/drm/i915/i915_reg.h > @@ -7721,6 +7721,44 @@ enum { > #define DC_STATE_DEBUG_MASK_CORES (1<<0) > #define DC_STATE_DEBUG_MASK_MEMORY_UP (1<<1) > > +#define _MMIO_MCHBAR_PIPE(x, a, b) _MMIO(MCHBAR_MIRROR_BASE_S > NB + _PIPE(x, a, b)) Wait, what? I may have misunderstood something, but AFAICS the registers that use this macro are not per-pipe, so using the _PIPE macro makes things super confusing. In this case it should be fine to just do something a little more hardcoded, like: #define BXT_D_CR_DRP0_DUNIT(unit) _MMIO(MCHBAR_MIRROR_BASE_SNB + (unit) * 0x200) > +#define BXT_P_CR_MC_BIOS_REQ_0_0_0 _MMIO(MCHBAR_MIRROR_BASE_S > NB + 0x7114) > +#define BXT_REQ_DATA_MASK (0x3F << 0) Or just 0x3F to be consistent with the other definitions. > +#define BXT_DRAM_TYPE_SHIFT 24 > +#define BXT_DRAM_TYPE_MASK 0x7 The spec you sent me lists these bits as reserved. Are you sure these come from this register? > +#define BXT_DRAM_CHANNEL_SHIFT 12 > +#define BXT_DRAM_CHANNEL_MASK 0xF I'd rename these to BXT_DRAM_CHANNEL_ACTIVE_{SHIFT,MASK}. > + > +#define BXT_DRAM_TYPE_LPDDR3 0x1 > +#define BXT_DRAM_TYPE_LPDDR4 0x2 > +#define BXT_DRAM_TYPE_DDR3L 0x4 > +/* > + * BIOS programs this field of REQ_DATA [5:0] in integer > + * multiple of 133330 KHz (133.33MHz) > + */ > +#define SKL_MEMORY_FREQ_MULTIPLIER 0x208D2 There's a bad tab here that should be a space. Also, it makes a looooot more sense to use the decimal value instead of 0x208D2. > +#define BXT_D_CR_DRP0_DUNIT8 0x1000 > +#define BXT_D_CR_DRP0_DUNIT9 0x1200 > +#define BXT_D_CR_DRP0_DUNIT_MAX 4 > +#define BXT_D_CR_DRP0_DUNIT(x) _MMIO_MCHBAR_PIPE(x, > BXT_D_CR_DRP0_DUNIT8, BXT_D_CR_DRP0_DUNIT9) > +#define BXT_DRAM_RANK_MASK 0x3 > +#define BXT_DRAM_RANK_SINGLE 0x1 > +#define BXT_DRAM_RANK_DUAL 0x3 > + > +#define SKL_MC_BIOS_DATA_0_0_0_MCHBAR_PCU _MMIO(MCHBAR_MIRROR > _BASE_SNB + 0x5E04) Why does BXT look at BIOS_REQ while SKL looks at BIOS_DATA? Shouldn't SKL also look at BIOS_DATA or BXT look at BIOS_REQ? What's the big difference? Also, that comment mentioning that we may sometimes get zero is a little worrying: shouldn't we make sure we always read when it's non-zero? Thanks for the patch, and sorry for the huge amount of questions here: learning while doing the review is sometimes tricky. > +#define SKL_REQ_DATA_MASK (0xF << 0) > +#define SKL_MAD_DIMM_CH0_0_0_0_MCHBAR_MCMAIN _MMIO(MCHBAR_MIR > ROR_BASE_SNB + 0x500C) > +#define SKL_MAD_DIMM_CH1_0_0_0_MCHBAR_MCMAIN _MMIO(MCHBAR_MIR > ROR_BASE_SNB + 0x5010) > +#define SKL_DRAM_CHANNEL_WIDTH_MASK 0x3 > +#define SKL_DRAM_CHANNEL_WIDTH_SHIFT 8 > +#define SKL_DRAM_WIDTH_1 0x0 > +#define SKL_DRAM_WIDTH_2 0x1 > +#define SKL_DRAM_WIDTH_4 0x2 > +#define SKL_DRAM_RANK_MASK 0x1 > +#define SKL_DRAM_RANK_SHIFT 10 > +#define SKL_DRAM_RANK_SINGLE 0x0 > +#define SKL_DRAM_RANK_DUAL 0x1 > + > /* Please see hsw_read_dcomp() and hsw_write_dcomp() before using > this register, > * since on HSW we can't write to it using I915_WRITE. */ > #define D_COMP_HSW _MMIO(MCHBAR_MIRROR_BASE_S > NB + 0x5F0C) _______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/intel-gfx