Hi Steve, On Thu, 16 May 2019 17:21:39 +0100 Steven Price <steven.price@xxxxxxx> wrote: > >> +static void panfrost_perfcnt_setup(struct panfrost_device *pfdev) > >> +{ > >> + u32 cfg; > >> + > >> + /* > >> + * Always use address space 0 for now. > >> + * FIXME: this needs to be updated when we start using different > >> + * address space. > >> + */ > >> + cfg = GPU_PERFCNT_CFG_AS(0); > >> + if (panfrost_model_cmp(pfdev, 0x1000) >= 0) > > > > You've got a couple of these. Perhaps we should add either a > > model_is_bifrost() helper or an is_bifrost variable to use. > > > >> + cfg |= GPU_PERFCNT_CFG_SETSEL(1); > > mali_kbase only sets this if CONFIG_MALI_PRFCNT_SET_SECONDARY is defined > - i.e. this selects a different selection of counters. At at the very > least this should be an optional flag that can be set, but I suspect the > primary set are the ones you are interested in. I'll add a param to pass the set of counters to monitor. > > >> + > >> + gpu_write(pfdev, GPU_PERFCNT_CFG, > >> + cfg | GPU_PERFCNT_CFG_MODE(GPU_PERFCNT_CFG_MODE_OFF)); > >> + > >> + if (!pfdev->perfcnt->enabled) > >> + return; > >> + > >> + gpu_write(pfdev, GPU_PRFCNT_JM_EN, 0xffffffff); > >> + gpu_write(pfdev, GPU_PRFCNT_SHADER_EN, 0xffffffff); > >> + gpu_write(pfdev, GPU_PRFCNT_MMU_L2_EN, 0xffffffff); > >> + > >> + /* > >> + * Due to PRLAM-8186 we need to disable the Tiler before we enable HW > >> + * counters. > > > > Seems like you could just always apply the work-around? It doesn't > > look like it would be much different. > > Technically the workaround causes the driver to perform the operation in > the wrong order below (write TILER_EN when the mode is MANUAL) - this is > a workaround for the hardware issue (and therefore works on that > hardware). But I wouldn't like to say it works on other hardware which > doesn't have the issue. Okay, I'll keep the workaround only for HW that are impacted by the bug. > > >> + */ > >> + if (panfrost_has_hw_issue(pfdev, HW_ISSUE_8186)) > >> + gpu_write(pfdev, GPU_PRFCNT_TILER_EN, 0); > >> + else > >> + gpu_write(pfdev, GPU_PRFCNT_TILER_EN, 0xffffffff); > >> + > >> + gpu_write(pfdev, GPU_PERFCNT_CFG, > >> + cfg | GPU_PERFCNT_CFG_MODE(GPU_PERFCNT_CFG_MODE_MANUAL)); > >> + > >> + if (panfrost_has_hw_issue(pfdev, HW_ISSUE_8186)) > >> + gpu_write(pfdev, GPU_PRFCNT_TILER_EN, 0xffffffff); > >> +} [...] > >> +int panfrost_perfcnt_init(struct panfrost_device *pfdev) > >> +{ > >> + struct panfrost_perfcnt *perfcnt; > >> + struct drm_gem_shmem_object *bo; > >> + size_t size; > >> + u32 status; > >> + int ret; > >> + > >> + if (panfrost_has_hw_feature(pfdev, HW_FEATURE_V4)) { > >> + unsigned int ncoregroups; > >> + > >> + ncoregroups = hweight64(pfdev->features.l2_present); > >> + size = ncoregroups * BLOCKS_PER_COREGROUP * > >> + COUNTERS_PER_BLOCK * BYTES_PER_COUNTER; > >> + } else { > >> + unsigned int nl2c, ncores; > >> + > >> + /* > >> + * TODO: define a macro to extract the number of l2 caches from > >> + * mem_features. > >> + */ > >> + nl2c = ((pfdev->features.mem_features >> 8) & GENMASK(3, 0)) + 1; > >> + > >> + /* > >> + * The ARM driver is grouping cores per core group and then > >> + * only using the number of cores in group 0 to calculate the > >> + * size. Not sure why this is done like that, but I guess > >> + * shader_present will only show cores in the first group > >> + * anyway. > >> + */ > > I'm not sure I understand this comment. I'm not sure which version of > the driver you are looking at, but shader_present should contain all the > cores (in every core group). > > The kbase driver (in panfrost's git tree[1]), contains: > > base_gpu_props *props = &kbdev->gpu_props.props; > u32 nr_l2 = props->l2_props.num_l2_slices; > u64 core_mask = props->coherency_info.group[0].core_mask; > > u32 nr_blocks = fls64(core_mask); > > /* JM and tiler counter blocks are always present */ > dump_size = (2 + nr_l2 + nr_blocks) * > NR_CNT_PER_BLOCK * > NR_BYTES_PER_CNT; > > [1] > https://gitlab.freedesktop.org/panfrost/mali_kbase/blob/master/driver/product/kernel/drivers/gpu/arm/midgard/mali_kbase_vinstr.c > > i.e. the number of cores + number of L2 caches (plus 2 for JM/tiler) > multiplied by the block size. Actually, I was referring to what's done in kbase_gpuprops_construct_coherent_groups() [2]. > > Also another difference with the below is that fls64 and hweight64 don't > return the same numbers. If the core mask is non-contiguous this will > return different values. Right, I should use fls64() not hweight64(). Regards, Boris [2]https://gitlab.freedesktop.org/panfrost/mali_kbase/blob/master/driver/product/kernel/drivers/gpu/arm/midgard/mali_kbase_gpuprops.c#L71 _______________________________________________ dri-devel mailing list dri-devel@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/dri-devel