On 17.06.2016 15:31, StDenis, Tom wrote:
> I wonder if some sort of self-test like the ring/ib tests we do is a
> good idea.  Either from the UMD or KMD.
>
>
> In this specific case though are you working around a CU that results in
> a GPU lockup?  Or does it just not respond correctly?

Computations in that CU flip bits occasionally. It actually wasn't
noticeable at all in regular desktop use, and I didn't see traces of it
with the usual benchmarks and games either -- only in hindsight did I
notice some slightly wrong pixels when zooming into screenshots of the
desktop.

I also hope to use this option to do more extensive stress tests of
whether we can still run stably with many CUs disabled - I suspect an
interaction between CU disabling and CU reservations for shader stages.

I don't think an automatic self-test is feasible for the kernel module,
and from user space, "stress testing" with Piglit is precisely how I
found it :)

Nicolai

>
>
> Tom
>
>
>
> ------------------------------------------------------------------------
> *From:* amd-gfx <amd-gfx-bounces at lists.freedesktop.org> on behalf of
> Nicolai Hähnle <nhaehnle at gmail.com>
> *Sent:* Friday, June 17, 2016 09:17
> *To:* amd-gfx at lists.freedesktop.org
> *Cc:* Haehnle, Nicolai
> *Subject:* [amd-gfx] [PATCH 1/3] drm/amdgpu: add disable_cu parameter
> From: Nicolai Hähnle <nicolai.haehnle at amd.com>
>
> This parameter will allow disabling individual CUs on module load, e.g.
> amdgpu.disable_cu=2.0.3,2.0.4 to disable CUs 3 and 4 of SE2.
>
> Signed-off-by: Nicolai Hähnle <nicolai.haehnle at amd.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu.h     |  1 +
>  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c |  4 +++
>  drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 44 +++++++++++++++++++++++++++++++++
>  drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h |  2 ++
>  4 files changed, 51 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> index 01c36b8..2d35e11 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> @@ -87,6 +87,7 @@ extern int amdgpu_sched_hw_submission;
>  extern int amdgpu_powerplay;
>  extern unsigned amdgpu_pcie_gen_cap;
>  extern unsigned amdgpu_pcie_lane_cap;
> +extern char *amdgpu_disable_cu;
>
>  #define AMDGPU_WAIT_IDLE_TIMEOUT_IN_MS 3000
>  #define AMDGPU_MAX_USEC_TIMEOUT 100000 /* 100 ms */
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> index f888c01..235f732 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> @@ -84,6 +84,7 @@ int amdgpu_sched_hw_submission = 2;
>  int amdgpu_powerplay = -1;
>  unsigned amdgpu_pcie_gen_cap = 0;
>  unsigned amdgpu_pcie_lane_cap = 0;
> +char *amdgpu_disable_cu = NULL;
>
>  MODULE_PARM_DESC(vramlimit, "Restrict VRAM for testing, in megabytes");
>  module_param_named(vramlimit, amdgpu_vram_limit, int, 0600);
> @@ -168,6 +169,9 @@ module_param_named(pcie_gen_cap, amdgpu_pcie_gen_cap, uint, 0444);
>  MODULE_PARM_DESC(pcie_lane_cap, "PCIE Lane Caps (0: autodetect (default))");
>  module_param_named(pcie_lane_cap, amdgpu_pcie_lane_cap, uint, 0444);
>
> +MODULE_PARM_DESC(disable_cu, "Disable CUs (se.sh.cu,...)");
> +module_param_named(disable_cu, amdgpu_disable_cu, charp, 0444);
> +
>  static const struct pci_device_id pciidlist[] = {
>  #ifdef CONFIG_DRM_AMDGPU_CIK
>  	/* Kaveri */
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
> index 9f95da4..a074edd 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
> @@ -70,3 +70,47 @@ void amdgpu_gfx_scratch_free(struct amdgpu_device *adev, uint32_t reg)
>  		}
>  	}
>  }
> +
> +/**
> + * amdgpu_gfx_parse_disable_cu - Parse the disable_cu module parameter
> + *
> + * @mask: array in which the per-shader array disable masks will be stored
> + * @max_se: number of SEs
> + * @max_sh: number of SHs
> + *
> + * The bitmask of CUs to be disabled in the shader array determined by se and
> + * sh is stored in mask[se * max_sh + sh].
> + */
> +void amdgpu_gfx_parse_disable_cu(unsigned *mask, unsigned max_se, unsigned max_sh)
> +{
> +	unsigned se, sh, cu;
> +	const char *p;
> +
> +	memset(mask, 0, sizeof(*mask) * max_se * max_sh);
> +
> +	if (!amdgpu_disable_cu || !*amdgpu_disable_cu)
> +		return;
> +
> +	p = amdgpu_disable_cu;
> +	for (;;) {
> +		char *next;
> +		int ret = sscanf(p, "%u.%u.%u", &se, &sh, &cu);
> +		if (ret < 3) {
> +			DRM_ERROR("amdgpu: could not parse disable_cu\n");
> +			return;
> +		}
> +
> +		if (se < max_se && sh < max_sh && cu < 16) {
> +			DRM_INFO("amdgpu: disabling CU %u.%u.%u\n", se, sh, cu);
> +			mask[se * max_sh + sh] |= 1u << cu;
> +		} else {
> +			DRM_ERROR("amdgpu: disable_cu %u.%u.%u is out of range\n",
> +				  se, sh, cu);
> +		}
> +
> +		next = strchr(p, ',');
> +		if (!next)
> +			break;
> +		p = next + 1;
> +	}
> +}
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
> index dc06cbd..51321e1 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
> @@ -27,4 +27,6 @@
>  int amdgpu_gfx_scratch_get(struct amdgpu_device *adev, uint32_t *reg);
>  void amdgpu_gfx_scratch_free(struct amdgpu_device *adev, uint32_t reg);
>
> +void amdgpu_gfx_parse_disable_cu(unsigned *mask, unsigned max_se, unsigned max_sh);
> +
>  #endif
> --
> 2.7.4
>
> _______________________________________________
> amd-gfx mailing list
> amd-gfx at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
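
For anyone who wants to try the se.sh.cu syntax without loading the
module, the parsing loop from the quoted patch can be approximated in
user space. This is only a minimal sketch, not the kernel code: the name
parse_disable_cu, the MAX_CU_PER_SH macro, the string argument and the
return value are all assumptions made for illustration, and the DRM_*
logging is replaced by fprintf.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical name for the "cu < 16" bound used in the patch. */
#define MAX_CU_PER_SH 16u

/*
 * User-space stand-in for the parser in the patch (hypothetical API).
 * The string is passed in instead of being read from the module
 * parameter. Returns the number of out-of-range entries that were
 * skipped, or -1 if an entry is not of the form se.sh.cu.
 * mask must have room for max_se * max_sh elements.
 */
int parse_disable_cu(const char *str, unsigned *mask,
                     unsigned max_se, unsigned max_sh)
{
	unsigned se, sh, cu;
	const char *p = str;
	int rejected = 0;

	memset(mask, 0, sizeof(*mask) * max_se * max_sh);

	/* An unset or empty parameter disables nothing. */
	if (!str || !*str)
		return 0;

	for (;;) {
		const char *next;

		if (sscanf(p, "%u.%u.%u", &se, &sh, &cu) < 3) {
			fprintf(stderr, "could not parse disable_cu\n");
			return -1;
		}

		if (se < max_se && sh < max_sh && cu < MAX_CU_PER_SH) {
			/* One bitmask per shader array, one bit per CU. */
			mask[se * max_sh + sh] |= 1u << cu;
		} else {
			fprintf(stderr, "disable_cu %u.%u.%u is out of range\n",
				se, sh, cu);
			rejected++;
		}

		/* Entries are comma-separated; stop after the last one. */
		next = strchr(p, ',');
		if (!next)
			break;
		p = next + 1;
	}

	return rejected;
}
```

Assuming a part with max_se = 4 and max_sh = 2 (made-up numbers), the
string "2.0.3,2.0.4" from the commit message sets bits 3 and 4 in
mask[2 * 2 + 0], so mask[4] ends up as 0x18 and every other entry stays
zero.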