On 11/4/2022 6:27 PM, Brian Norris wrote:
Hi,
On Fri, Nov 04, 2022 at 05:49:54PM -0700, Ceraolo Spurio, Daniele wrote:
On 11/4/2022 5:38 PM, Ceraolo Spurio, Daniele wrote:
On 11/4/2022 4:26 PM, Brian Norris wrote:
Did you track this down? Or consider reverting? This is tripping me up
No. I didn't manage to repro locally after Tvrtko reported it (I ran the
full selftest suite twice on both ADL-S and DG2 with the debug config
enabled), so I was keeping an eye out as suggested to see if it popped
up again. If you can repro this consistently, can you share your setup
info? Which platform you're running on, whether you're using the latest
drm-tip, any non-default params you're using, etc. Dmesg would also be
useful to see if there are other errors before this one.
Just to further clarify, this issue is also not showing up in our CI runs
(which do have both the DEBUG_OBJECTS kconfigs you pointed out enabled),
which is why I suspect that this only happens on specific setups,
potentially due to a different kconfig or modparam being involved.
Huh, well join the crowd. I'm currently hunting through ways to
reproduce the CI runs, which are complaining about a different patch of
mine...
...and I can't reproduce :)
Anyway, I'm running on a GLK Chromebook. I have to do some minimal
tweaking to get the average ChromeOS setup to work (basically, neuter
the display manager and boot splash, so DRM/drivers can release
cleanly), but then the IGT tools run as normal. Attaching dmesg and
.config.
Test sequence:
igt-gpu-tools/i915_module_load --run-subtest reload ## this first one is probably unnecessary
igt-gpu-tools/gem_exec_gttfill --run-subtest basic
igt-gpu-tools/i915_module_load --run-subtest reload
I'm running drm-tip, at:
a397a9098fb3 drm-tip: 2022y-11m-04d-19h-23m-35s UTC integration manifest
I doubt too much of the ChromeOS setup itself is uniquely interesting,
other than perhaps that we run a simple 'frecon' console [1] that I had
to kill first (so, it probably touched/released some buffers).
Brian
[1] https://chromium.googlesource.com/chromiumos/platform/frecon/+/HEAD/README.md
Ok, I think I have an idea of what's happening: if HuC is not enabled,
we skip the call to fence_fini, so we leak the debug object. Can you
check if the below diff fixes the issue for you?
Thanks,
Daniele
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_huc.c b/drivers/gpu/drm/i915/gt/uc/intel_huc.c
index fbc8bae14f76..e3bbd174889d 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_huc.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_huc.c
@@ -300,13 +300,12 @@ int intel_huc_init(struct intel_huc *huc)
 void intel_huc_fini(struct intel_huc *huc)
 {
-	if (!intel_uc_fw_is_loadable(&huc->fw))
-		return;
-
 	delayed_huc_load_complete(huc);
 	i915_sw_fence_fini(&huc->delayed_load.fence);
-	intel_uc_fw_fini(&huc->fw);
+
+	if (intel_uc_fw_is_loadable(&huc->fw))
+		intel_uc_fw_fini(&huc->fw);
 }
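
For anyone following along, here is a rough userspace sketch of the leak
pattern described above. It is not i915 code: every mock_* name and the
live_debug_objects counter are hypothetical stand-ins, and it assumes the
fence is set up at init time whether or not the HuC firmware is loadable
(which is what the diagnosis implies). The old fini returns early and
never finalizes the fence; the new one always finalizes it and only
guards the firmware cleanup.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the debug-object tracker: counts live objects. */
static int live_debug_objects;

struct mock_fence {
	bool tracked;
};

struct mock_huc {
	bool fw_loadable;	/* stands in for intel_uc_fw_is_loadable() */
	bool fw_released;	/* stands in for the effect of intel_uc_fw_fini() */
	struct mock_fence fence;
};

static void mock_fence_init(struct mock_fence *f)
{
	f->tracked = true;
	live_debug_objects++;
}

static void mock_fence_fini(struct mock_fence *f)
{
	assert(f->tracked);
	f->tracked = false;
	live_debug_objects--;
}

/* Assumption: the fence is initialized regardless of firmware state. */
static void mock_huc_init(struct mock_huc *huc, bool loadable)
{
	huc->fw_loadable = loadable;
	huc->fw_released = false;
	mock_fence_init(&huc->fence);
}

/* Old ordering: the early return skips the fence cleanup too. */
static void mock_huc_fini_old(struct mock_huc *huc)
{
	if (!huc->fw_loadable)
		return;

	mock_fence_fini(&huc->fence);
	huc->fw_released = true;
}

/* New ordering: always finalize the fence, guard only the fw cleanup. */
static void mock_huc_fini_new(struct mock_huc *huc)
{
	mock_fence_fini(&huc->fence);

	if (huc->fw_loadable)
		huc->fw_released = true;
}

/* Returns how many tracked objects one init/fini cycle leaves behind. */
static int leaked_after_cycle(bool loadable, void (*fini)(struct mock_huc *))
{
	struct mock_huc huc;
	int before = live_debug_objects;

	mock_huc_init(&huc, loadable);
	fini(&huc);

	return live_debug_objects - before;
}
```

Calling leaked_after_cycle(false, mock_huc_fini_old) models a module
reload on a setup where HuC is not enabled; comparing it against
mock_huc_fini_new shows the difference the reordering makes in this toy
model.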