On 01/07/2019 10:50 AM, Chris Wilson wrote:
Quoting Daniele Ceraolo Spurio (2019-01-07 18:31:52)
On 01/02/2019 01:41 AM, Chris Wilson wrote:
The guc (and huc) currently inextricably depend on struct_mutex for
device reinitialisation from inside the reset, and indeed taking any
mutex here is verboten (as we must be able to reset from underneath any
of our mutexes). That makes recovering the guc unviable without, for
example, reserving contiguous vma space and pages for it to use.
Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
---
drivers/gpu/drm/i915/i915_reset.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/gpu/drm/i915/i915_reset.c b/drivers/gpu/drm/i915/i915_reset.c
index f5da67f1bc04..77fc2f74e427 100644
--- a/drivers/gpu/drm/i915/i915_reset.c
+++ b/drivers/gpu/drm/i915/i915_reset.c
@@ -590,6 +590,9 @@ int intel_gpu_reset(struct drm_i915_private *i915, unsigned int engine_mask)
 bool intel_has_gpu_reset(struct drm_i915_private *i915)
 {
+	if (USES_GUC(i915))
+		return false;
+
Do we need to tweak the getparam so we can report that we have engine
reset but not full reset?
Oh, we cut a corner. But is userspace ready for having one but not the
other... What userspace other than igt, I wonder. I suspect even igt
treats it as boolean for the large part, with just a few correctly
checking when they want per-engine resets.
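To make the getparam question concrete, here is a rough sketch of reporting the two capabilities separately instead of as a single boolean. Everything in it (the RESET_CAP_* bits, struct fake_device, reset_caps()) is made up for illustration and is not the i915 uAPI; only the "no full reset while GuC is in use" rule is taken from the patch above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical capability bits, not the actual i915 getparam encoding. */
enum reset_cap {
	RESET_CAP_NONE   = 0,
	RESET_CAP_FULL   = 1 << 0, /* whole-device reset */
	RESET_CAP_ENGINE = 1 << 1, /* per-engine reset */
};

/* Hypothetical stand-in for the relevant device state. */
struct fake_device {
	bool hw_has_full_reset;
	bool hw_has_engine_reset;
	bool uses_guc;
};

/* Report full and per-engine reset independently of each other. */
static unsigned int reset_caps(const struct fake_device *dev)
{
	unsigned int caps = RESET_CAP_NONE;

	assert(dev);

	/* Mirrors the patch: full reset is not offered while GuC is in use. */
	if (dev->hw_has_full_reset && !dev->uses_guc)
		caps |= RESET_CAP_FULL;

	/* Engine reset is unaffected by the GuC check in this sketch. */
	if (dev->hw_has_engine_reset)
		caps |= RESET_CAP_ENGINE;

	return caps;
}
```

With a layout like this, a caller that only tests "non-zero" would still see some reset support with GuC enabled, while a caller that needs full reset has to check the specific bit, which is exactly the userspace-readiness concern raised above.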
Also, this is a regression of capabilities when GuC is enabled, so we
need to make lockless reset work with GuC soon.
Hah. It frequently fails to recover after reset :-p
Were you already
planning/working on something along the lines of the possible solution
mentioned in the commit message? Just trying to understand what the
status is before jumping in to avoid duplication of work ;)
The compromises required (reserving space for the firmware) seem like they
will just end up with a lot of screaming if we make it mutexless today.
Aren't we already keeping some GuC stuff perma-pinned (e.g. the
stage_desc_pool)? Current Gen11+ firmware needs more space and uses
objects up to tens of MBs, the blobs aren't that big in comparison.
I was planning on coming back to it after breaking up the struct_mutex
to see which locks were required. If we get to the point where we can
guarantee that those locks are never held when waiting (inc.
allocations) (the plan there is to annotate them with lockdep to catch
waiters), then we can safely use those inside the reset.
Sounds good.
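For illustration only, the "never held when waiting" rule could be modelled in userspace roughly as below. This is not lockdep and not kernel code: struct reset_lock, reset_lock_acquire(), reset_lock_release(), may_wait() and checked_wait() are hypothetical names, and a per-thread counter stands in for the annotation that would catch waiters.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical: number of "reset-safe" locks held by the current thread. */
static _Thread_local int reset_locks_held;

/* Hypothetical lock class that must never be held across a wait. */
struct reset_lock {
	pthread_mutex_t mutex;
};

static void reset_lock_acquire(struct reset_lock *l)
{
	pthread_mutex_lock(&l->mutex);
	reset_locks_held++;
}

static void reset_lock_release(struct reset_lock *l)
{
	reset_locks_held--;
	pthread_mutex_unlock(&l->mutex);
}

/* True if the current thread holds none of the annotated locks. */
static bool may_wait(void)
{
	return reset_locks_held == 0;
}

/* Call at the top of anything that can block (waits, allocations). */
static void checked_wait(void)
{
	assert(may_wait());
	/* ... the blocking operation would go here ... */
}
```

The point of the sketch is the invariant rather than the mechanism: if every blocking path goes through a check like checked_wait(), a lock of this class can be taken inside the reset without risking a wait on something the reset itself has to unblock.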
Or maybe we can get away with using just a single page in the global gtt
and just loading the firmware piecemeal, or something.
From my understanding the HW considers the firmware loaded and starts
processing it as soon as the first DMA transfer completes, so loading it
bit by bit wouldn't work.
Thanks,
Daniele
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx