Re: [RFC 02/11] drm/i915: Introduce uevent for full GPU reset.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 16/06/2015 17:55, Chris Wilson wrote:
On Tue, Jun 16, 2015 at 04:43:55PM +0100, Tomas Elf wrote:
On 16/06/2015 14:43, Daniel Vetter wrote:
On Mon, Jun 08, 2015 at 06:03:20PM +0100, Tomas Elf wrote:
The TDR ULT used to validate this patch series requires a special uevent for
full GPU resets in order to distinguish between different kinds of resets.

Signed-off-by: Tomas Elf <tomas.elf@xxxxxxxxx>

Why duplicate the uevent we send out from i915_reset_and_wakeup? At least
I can't spot what this gets us in addition to the existing one.
-Daniel

Look at this line:
+		reset_event[0] = kasprintf(GFP_KERNEL, "%s", "GPU RESET=0");

It doesn't exist in reset_and_wakeup (specifically, the "GPU
RESET=0" part). It's a uevent that happens at the time of the actual
GPU reset (GDRST register write). In the subsequent TDR commit we
add another one to the point of the actual engine reset, which also
includes information about what exact engine was reset.

The uevents in reset_and_wakeup only tell the user that an error has
been detected and that some kind of reset has happened, these new
uevents specify exactly what kind of reset has happened. This
particular one on its own it's not very meaningful since there is
only one supported form of reset at this point but once we add
engine reset support it's useful to be able to discern the types of
resets from each other (GPU reset, RCS engine reset, VCS engine
reset, VCS2 engine reset, BCS engine reset, VECS engine reset).

Does that make sense?

The ultimate question is how do you envisage these uevents being used?

At present, we have abrtd listening out for when to grab the
/sys/drm/cardX/error and maybe for GL robustness (though I would imagine
if they thought such events useful we would have had demands for a DRM
event on the guilty/victim fd).

Does it really make sense to send uevents for both hang, partial-reset,
and full-reset?
-Chris


The reason we have such a detailed set of uevents is primarily for testing purposes. Our internal VPG tests check for these uevents to make sure that the expected recovery mode is actually being used. Which makes sense, because the TDR driver code contains reset promotion logic to decide what recovery mode to use and if that logic somehow gets broken the driver might go with the wrong recovery mode. Thus, it's worth testing and therefore those uevents need to be there. Of course, I guess the argument "our internal VPG tests do this" might not hold water since the tests haven't been upstreamed? If that's the case then I guess I don't have any opinion about what uevent goes where and we could go with whatever set of uevents you prefer.

Also, it might not be worth the hassle to have the reset_done_event at recovery completion at the end of i915_reset_and_wakeup() / i915_error_work_func() _as_well_as_ the respective uevent after each actual GPU/engine reset completion since reset_done_event doesn't really offer that much information that you didn't already know from the post-reset uevent. So I would be ok with removing reset_done_event.

Thanks,
Tomas

_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
http://lists.freedesktop.org/mailman/listinfo/intel-gfx




[Index of Archives]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]
  Powered by Linux