Quoting Chris Wilson (2019-08-06 14:47:25)
> @@ -433,6 +482,8 @@ __create_context(struct drm_i915_private *i915)
>
>  	i915_gem_context_set_bannable(ctx);
>  	i915_gem_context_set_recoverable(ctx);
> +	if (i915_modparams.enable_hangcheck)
> +		i915_gem_context_set_persistence(ctx);

I am not fond of this, but from a pragmatic point of view it does prevent
the issue Jon raised: HW resources being pinned indefinitely past process
termination.

I don't like it because we cannot perform the operation cleanly
everywhere; it requires preemption first and foremost (with a cooperating
submission backend) and per-engine reset. The alternative is to try a
full GPU reset if the context is still active. For the sake of the issue
raised, I think that (full reset on older HW) is required.

That we are baking in a change of ABI due to an unsafe modparam is meh.

There are a few more corner cases to deal with before endless just works.
But since it is being used in the wild, I'm not sure we can wait for
userspace to opt in, or wait for cgroups.

However, since users are being encouraged to disable hangcheck, should we
extend the concept of persistence to also mean "survives hangcheck"? --
though it should be a separate parameter, and I'm not sure how exactly to
protect it from the heartbeat reset without giving gross privileges to
the context.

(CPU isolation is nicer from the pov where we can just give up and not
even worry about the engine. If userspace can request isolation, it has
the privilege to screw up.)
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
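
[Editor's note: for readers following the "userspace opt-in" point above, here is
a minimal userspace sketch of what a per-context persistence knob could look like,
assuming it is exposed through GEM_CONTEXT_SETPARAM. The I915_CONTEXT_PARAM_PERSISTENCE
name matches the uapi that was merged later; it is not part of the patch quoted above,
and the helper name is hypothetical.]

/*
 * Illustrative only: mark a context as non-persistent so that its
 * outstanding requests are cancelled once the context (or the owning
 * process) is closed, rather than being left to run indefinitely.
 */
#include <stdint.h>
#include <sys/ioctl.h>

#include <drm/i915_drm.h>

static int context_set_persistence(int drm_fd, uint32_t ctx_id, int enable)
{
	struct drm_i915_gem_context_param arg = {
		.ctx_id = ctx_id,
		.param = I915_CONTEXT_PARAM_PERSISTENCE,
		.value = enable,
	};

	/* Returns 0 on success; -1 with errno set otherwise. */
	return ioctl(drm_fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, &arg);
}

[A caller would typically invoke this right after context creation and before any
submissions, so no work is ever outstanding on a context whose fate is ambiguous.]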