Quoting Jason Ekstrand (2019-10-25 19:22:04)
> On Thu, Oct 24, 2019 at 6:40 AM Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> wrote:
>
> Our existing behaviour is to allow contexts and their GPU requests to
> persist past the point of closure until the requests are complete. This
> allows clients to operate in a 'fire-and-forget' manner where they can
> setup a rendering pipeline and hand it over to the display server and
> immediately exiting. As the rendering pipeline is kept alive until
> completion, the display server (or other consumer) can use the results
> in the future and present them to the user.
>
> However, not all clients want this persistent behaviour and would prefer
> that the contexts are cleaned up immediately upon closure. This ensures
> that when clients are run without hangchecking, any GPU hang is
> terminated with the process and does not continue to hog resources.
>
> By defining a context property to allow clients to control persistence
> explicitly, we can remove the blanket advice to disable hangchecking
> that seems to be far too prevalent.
>
>
> Just to be clear, when you say "disable hangchecking" do you mean disabling it
> for all processes via a kernel parameter at boot time or a sysfs entry or
> similar? Or is there some mechanism whereby a context can request no hang
> checking?

They are being told to use the module parameter i915.enable_hangcheck=0
to globally disable hangchecking. This is what we are trying to wean
them off, and yet still allow indefinitely long kernels. The softer
hangcheck focuses on whether you block scheduling or preemption of
higher priority work; if you do, you are forcibly removed from the GPU.
However, even that is too much for some workloads, where they really do
expect to permanently hog the GPU. (All I can say is that they better be
dedicated systems because if you demand interactivity on top of
disabling preemption...)

> The default behaviour for new controls is the legacy persistence mode.
> New clients will have to opt out for immediate cleanup on context
> closure. If the hangchecking modparam is disabled, so is persistent
> context support -- all contexts will be terminated on closure.
>
>
> What happens to fences when the context is cancelled? Is it the same behavior
> as we have today for when a GPU hang is detected and a context is banned?

Yes. The incomplete fence statuses are set to -EIO -- it is the very
same mechanism used to remove this context's future work from the GPU as
is used for banning.
-Chris
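
For readers following along, the closure policy discussed in this thread
(legacy persistence by default, forced termination when hangchecking is
disabled, -EIO on incomplete fences) can be sketched as a toy model. This
is not i915 code: the Fence and Context classes and close_context() are
hypothetical names made up for illustration, and treating a cancelled
fence as completed-with-error is an assumption of the model.

```python
import errno
from dataclasses import dataclass, field


@dataclass
class Fence:
    """Hypothetical stand-in for a GPU request's fence."""
    signaled: bool = False  # True once the request has finished
    status: int = 0         # 0 on success, negative errno on error


@dataclass
class Context:
    """Hypothetical stand-in for a GPU context."""
    persistent: bool = True  # legacy default: work outlives closure
    fences: list = field(default_factory=list)


def close_context(ctx, hangcheck_enabled=True):
    """Model context closure; return True if pending work was cancelled."""
    # Persistence is only honoured while hangchecking is enabled;
    # with hangchecking disabled every context is terminated on closure.
    if ctx.persistent and hangcheck_enabled:
        return False

    for fence in ctx.fences:
        if not fence.signaled:
            # Assumption of this model: a cancelled request completes
            # with an error, mirroring what is described for banning.
            fence.status = -errno.EIO
            fence.signaled = True
    return True
```

In this model a client that opts out of persistence sees its incomplete
fences end with -EIO on closure, while already completed fences keep
their status. A context left at the default keeps running only as long
as hangchecking remains enabled.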