On Mon, Jan 20, 2025 at 10:46:47AM -0500, Connor Abbott wrote:

> To work around these problems, disable stall-on-fault as soon as we get a
> page fault until a cooldown period after pagefaults stop. This allows
> the GMU some guaranteed time to continue working. We also keep it
> disabled so long as the current devcoredump hasn't been deleted, because
> in that case we likely won't capture another one if there's a fault.

I don't have any particular interest here, but I'm surprised to read
this paragraph. Maybe you could explain this some more in the commit
message?

I would think terminating transactions and returning a failure to the
GPU would be fatal to the GPU operating model, when the entire point of
stall and fault handling is to make OS paging transparent to the GPU?
What happens on the GPU side when it gets this spurious failure?

Jason