On 2024-01-15 17:46, Joshua Ashton wrote: > On 1/15/24 16:34, Michel Dänzer wrote: >> On 2024-01-15 17:19, Friedrich Vock wrote: >>> On 15.01.24 16:43, Joshua Ashton wrote: >>>> On 1/15/24 15:25, Michel Dänzer wrote: >>>>> On 2024-01-15 14:17, Christian König wrote: >>>>>> Am 15.01.24 um 12:37 schrieb Joshua Ashton: >>>>>>> On 1/15/24 09:40, Christian König wrote: >>>>>>>> Am 13.01.24 um 15:02 schrieb Joshua Ashton: >>>>>>>> >>>>>>>>> Without this feedback, the application may keep pushing through >>>>>>>>> the soft >>>>>>>>> recoveries, continually hanging the system with jobs that timeout. >>>>>>>> >>>>>>>> Well, that is intentional behavior. Marek is voting for making >>>>>>>> soft recovered errors fatal as well while Michel is voting for >>>>>>>> better ignoring them. >>>>>>>> >>>>>>>> I'm not really sure what to do. If you guys think that soft >>>>>>>> recovered hangs should be fatal as well then we can certainly do >>>>>>>> this. >>>>> >>>>> A possible compromise might be making soft resets fatal if they >>>>> happen repeatedly (within a certain period of time?). >>>> >>>> No, no and no. Aside from introducing issues by side effects not >>>> surfacing and all of the stuff I mentioned about descriptor buffers, >>>> bda, draw indirect and stuff just resulting in more faults and hangs... >>>> >>>> You are proposing we throw out every promise we made to an application >>>> on the API contract level because it "might work". That's just wrong! >>>> >>>> Let me put this in explicit terms: What you are proposing is in direct >>>> violation of the GL and Vulkan specification. >>>> >>>> You can't just chose to break these contracts because you think it >>>> 'might' be a better user experience. >>> >>> Is the original issue that motivated soft resets to be non-fatal even an >>> issue anymore? >>> >>> If I read that old thread correctly, the rationale for that was that >>> assigning guilt to a context was more broken than not doing it, because >>> the compositor/Xwayland process would also crash despite being unrelated >>> to the hang. >>> With Joshua's Mesa fixes, this is not the case anymore, so I don't think >>> keeping soft resets non-fatal provides any benefit to the user experience. >>> The potential detriments to user experience have been outlined multiple >>> times in this thread already. >>> >>> (I suppose if the compositor itself faults it might still bring down a >>> session, but I've literally never seen that, and it's not like a >>> compositor triggering segfaults on CPU stays alive either.) >> >> That's indeed what happened for me, multiple times. And each time the session continued running fine for days after the soft reset. >> >> But apparently my experience isn't valid somehow, and I should have been forced to log in again to please the GL gods... >> >> >> Conversely, I can't remember hitting a case where an app kept running into soft resets. It's almost as if different people may have different experiences! ;) > > Your anecdote of whatever application coincidentally managing to soldier through being hung is really not relevant. But yours is, got it. > It looks like Mutter already has some handling for GL robustness anyway... That's just for suspend/resume with the nvidia driver. It can't recover from GPU hangs yet. -- Earthling Michel Dänzer | https://redhat.com Libre software enthusiast | Mesa and Xwayland developer