Re: [PATCH] drm/atomic: Perform blocking commits on workqueue

Michel Dänzer <michel.daenzer@xxxxxxxxxxx> · Fri, 13 Oct 2023 12:22:52 +0200

On 10/13/23 11:41, Daniel Vetter wrote:
> On Thu, Oct 12, 2023 at 02:19:41PM -0400, Ray Strode wrote:
>> On Mon, Oct 09, 2023 at 02:36:17PM +0200, Christian König wrote:
>>>>>> To be clear, my take is, if driver code is running in process context
>>>>>> and needs to wait for periods of time on the order of or in excess of
>>>>>> a typical process time slice it should be sleeping during the waiting.
>>>>>> If the operation is at a point where it can be cancelled without side
>>>>>> effects, the sleeping should be INTERRUPTIBLE. If it's past the point
>>>>>> of no return, sleeping should be UNINTERRUPTIBLE. At no point, in my
>>>>>> opinion, should kernel code busy block a typical process for dozens of
>>>>>> milliseconds while keeping the process RUNNING. I don't think this is
>>>>>> a controversial take.
>>>>> Exactly that's what I completely disagree on.
>>
>> Okay if we can't agree that it's not okay for user space (or the
>> kernel running in the context of user space) to busy loop a cpu core
>> at 100% utilization throughout and beyond the process's entire
>> scheduled time slice then we really are at an impasse. I gotta say i'm
>> astonished that this seemingly indefensible behavior is somehow a
>> point of contention, but I'm not going to keep arguing about it beyond
>> this email.
>>
>> I mean we're not talking about scientific computing, or code
>> compilation, or seti@home. We're talking about nearly the equivalent
>> of `while (1) __asm__ ("nop");`
> 
> I don't think anyone said this shouldn't be fixed or improved.
> 
> What I'm saying is that the atomic ioctl is not going to make guarantees
> that it will not take up to much cpu time (for some extremely vague value
> of "too much") to the point that userspace can configure it's compositor
> in a way that it _will_ get killed if we _ever_ violate this rule.
> 
> We should of course try to do as good as job as possible, but that's not
> what you're asking for. You're asking for a hard real-time guarantee with
> the implication if we ever break it, it's a regression, and the kernel has
> to bend over backwards with tricks like in your patch to make it work.

I don't think mutter really needs or wants such a hard real-time guarantee. What it needs is a fighting chance to react before the kernel kills its process.

The intended mechanism for this is SIGXCPU, but that can't work if the kernel is stuck in a busy-loop. Ray's patch seems like one way to avoid that.

That said, as long as SIGXCPU can work as intended with the non-blocking commits mutter uses for everything except modesets, mutter's workaround of dropping RT priority for the blocking commits seems acceptable for the time being.

-- 
Earthling Michel Dänzer            |                  https://redhat.com
Libre software enthusiast          |         Mesa and Xwayland developer