Re: [PATCH] drm/atomic: Perform blocking commits on workqueue

Christian König <christian.koenig@xxxxxxx> · Thu, 28 Sep 2023 15:23:54 +0200

    Hi,

    Am 28.09.23 um 14:46 schrieb Ray
      Strode:

      Hi,

On Thu, Sep 28, 2023 at 2:56 AM Christian König
<christian.koenig@xxxxxxx> wrote:

          To say the "whole point" is about CPU overhead accounting sounds
rather absurd to me. Is that really what you meant?

        Yes, absolutely. See the functionality you try to implement already exists.

      You say lower in this same message that you don't believe the
functionality actually works for the dpms off case I mentioned.

    well in the dpms off case there shouldn't be any blocking as far
      as I understand it.

      If you see a large delay in the dpms off case then we probably
      have a driver bug somewhere.

    [SNIP]

        A blocking system call in the sense of RLIMIT_RTTIME means something
which results in the process listening for external events, e.g. calling
select(), epoll() or read() on (for example) a network socket etc...

As far as I can see drmAtomicCommit() is *not* meant with that what
similar to for example yield() also doesn't reset the RLIMIT_RTTIME counter.

      No no no, drmModeAtomicCommit() is not like sched_yield(). That's a
really strange thing to say (you do mean sched_yield() right?).
sched_yield() is an oddball because it's specifically for giving other
threads a turn if they need it without causing the current thread to
sleep if they don't.  It's a niche api that's meant for high performance
use cases. It's a way to reduce scheduling latency and increase running
time predictability.  drmModeAtomicCommit() using up rt time, busy
looping while waiting on the hardware to respond, eating into userspace
RLIMIT_RTTIME is nothing like that.

I'm getting the idea that you think there is some big bucket of kernel
syscalls that block for a large fraction of a second by design and are
not meant to reset RLIMIT_RTTIME.

    Yes, exactly that's the case as far as I know.

    The purpose of RLIMIT_RTTIME is to have a watchdog if an application
    is still responsive. Only things which make the application look for
    outside events are considered a reset for this whatdog.

    So for example calling select() and poll() will reset the watchdog,
    but not calling any DRM modeset functions or an power management
    IOCTL for example.

    There are only two possibilities I can see how a DRM modeset is
    triggering this:

    1. We create enough CPU overhead to actually exceed the limit. In
    this case triggering it is certainly intentional.

    2. We busy wait for the hardware to reach a certain state. If that
    is true then this is a bug in the driver.

       I could be wrong, but I don't think
that's true. Off the top of my head, the only ones I can think of that
might reasonably do that are futex() (which obviously can't sleep),
sched_yield() (who's whole point is to not sleep), and maybe a
some obscure ioctls (some probably legitimately, some probably
illegitimately and noone has noticed.). I'm willing to be proven wrong
here, and I might be, but right now from thinking about it, my guess is
the above list is pretty close to complete.

        Well you are breaking the RLIMIT_RTTIME functionality.

The purpose of that functionality is to allow debugging and monitoring
applications to make sure that they keep alive and can react to external
signals.

      I don't think you really thought through what you're saying here. It
just flatly doesn't apply for drmModeAtomicCommit. What is an
application supposed to do?

    Get terminated and restart. That's what the whole functionality is
    good for.

    If you don't desire that than don't use the RLIMIT_RTTIME
    functionality.

       It can't block the SIGKILL that's coming.
Respond to the preceding SIGXCPUs? What response could the application
possibly make? I'm guessing drmModeAtomicCommit isn't going to EINTR
because it's busy looping waiting on hardware in the process context.
And the kernel doesn't even guarantee SIGXCPU is going to go to the
thread with the stuck syscall, so even if there was a legitimate
response (or even "pthread_cancel" or some wreckless nonsense like
that), getting the notification to the right part of the program is an
exercise in gymnastics.

        From the RLIMIT_RTTIME documentation: "The intended use of this limit
is to stop a runaway real-time process from locking up the system."

And when drmAtomicCommit() is triggering this then we either have a
problem with the application doing something it is not supposed to do
(like blocking for vblank while it should listen to network traffic) or
the driver is somehow buggy.

      drmModeAtomicCommit() is used by display servers. If drmModeAtomicCommit
runs away in e.g. a set of 100ms busy loops responding to a confused or
slow responding GPU, the system will seemingly lock up to the user. That
is an intractable problem that we can not get away from. It doesn't
matter if the kernel is stuck in process context or on a workqueue. And,
regardless, it's not reasonable to expect userspace to craft elaborate
workarounds for driver bugs. We just have to fix the bugs.

    Exactly that's the reason why I'm pointing out that this isn't a
    good idea.

        No when you disable everything there is of course no fence allocated.

      Okay, so it's not actually an answer

        But then you also should never see anything waiting for to long to
actually be able to trigger the RLIMIT_RTTIME.

      But we do. That's the problem. That's like the whole problem. The amdgpu
driver thinks it's okay to do something like:

for_each_command_in_queue(&queue, command) {
        execute_command(command);
        while (1) {
                wait_for_response();

                if (delay++ > 100000)
                        break;
                udelay(1);
        }
}

or something like that. all while keeping the process in the RUNNABLE
state. It just seems wrong to me. At least let the userspace process
get scheduled out.

    That just papers over the problem. The process doesn't become more
    responsive by hiding the problem.

    What you need to do here is to report those problems to the driver
    teams and not try to hide them this way.

    Regards,

    Christian.

          Regardless, this seems like a roundabout way to address a problem that
we could just ... fix.

        Well to make it clear: This is not a problem but intended behavior!

      I'm going to be frank, I don't think this was intended behavior. We can
wait for sima to get off PTO and chime in, to know one way or the other
(or maybe airlied can chime in with his take?), but I doubt he was
thinking about realtime scheduling minutiae when he put together the
drmModeAtomicCommit implementation.

There's no practical reason for doing things the way they're being done
as far as I can tell.

--Ray