Re: [RFC PATCH 47/86] rcu: select PREEMPT_RCU if PREEMPT

Thomas Gleixner <tglx@xxxxxxxxxxxxx> · Tue, 28 Nov 2023 11:53:19 +0100

Paul!

On Tue, Nov 21 2023 at 07:19, Paul E. McKenney wrote:
> On Tue, Nov 21, 2023 at 10:00:59AM -0500, Steven Rostedt wrote:
>> Right now, the use of cond_resched() is basically a whack-a-mole game where
>> we need to whack all the mole loops with the cond_resched() hammer. As
>> Thomas said, this is backwards. It makes more sense to just not preempt in
>> areas that can cause pain (like holding a mutex or in an RCU critical
>> section), but still have the general kernel be fully preemptable.
>
> Which is quite true, but that whack-a-mole game can be ended without
> getting rid of build-time selection of the preemption model.  Also,
> that whack-a-mole game can be ended without eliminating all calls to
> cond_resched().

Which calls to cond_resched() should not be eliminated?

They all suck and keeping some of them is just counterproductive as
again people will sprinkle them all over the place for the very wrong
reasons.

> Additionally, if the end goal is to be fully preemptible as in eventually
> eliminating lazy preemption, you have a lot more convincing to do.

That's absolutely not the case. Even RT uses the lazy mode to prevent
overeager preemption for non RT tasks.

The whole point of the exercise is to keep the kernel always fully
preemptible, but only enforce the immediate preemption at the next
possible preemption point when necessary.

The decision when it is necessary is made by the scheduler and not
delegated to the whim of cond/might_resched() placement.

That is serving both worlds best IMO:

  1) LAZY preemption prevents the negative side effects of overeager
     preemption, aka. lock contention and pointless context switching.

     The whole thing behaves like a NONE kernel unless there are
     real-time tasks or a task did not comply with the lazy request within
     a given time.

  2) It does not prevent the scheduler from making decisions to preempt
     at the next possible preemption point in order to get some
     important computation on the CPU.

     A NONE kernel sucks vs. any sporadic [real-time] task. Just run
     NONE and watch the latencies. The latencies are determined by the
     interrupted context, the placement of the cond_resched() call and
     the length of the loop which is running.

     People have complained about that and the only way out for them is
     to switch to VOLUNTARY or FULL preemption and thereby pay the
     price for overeager preemption.

     A price which you don't want to pay for good reasons but at the
     same time you care about latencies in some aspects and the only
     answer you have for that is cond_resched() or similar which is not
     an answer at all.

  3) Looking at the initial problem Ankur was trying to solve there is
     absolutely no acceptable solution to solve that unless you think
     that the semantically inverse 'allow_preempt()/disallow_preempt()'
     is anywhere near acceptable.
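
To make 1) and 2) a bit more concrete, here is a rough user-space
sketch of the decision logic. It is a toy model only, not the code from
the series: every name in it (resched_req, preempt_point,
request_for_wakeup(), escalate_on_tick(), should_preempt()) is made up
for illustration, and it assumes that a lazy request is honoured only
on return to user space, the way a NONE kernel behaves, while an
immediate request is honoured at the next preemption point.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names throughout: a user-space toy, not kernel code. */
enum resched_req { RESCHED_NONE, RESCHED_LAZY, RESCHED_NOW };

/* Where the running task currently is. */
enum preempt_point {
	POINT_KERNEL,		/* preemptible spot inside the kernel */
	POINT_RETURN_TO_USER	/* about to go back to user space */
};

/* Scheduler side: decide how urgent a reschedule is for a wakeup. */
enum resched_req request_for_wakeup(bool woken_is_realtime)
{
	return woken_is_realtime ? RESCHED_NOW : RESCHED_LAZY;
}

/* Tick side: escalate a lazy request that was ignored for too long. */
enum resched_req escalate_on_tick(enum resched_req req,
				  unsigned int ticks_pending,
				  unsigned int limit_ticks)
{
	assert(limit_ticks > 0);
	if (req == RESCHED_LAZY && ticks_pending >= limit_ticks)
		return RESCHED_NOW;
	return req;
}

/* Preemption point: act on the request or keep running. */
bool should_preempt(enum resched_req req, enum preempt_point where)
{
	switch (req) {
	case RESCHED_NOW:
		return true;	/* next possible preemption point */
	case RESCHED_LAZY:
		return where == POINT_RETURN_TO_USER;	/* NONE-like */
	default:
		return false;
	}
}
```

In this model nothing depends on where somebody happened to place a
cond_resched(): the request level is set by the scheduler (or by the
tick once the lazy request has been pending too long), and the
preemption point merely checks it.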

Thanks,

        tglx