Re: Arches that don't support PREEMPT

Ingo Molnar <mingo@xxxxxxxxxx> · Wed, 20 Sep 2023 09:29:21 +0200

* Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:

On Tue, Sep 19 2023 at 10:25, Linus Torvalds wrote:
On Tue, 19 Sept 2023 at 06:48, John Paul Adrian Glaubitz
<glaubitz@xxxxxxxxxxxxxxxxxxx> wrote:

As Geert pointed out, I'm not seeing anything particularly problematic with the
architectures lacking CONFIG_PREEMPT at the moment. This seems to be more
something about organizing Kconfig files.

It can definitely be problematic.

Not the Kconfig file part, and not the preempt count part itself.

But the fact that it has never been used and tested means that there
might be tons of "this architecture code knows it's not preemptible,
because this architecture doesn't support preemption".

So you may have basic architecture code that simply doesn't have the
"preempt_disable()/enable()" pairs that it needs.

PeterZ mentioned the generic entry code, which does this for the entry
path. But it actually goes much deeper: just do a

    git grep preempt_disable arch/x86/kernel

and then do the same for some other architectures.

Looking at alpha, for example, there *are* hits for it, so at least
some of the code there clearly *tries* to do it. But does it cover all
the required parts? If it's never been tested, I'd be surprised if
it's all just ready to go.
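
To put rough numbers on that comparison, something like the following could be run from the top of a kernel git tree. It is only a sketch: the helper name count_preempt_disable is made up for illustration, and a raw hit count says nothing about whether the right sections are covered, only how often the call shows up at all.

```shell
# Hypothetical helper (not part of the kernel tree): for each architecture
# name given, count the tracked lines under arch/<name>/ that mention
# preempt_disable. Must be run from the top of a kernel git tree.
count_preempt_disable() {
    for arch in "$@"; do
        # git grep exits non-zero when nothing matches; wc -l still prints 0.
        # tr strips the padding some wc implementations add.
        hits=$(git grep -h preempt_disable -- "arch/$arch" | wc -l | tr -d ' ')
        printf '%s %s\n' "$arch" "$hits"
    done
}
```

Calling it as "count_preempt_disable x86 alpha hexagon m68k um" prints one "arch count" line per architecture, which makes the gap between a well-tested preemptible architecture and the others easy to eyeball.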

I do think we'd need to basically continue to support ARCH_NO_PREEMPT
- and such architectures might end up with the worst-case latencies of
only scheduling at return to user space.

The only thing these architectures should gain is the preempt counter 
itself, [...]

And if any of these machines are still used, there's the small benefit of 
preempt_count increasing debuggability of scheduling in supposedly 
preempt-off sections that were ignored silently previously, as most of 
these architectures do not even enable CONFIG_DEBUG_ATOMIC_SLEEP=y in their 
defconfigs:

  $ for ARCH in alpha hexagon m68k um; do git grep DEBUG_ATOMIC_SLEEP arch/$ARCH; done
  $
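
A narrower variant of the loop above looks only at the defconfig files and reports a count per architecture instead of printing nothing. This is a sketch under two assumptions: that defconfigs live under arch/<name>/configs/, and that the option appears as a whole line reading CONFIG_DEBUG_ATOMIC_SLEEP=y. The function name is made up for illustration.

```shell
# Hypothetical helper: for each architecture name given, count the tracked
# files under arch/<name>/configs/ that set CONFIG_DEBUG_ATOMIC_SLEEP=y.
# Must be run from the top of a kernel git tree.
defconfigs_with_atomic_sleep() {
    for arch in "$@"; do
        # -l lists matching files once each; the anchors skip lines such as
        # "# CONFIG_DEBUG_ATOMIC_SLEEP is not set".
        files=$(git grep -l -e '^CONFIG_DEBUG_ATOMIC_SLEEP=y$' -- "arch/$arch/configs" | wc -l | tr -d ' ')
        printf '%s %s\n' "$arch" "$files"
    done
}
```

A count of 0 for an architecture means none of its tracked defconfigs turn the option on.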

Plus the effectiveness of CONFIG_DEBUG_ATOMIC_SLEEP=y is much reduced on
non-PREEMPT kernels to begin with: it will basically only detect scheduling
in hardirqs-off critical sections.

So IMHO there's a distinct debuggability & robustness plus in enabling the 
preemption count on all architectures, even if they don't or cannot use the 
rescheduling points.

[...] but yes the extra preemption points are not mandatory to have, i.e. 
we simply do not enable them for the nostalgia club.

The removal of cond_resched() might cause latencies, but then I doubt
that these museum pieces are used for real work :)

I'm not sure we should initially remove *explicit* legacy cond_resched() 
points, except from high-freq paths where they hurt - and of course remove 
them from might_sleep().

Thanks,

	Ingo