Re: [patch V4 4/8] sched: Make migrate_disable/enable() independent of RT

Peter Zijlstra <peterz@xxxxxxxxxxxxx> · Fri, 20 Nov 2020 10:29:28 +0100

On Fri, Nov 20, 2020 at 02:33:58AM +0100, Thomas Gleixner wrote:
> On Thu, Nov 19 2020 at 19:28, Peter Zijlstra wrote:
> > On Thu, Nov 19, 2020 at 09:23:47AM -0800, Linus Torvalds wrote:
> >> Because this is certainly not the only time migration limiting has
> >> come up, and no, it has absolutely nothing to do with per-cpu page
> >> tables being completely unacceptable.
> >
> > It is for this instance; but sure, it's come up before in other
> > contexts.
> 
> Indeed. And one of the really bad outcomes of this is that people are
> forced to use preempt_disable() to prevent migration which entails a
> slew of consequences:
> 
>      - Using spinlocks where it wouldn't be needed otherwise
>      - Spinwaiting instead of sleeping
>      - The whole crazyness of doing copy_to/from_user_in_atomic() along
>        with the necessary out of line error handling.
>      - ....
> 
> The introduction of per-cpu storage happened almost 20 years ago (2002)
> and still the only answer we have is preempt_disable().

IIRC the first time this migrate_disable() stuff came up was when Chris
Lameter did SLUB. Eventually he settled for that cmpxchg_double()
approach (which is somewhat similar to userspace rseq) which is vastly
superiour and wouldn't have happened had we provided migrate_disable().

As already stated, per-cpu page-tables would allow for a much saner kmap
approach, but alas, x86 really can't sanely do that (the archs that have
separate kernel and user page-tables could do this, and how we cursed
x86 didn't have that when meltdown happened).

[ and using fixmaps in the per-cpu memory space _could_ work, but is a
  giant pain because then all accesses need GS prefix and blah... ]

And I'm sure there's creative ways for other problems too, but yes, it's
hard.

Anyway, clearly I'm the only one that cares, so I'll just crawl back
under my rock...