On Wed, Sep 23, 2020 at 11:52:51AM -0400, Steven Rostedt wrote: > On Wed, 23 Sep 2020 10:40:32 +0200 > peterz@xxxxxxxxxxxxx wrote: > > > However, with migrate_disable() we can have each task preempted in a > > migrate_disable() region, worse we can stack them all on the _same_ CPU > > (super ridiculous odds, sure). And then we end up only able to run one > > task, with the rest of the CPUs picking their nose. > > What if we just made migrate_disable() a local_lock() available for !RT? Can't, neiter migrate_disable() nor migrate_enable() are allowed to block -- which is what makes their implementation so 'interesting'. > This should lower the SHC in theory, if you can't have stacked migrate > disables on the same CPU. See this email in that other thread (you're on Cc too IIRC): https://lkml.kernel.org/r/20200923170809.GY1362448@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx I think that is we 'frob' the balance PULL, we'll end up with something similar. Whichever way around we turn this thing, the migrate_disable() runtime (we'll have to add a tracer for that), will be an interference term on the lower priority task, exactly like preempt_disable() would be. We'll just not exclude a higher priority task from running. AFAICT; the best case is a single migrate_disable() nesting, where a higher priority task preempts in a migrate_disable() section -- this is per design. When this preempted task becomes elegible to run under the ideal model (IOW it becomes one of the M highest priority tasks), it might still have to wait for the preemptee's migrate_disable() section to complete. Thereby suffering interference in the exact duration of migrate_disable() section. Per this argument, the change from preempt_disable() to migrate_disable() gets us: - higher priority tasks gain reduced wake-up latency - lower priority tasks are unchanged and are subject to the exact same interference term as if the higher priority task were using preempt_disable(). Since we've already established this term is unbounded, any task but the highest priority task is basically buggered. TL;DR, if we get balancing fixed and achieve (near) the optimal case above, migrate_disable() is an over-all win. But it's provably non-deterministic as long as the migrate_disable() sections are non-deterministic. The reason this all mostly works in practise is (I think) because: - People care most about the higher prio RT tasks and craft them to mostly avoid the migrate_disable() infected code. - The preemption scenario is most pronounced at higher utilization scenarios, and I suspect this is fairly rare to begin with. - And while many of these migrate_disable() regions are unbound in theory, in practise they're often fairly reasonable. So my current todo list is: - Change RT PULL - Change DL PULL - Add migrate_disable() tracer; exactly like preempt/irqoff, except measuring task-runtime instead of cpu-time. - Add a mode that measures actual interference. - Add a traceevent to detect preemption in migrate_disable(). And then I suppose I should twist Daniel's arm to update his model to include these scenarios and numbers.