On Mon, Dec 12, 2022 at 2:14 AM Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>
> On Tue, Nov 29, 2022 at 10:22:42PM -1000, Tejun Heo wrote:
>
> > Rolling out kernel upgrades is a slow and iterative process. At a large scale
> > it can take months to roll a new kernel out to a fleet of servers. While this
> > latency is expected and inevitable for normal kernel upgrades, it can become
> > highly problematic when kernel changes are required to fix bugs. Livepatch [9]
> > is available to quickly roll out critical security fixes to large fleets, but
> > the scope of changes that can be applied with livepatching is fairly limited,
> > and would likely not be usable for patching scheduling policies. With
> > sched_ext, new scheduling policies can be rapidly rolled out to production
> > environments.
>
> I don't think we can or should use this argument to push BPF into ever
> more places.

Improving scheduling performance requires rapid iteration to explore new
policies and tune parameters, especially as hardware becomes more
heterogeneous, and applications become more complex. Waiting months
between evaluating scheduler policy changes is simply not scalable, but
this is the reality with large fleets that require time for testing,
qualification, and progressive rollout. The security angle should be
clear from how involved it was to integrate core scheduling, for
example.