On Tue, Sep 19, 2023 at 07:56:01AM -1000, Tejun Heo wrote:
> Hello, Mel.
>
> I don't think the discussion has reached a point where the points of
> disagreements are sufficiently laid out from both sides. Do you have any
> further thoughts?
>

Plenty, but I'm not sure how to reconcile this. I view pluggable
schedulers as a future maintenance nightmare, and our "lived experience"
or "exposure bias" with respect to the expertise of users differs
drastically. Some developers mostly deal with users that have extensive
relevant expertise, a strong incentive to maximise performance and full
control of their stack; others do not, and the time cost of supporting
such users is high.

While I can see advantages to having specific schedulers targeting
either a specific workload or hardware configuration, the proliferation
of such schedulers, and the inevitable need to avoid introducing any new
regressions in deployed schedulers, will be cumbersome. I generally
worry that certain things might never have existed in the shipped
scheduler if plugging had been an option, including EAS, throttling
control, schedutil integration, big.LITTLE, adapting to chiplets and
picking preferred SMT siblings for turbo boost. In each case,
integrating support was time-consuming and painful, and a pluggable
scheduler would have been a relatively easy out that would ultimately
have cost us if the support was never properly integrated. While no one
wants the pain, a few of us also want to avoid the problem of vendors
publishing a hacky scheduler for their specific hardware and
discontinuing the work at that point.

I see that some friction with the current state is due to tuning knobs
moving to debugfs. FWIW, I didn't 100% agree with that move either and
carried an out-of-tree revert that displayed warnings for a time, but I
understood the logic behind it. However, if the tuning parameters are
insufficient, and there is good reason to change them, then the answer
is to add tuning knobs with defined semantics and document them -- not
pluggable schedulers. We've seen something along those lines recently
with nice_latency, even if it turned into EEVDF instead of a new
interface, so I guess we'll see how that pans out. A sketch of what I
mean by "defined semantics" is below.

I get most of your points. Maybe most users will not care about a
pluggable scheduler, but *some will*, and they will be the maintenance
burden. I get your point as well that if there is a bug and a pluggable
scheduler is loaded then the first step would be "reproduce without the
pluggable scheduler", and again, you'd be right, that is a great first
step *except* sometimes they can't, or sometimes they simply won't
without significant proof, and that incurs a maintenance burden. Even if
the pluggable schedulers are GPL, there still is a burden to understand
any scheduler that is loaded to see if it's the source of a problem.
Instead of understanding a defined number of schedulers that are
developed over time with the history in changelogs, we may have to
understand N schedulers that happen to be popular, and that also is
painful. That's leaving aside the difficulty of what happens when more
than one can be loaded and they interact once containers are involved,
assuming that such support would exist in the future. It's already known
that interacting IO schedulers are a nightmare, so presumably
interacting CPU schedulers within the same host would also be zero fun.
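To be concrete about the tuning-knob point: what I have in mind is an
ordinary documented sysctl with a bounded range, not a hook. The knob
name, default and range below are invented purely for illustration;
register_sysctl() and struct ctl_table are the real interfaces for
/proc/sys knobs in recent kernels.

/*
 * Hypothetical example only: the knob name, default and range are
 * made up. Documented semantics: target wakeup latency in
 * milliseconds, clamped to 1..100.
 */
#include <linux/init.h>
#include <linux/sysctl.h>

static unsigned int sysctl_sched_example_latency_ms = 10;

static struct ctl_table sched_example_table[] = {
	{
		.procname	= "sched_example_latency_ms",
		.data		= &sysctl_sched_example_latency_ms,
		.maxlen		= sizeof(unsigned int),
		.mode		= 0644,
		.proc_handler	= proc_douintvec_minmax,
		.extra1		= SYSCTL_ONE,
		.extra2		= SYSCTL_ONE_HUNDRED,
	},
	{ }	/* sentinel; no longer needed on kernels past ~6.6 */
};

static int __init sched_example_sysctl_init(void)
{
	/* Exposes /proc/sys/kernel/sched_example_latency_ms */
	register_sysctl("kernel", sched_example_table);
	return 0;
}
late_initcall(sched_example_sysctl_init);

The point is that the semantics, bounds and default are fixed and
documentable, so a bug report against a given value is something we can
reason about, which is not true of an arbitrary plugged-in policy.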
Pluggable schedulers are effectively a change that we cannot walk back
from if it turns out to be a bad idea, because it potentially comes
under the "you cannot break userspace" rule if a particular pluggable
scheduler becomes popular. As I strongly believe it will be a nightmare
to support within distributions, where there is almost no control over
the software stack or over user expectations, I'm opposed to crossing
that line with pluggable schedulers. While my nightmare scenarios may
never be realised and could be overblown, it'll be hard to convince me
it'll not kick me in the face eventually.

-- 
Mel Gorman
SUSE Labs