On Fri, 21 Jun 2024 at 02:35, Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote: > > > > But secondly, the "keep things out" is itself counter-productive. > > Says the one who kept asking me repeatedly whether I can't keep the > remaining stuff of RT out of tree forever. The last time you asked that > was not that long ago. Thomas, you are starting to just make made-up arguments now. Did I require that the RT patches be done right over two decades? Hell yes. But those RT patches didn't change some single subsystem, they made fundamental changes to the most core things out there. The RT patches made something as fundamental and core as "disable interrupts for spinlocks" be a special magical thing that normal people weren't allowed to do, and that turned into a sleeping lock. They affected _everything_. They very literally affected subsystems and rules that had been there since linux-0.01 (not the spinlocks - they came later - but things like tty / printk were in fact some of the very first things written). And don't get me wrong - I'm not complaining about the RT patches. I think they improved things enormously in the end. They've been great. I'm just saying that they are _not_ the norm to compare against. The sched_ext patches? Not even remotely in the same class. The sched_ext patches are more like fuse - another filesystem, just a slightly odd one that has that big user space component. And yes, fuse caused worries too because of the whole "filesystem development in user space" outside the confines of the normal kernel development model. It mostly just was (and is) a shim-layer that asks user space for policy, while still using the kernel infrastructure for most real work. Did those worries actually materialize? No. No they did not. Now, fuse to some degree was easier, because while it was merged about two decades ago, even by that time we already had over a decade of pluggable filesystems. So the VFS layer had a lot of pluggability already, and it wasn't very hardcoded. The scheduler also has the scheduler class pluggability, but it *is* fairly hardcoded. So in addition to the kinds of issues FUSE had (with all the "impedance matching" between the kernel interface and the user level interfaces), the sched_ext patches have some of that hardcoded pluggability that it needs to fix up. So maybe a better comparison would be autofs - but autofs was added _so_ long ago that I only vaguely remember the odd pain points for waiting for mount points. Over the years, all of that special sauce that no other filesystem needs has become such an integral part of the vfs layer that people don't even think about it any more. You can most definitely still see the effects of autofs - as opposed to "regular" filesystems - on the vfs layer if you go look (things like "finish_automount()" and friends), but it's been integrated for so long that it's a non-issue. But back in the day, it all needed what at the time was some special glue. Anyway, what I'm saying is that you trying to equate this with the RT patches is absolutely laughable and intellectually dishonest. Look, ignoring the actual sched_ext code itself (and the examples, documentation and test-cases which I certainly hope nobody would object to), the actual core footprint ot fht sched_ext patches is this: MAINTAINERS | 13 +++ Makefile | 8 +- drivers/tty/sysrq.c | 1 + include/asm-generic/vmlinux.lds.h | 1 + include/linux/cgroup.h | 4 +- include/linux/sched.h | 5 + include/linux/sched/task.h | 3 +- include/uapi/linux/sched.h | 1 + init/init_task.c | 12 +++ kernel/Kconfig.preempt | 24 +++++ kernel/fork.c | 17 +++- kernel/sched/build_policy.c | 10 ++ kernel/sched/core.c | 187 ++++++++++++++++++++++++++++++-------- kernel/sched/debug.c | 3 + kernel/sched/ext.h | 114 +++++++++++++++++++++++ kernel/sched/fair.c | 21 ++--- kernel/sched/idle.c | 2 + kernel/sched/sched.h | 75 ++++++++++++++- kernel/sched/syscalls.c | 26 ++++++ lib/dump_stack.c | 1 + 20 files changed, 464 insertions(+), 64 deletions(-) and honestly, I went through it all. None of it looks really objectionable. Some of it is trivial cleanups and makes the code look better (the "refactor" parts). And yes, some of it is "We have a new scheduler class that wants more". Realistically, of the above, I think the *only* parts that are even at all halfway interesting are these: kernel/sched/core.c | 187 ++++++++++++++++++++++++++++++-------- kernel/sched/ext.h | 114 +++++++++++++++++++++++ and that ext.h is on that "interesting" list only because of for_each_active_class(), and I think it should probably just be in core.c. See why I do not think that this is AT ALL comparable to something like the RT patches. And yes, I do think code should be cleaned up, but the two cleanups that struck me personally were literally just (a) scx_next_task_picked, where I slept on it, came up with a three-line alternative suggestion, and Tejun sent the "real patch" almost immediately (b) the for_each_active_class() thing that I think would actually be better off just being done explicitly in sched/core.c, but probably only makes sense after integration and it really strikes me as "neither of these issues were worth nine months of delay". Although hopefully the nine months weren't entirely unproductive, and maybe the patches have improved. Knock wood. Because *most* of the two files above are actually normal stuff. Things that other scheduler classes already do. Sometimes just with an #ifdef around it (although many of the ifdef's are basically hidden as "inline function that is empty if disabled" - which is how we tend to do things in the kernel in general. So it's not even "200+ lines of objectionable code". No. It's all fairly small. Arguably much of the strangeness comes from "it can be enabled / disabled both statically and dynamically". The "dynamically disable" code may look particularly odd because some of that is the "scx_switched_all()" thing that basically disables some of the CFS special cases. Is that pretty? No, but in most cases I'd say that the cause of said issue was that the CFS rules were hardcoded. So it's pretty small, it's pretty self-contained, and it only affects one single subsystem. See where I'm coming from? > Aside of that you are completely ignoring my point. > > Collaborative integration is the right thing to do no matter what. Yes. And am willing to say "ok, if three more months make this more amicable, I'll delay merging for another release". Amicable resolutions are obviously preferred. And I am certainly willing to going back and telling people "ok, it's going to be 6.12, not 6.11". Some people are going to be disappointed in me. That's my job, and I don't think this needs to be *rushed*. But really, in the very next sentence of "I don't think this needs to be *rushed*", I do want to re-iterate that I feel like this pointless delay this has seen seems unnecessary. I hope I have explained above why I think this whole thing has not been worth the brouhaha and pain that it has caused. It's really not that big, and the issues I see have either had small clean trivial solutions, or have very much looked to me like "ok, it's not integrated, but I see why an external patch would try to do it that way". And yes, I may be missing some big ugly point. But I really _have_ been going through those roughly 200 lines of diff that actually seem to be relevant to this whole argument. Multiple times. So if I don't see it, please point it out very very explicitly, and using small words to make me understand. Linus