On 6/23/24 4:14 AM, Thomas Gleixner wrote:
> Chris!
>
> On Fri, Jun 21 2024 at 17:14, Chris Mason wrote:
>> On 6/21/24 6:46 AM, Thomas Gleixner wrote:
>> I'll be honest, the only clear and consistent communication we've gotten
>> about sched_ext was "no, please go away". You certainly did engage with
>> face to face discussions, but at the end of the day/week/month the
>> overall message didn't change.
>
> The only time _I_ really told you "go away" was at OSPM 2023 when you
> approached everyone in the worst possible way. I surely did not even say
> "please" back then.

Looping back to where I jumped into this thread, the context was you suggesting that if we'd just asked one more time, real collaboration might have started. I'm not trying to change the message by snipping this out of context, so if I've got this wrong, please do correct me.

>>> If you really wanted to get my attention then you exactly know how
>>> to get it like everyone else who is working with me for decades.

I really don't object to the scheduler maintainers disliking sched_ext. Pluggable scheduling isn't a problem you wanted to solve, and bpf probably isn't how you would have solved it. We could have talked every day for the last 18 months, and by now we'd have a huge library of sonnets and haikus for each other, but sched_ext still wouldn't be merged. I do object to rewriting history to claim that if we'd just used the secret handshake, things would somehow be different than they are now.

> The message people (not only me) perceived was:
>
> "The scheduler sucks, sched_ext solves the problems, saves us millions

I joined a conference I'd never been to before and brought a laundry list of problems to the table. So, there's definitely truth to the perception that I came with an agenda and pushed it. But if I left anyone with the impression I thought the scheduler sucked, that wasn't my goal. Like every part of the kernel, there are problems the scheduler creates and problems it doesn't solve, and my goal is/was to invest in discussing and fixing both.

> and Google is happy to work with us [after dropping upstream scheduler
> development a decade ago and leaving the open issues for others to mop up]."

It's not a surprise that Google and Meta have a lot of problems in common. For us, collaborating with Google is really rewarding and important for a bunch of subsystems, scheduler included. Google is full of smart people, carrying private patches is both expensive and wildly boring, and I'm always interested in why smart people use different strategies to solve problems we have in common.

It's part of a discussion we get into internally. Why does our kernel team exist at all? Are we just here to stabilize and ship Linus's kernel? Or are we here to try and advance our infrastructure by developing in the kernel? Those are two pretty different paths, and I know that we optimize for things others don't care about, like contorting ourselves to make things easier to ship to production. But we also optimize for feedback loops with workload owners that other distros and kernel developers would really envy.

It's a mixed bag, but I can say with certainty that adding features and optimizations to the upstream kernel is one of the least efficient ways to improve infra. Some of this is for really good reason: nobody wants all the tech debt that would come out of upstreaming every bad idea we've ever had. But there's a balance.
Before anyone gets upset with me: the upstream kernel can be the best kernel on the planet and still be a really inefficient way to land features and optimizations for applications.

> followed by:
>
> "You should take it, as it will bring in fresh people to work on the
> scheduler due to the lower entry barrier [because kernel hacking sucks].
> This will result in great new ideas which will be contributed back to
> the scheduler proper."
>
> That was a really brilliant marketing stunt and I told you so very bluntly.

Yeah, I'd say all of those things again, and I think I repeated some of it in this email too. It's one of my favorite topics of conversation, so I won't bore everyone here (even more than I already have), but I'm always trying to find ways to improve the feedback loops between workloads and the kernel developers. sched_ext has been really effective at that so far, both inside Meta and for others in the community.

> It was presumably not your intention, but that's the problem of
> communication between people. Though I haven't seen a useful attempt to
> cure that.
>
> After that clash, the room got into a lively technical discussion about the
> real underlying problem, i.e. that a big part of scheduling issues comes
> from the fact that there is not enough information about the requirements
> and properties of an application available. Even you agreed with that, if I
> remember correctly.

I still do! In a private email a few months ago I promised you that my one true workload modeling project was just a few months away. It still is just a few months away, but I do find the topic really interesting.

But I disagree that we should stop sched_ext development until we find the perfect way to model the properties and requirements of applications. I'm really glad that eevdf landed with more of an iterate-in-the-kernel approach.

> sched_ext does not solve that problem. It just works around it by putting
> the requirements and properties of an application into the BPF scheduler
> and the user space portion of it. That works well in a controlled
> environment like yours, but it does not even remotely help to solve the
> underlying general problems. You acknowledged that and said: "But we don't
> have it today, though sched_ext is ready and will help with that."

For me, the underlying general problems get solved with frequent experiments and tight feedback loops. It's all about iteration and cooperation with the application developers, and sched_ext absolutely does provide that.

Quoting from another email of yours in this thread:

"I recently watched a talk about sched ext which explained how to model an execution pipeline for a specific workload to optimize the scheduling of the involved threads and how innovative that is. I really had a good laugh because that's called explicit plan scheduling and has been described and implemented in the early 2000s by academics already."

This is kind of exactly my point. We do agree that there are lots of well understood solutions to well understood problems that are missing from the kernel.

> The concern that sched_ext will reduce the incentive to work on the
> scheduler proper is not completely unfounded and I've yet to see the
> slightest evidence which proves the contrary.

Linus answered this pretty effectively, and I don't see the need to expand on his comments.

> Don't tell me that this is impossible because sched_ext is not yet
> upstream.
> It's used in production successfully, as you said, so there
> clearly must be something to learn which could be shared, at least in
> the form of data. OSPM24 would have been a great place for that, especially as
> the requirements and properties discussion was continued there with a plan.
>
> On all other occasions, I sat down with people and discussed at a technical
> level, but also clearly asked to resolve the social rift which all of this
> created.
>
> I thereby surely said several times: "I wish it would just go away and stay
> out of tree", but that's a very different message, no?

No, it's really not a different message. The kernel tree is where kernel development happens best. Linus covered the comparison with RT as well, but I definitely do understand you've had to carry a few patches out of tree.

> Quite a few of the questions and concerns I voiced, which were also voiced by
> others on the list, have not been sorted out to this day. Just to name a
> few from the top of my head:
>
> - How is this supposed to work with different applications requiring
>   different sched_ext schedulers?

I'll let Tejun pitch in on this one.

> - How are distros/users supposed to handle this, especially when
>   applications start to come with their own optimized schedulers?

Having worked for two or three distros (I'd count Meta; we have customers too), distros pick and choose what to support based on what their customers need and pay for, and different distros will make different choices. I'd assume we'll have a spectrum:

- sched_ext is unsupported, talk to the vendor
- sched_ext is unsupported, but we'll give debugging a shot
- sched_ext is supported when you're using $supported schedulers

Vendors might provide optimized schedulers, but they have to support a huge range of distros and gloriously crusty enterprise kernels, so I can't see anyone making it a requirement.

> - What's the documented rule for dealing with bugs and regressions on a
>   system where sched_ext is active?
>
> "We'll work it out in tree" is not an answer to that. Ignoring it and letting
> the rest of the world deal with the fallout is not a really good answer
> either.

It's no different from any other new kernel component, or old kernel component for that matter. What's the documented rule for dealing with bugs and regressions when a USB NIC driver is loaded? If you're asking about the bpf ABI, that's been covered in many other threads.

> I'm not saying that this is all your and the sched_ext people's fault; the
> other side was not always constructive either. Neither did it help that I
> had to drop the ball.
>
> For me, Linus saying that he will merge it no matter what was a wakeup
> call to all involved parties. One side reached out with a clear message to
> sort this out amicably and not make the situation worse.

This last part is where you lost me. I've only seen a clear message to delay for any and every reason you can make stick. I know it's a jerk thing to say and I'm sorry, but that's how it feels from my end.

>> At any rate, I think sched_ext has a good path forward, and I know we'll
>> keep working together however we can.
>
> Carefully avoiding the perception trap, may I politely ask what this is
> supposed to tell me?

I was shooting for optimism here... we've all known each other a long time. We'll find ways to keep working together.

-chris