Jason!

On Wed, Nov 23 2022 at 19:38, Thomas Gleixner wrote:
> On Wed, Nov 23 2022 at 12:58, Jason Gunthorpe wrote:
>> I find your perspective on driver authors as the enemy quite
>> interesting :)
>
> I'm not seeing them as enemies. Just my expectations are rather low by
> now :)

This made me think about it for a while. Let me follow up on that.

When I set out to add real-time capabilities to the kernel about 20
years ago, I did a thorough analysis of the kernel design and code
base.

It turned out that aside from well-encapsulated infrastructure, e.g. mm,
vfs, scheduler, network core, quite some of the rest consisted of
blatant layering violations held together with duct tape, super glue
and haywire-circuit.

It was immediately clear to me that this needed a lot of consolidation
and cleanup work to get me even close to the point where RT becomes
feasible as an integral part of the kernel.

But not only this became clear, I also realized that a continuation of
this model would end up in a maintenance nightmare sooner rather than
later.

Me and the other people interested in RT estimated back then that it
would take 5-10 years to get this done.

Boy, we were young and naive back then and completely underestimated
the effort required. Obviously we also underestimated the concurrent
influx of new stuff.

Just to give you an example. Our early experiments with substituting
spinlocks were just the start of the horrors. Instead of working on the
actual substitution mechanisms and the required other modifications, we
spent a vast amount of our time chasing deadlocks all over the place.
My main test machine had not a single device driver which was correct
and working out of the box. What's worse is that we had to debate with
some of the driver people about the correctness of our locking analysis
and fight for stuff getting fixed.

This ended in writing and integrating lockdep, which has thankfully
taken this burden off our plate.

When I started to look into interrupt handling to add support for
threaded interrupts, which are a fundamental prerequisite for RT, the
next nightmare started to unfold.

The "generic" core code was a skeleton and everything real was
implemented in architecture-specific code in completely incompatible
ways. It was not even possible to change common data structures without
breaking the world. What was even worse, drivers fiddled in the
interrupt descriptors just to scratch an itch.

What I learned pretty fast is that most driver writers try to work
around shortcomings in common infrastructure instead of tackling the
problem at the root or talking to the developers/maintainers of that
infrastructure.

The consequence of that is: if you want to change core infrastructure
you end up mopping up the driver tree in order not to break things all
over the place. There are clearly better ways to spend your time.

So I started to encapsulate things more strictly - admittedly to make
my own life easier. But at the same time I always tried hard to make
these encapsulations easy to use, to provide common infrastructure in
order to replace boilerplate code and to help with resource management,
which is one of the common problems in driver code. I'm also quite
confident that I carefully listened to the needs of driver developers
and I think the whole discussion about IMS last year is a good example
of that. I surely have opinions, but who doesn't?

So no, I'm not seeing driver writers as enemies. I'm just accepting the
reality that quite some of the drivers are written in "get it out the
door" mode. I'm well aware that there are other folks who stay around
for a long time and do proper engineering and maintenance, but that's
sadly the minority.

Being responsible for core infrastructure is an interesting challenge
especially with the zoo of legacy to keep alive and the knowledge that
you can break the world with a trivial and obviously "correct" change.
Been there, done that. :)

Thanks,

        Thomas