* Linus Torvalds <torvalds@xxxxxxxx> wrote:

> And even more interestingly (at least to me), the question might
> become one of "how does that affect the tools and build and
> configuration infrastructure", and just the general flow of
> development.
>
> I don't think one or two filesystems (and a few drivers) splitting is
> anything new, but if this ends up becoming _more_ common, maybe that
> implies a new model entirely..

At least for core kernel stuff, it's hard to split things in any manageable way (as you mentioned as well), so higher flux is inevitable. What I've been focusing on more in the past year or so is enabling the core kernel to take more development flux, via kernel features.

Instead of adding more features to the kernel, I'm quite interested in seeing more technologies that make a higher development flux safer: making the kernel more debuggable, making bugs more reportable for users, making the effects of bugs less harmful, and making the kernel notice more bugs by itself.

To be able to handle a higher development flux in core code, I think we need the following policies wrt. core kernel changes:

 - More code consolidation between architectures and subsystems.

   Core kernel changes impact "non-mainstream" architectures the most, while some of our best technologies have their roots in non-mainstream platforms. So it's a net loss to concentrate only on the mainstream, because the distribution of developers and technology does not follow the distribution of users.

   The generic irq subsystem and the spinlock and semaphore/mutex consolidation are all efforts in this direction. I consider the Generic Time Of Day (GTOD) effort a similarly important item, for the same reasons. There are other good examples too: klibc is a good step towards a more consolidated boot process, the Xen subarch work triggers consolidation as well, etc.

   Andrew's policy of "you must not break _any_ architecture in -mm" is very important too.
   And we should do consolidation even in cases where there's some minimal runtime cost. Being able to handle higher flux is more important than getting the last cycle out of the system. This does not mean we should reject patches that do get those last cycles; it only means we should not reject consolidation patches on the grounds that they _lose_ a few cycles. I don't think this is a common problem for consolidation projects right now, but it could happen in the future.

 - Even more cleanups.

   We have always preferred cleanups, but now it becomes critical: I strongly believe that cleanups must take precedence over feature work [with a few rare and temporary exceptions perhaps, like hardware enablement or really critical features]. It's much easier to spot bugs in clean code, and it's much easier for automated correctness validators to find bugs in clean code.

   (My own examples here include the spinlock-init cleanups, which directly enabled things like the lock validator. But pure code cleanups apply too.)

 - More automated correctness-checking tools and kernel features.

   While the preferred way of avoiding bugs should be clean design and clean code, higher flux introduces more noise, and bugs are inevitable. So the importance of automated tools (both static and dynamic analysis) has increased.

   Sparse annotations are one good example. My own examples here are the lock validator, the mutex debugging code and the consolidated spinlock debugging code. Some of these are direct feature-enablers: for example, the smp_processor_id() debugging code directly enabled a safe and painless migration to PREEMPT_BKL.

   One nice feature in the works that can find hard-to-spot bugs is kmemleak.

 - Coding style police!

   With higher development flux it is becoming even more important for kernel developers to review other developers' work. But that is very hard if the coding style varies too much.
   This is a fundamentally human problem, and the only sane solution is brutal: the _strict_ Linus coding style must be used in all high-flux subsystems.

 - More debuggability, reportability.

   In this area we still suck quite a bit, and this affects userspace too: currently we have nothing equivalent to things like Dr Watson. In Linux, most of the info about the first userspace crash almost always gets lost! (And even afterwards, once debug packages are downloaded and the app is run in gdb, it's still too painful for the user, so we lose lots of feedback.) Some of the GUIs try to do something about this and automate crash reporting, but that doesn't cover most app crashes, and userspace clearly needs kernel help, because ptrace is too inflexible for this purpose. (Help is on the way though: there's a next-gen ptrace project that solves these problems very cleanly.)

   There are a number of important projects going on in this area, for example the dwarf unwinder for x86_64 to improve the quality of kernel oopses, and kgdb (or bits of NLKD) if it gets clean enough.

My own impression is that things are going in the right direction, but that there should be more awareness of these principles. I think if we add a couple more key technologies, then we can take the higher kernel development flux just fine, without compromising quality. Even though Linux has lots of developers, we should be more economical with that development power and waste less of it on unnecessarily complex debugging tasks.

I do consider the forking of a subsystem the "easy way out"; the harder but more correct approach, I think, is to turn every drastic rewrite into small, manageable steps. That's much easier said than done, and it's sometimes 10 times the work, but it's a lot safer, and the end result is often wildly different from (and a lot cleaner than!) what one would get via a drastic rewrite.

A dumb 'cp -a' copy of a subsystem will preserve most of the legacies and architectural inefficiencies.
Even an intelligent drastic rewrite preserves most of the legacies: there's only so much change users can take at once, and _eventually_ a new subsystem has to be exposed to real users, at which point the compatibility constraints apply again. I have yet to see a single case where there was a hard physical necessity to throw away an old subsystem because of its legacies.

I think the prime example to follow is how Al Viro works: he's been maintaining the VFS for many years without having to duplicate functionality and without breaking the world, yet he still managed to turn the VFS upside down and inside out, in small, manageable steps. It _is_ possible in almost every case, for all but the most spaghetti pieces of code.

	Ingo