Re: [GIT PULL] omap changes for v2.6.39 merge window

Nicolas Pitre <nico@xxxxxxxxxxx> · Wed, 30 Mar 2011 19:31:59 -0400 (EDT)

On Wed, 30 Mar 2011, Linus Torvalds wrote:

> On Wed, Mar 30, 2011 at 1:41 PM, Nicolas Pitre <nico@xxxxxxxxxxx> wrote:
> >
> > If in your mind "competitors" == "morons" then you might be right.
> 
> There's a difference between "competition" and "do things differently
> just to be difficult".

Absolutely.  We've seen that from some proprietary software companies.

> > Trying to rely on bootloaders doing things right is like saying that x86
> > should always rely on the BIOS doing things right.
> 
> No. Not at all.
> 
> The problem with firmware/BIOS is that it's set in stone and closed-source.
> 
> I'm suggesting splitting out the crazy part into a separate project
> that does this. Open-source. Like a mini-kernel. Because the thing is,
> the main kernel doesn't care, and _shouldn't_ care. Those board files
> are just noise.

Sure, but important noise nevertheless.  As long as the noise is 
confined to a limited set of .c files I'm happy.  OTOH I have very 
little hope for a separate project that would only deal with that noise.  
That will simply never fly, even less so as an Open Source project.  
The incentive for people to work on such a thing simply isn't there, as 
that is totally uninteresting and without any reward.

Furthermore, this does create pain.  You have to keep things in sync 
between the kernel and the mini-kernel (let's call it bootloader).  In 
practice the bootloader is always maintained separately from the kernel, 
at its own pace and with its own release schedule.  Trying to 
synchronize independent projects is really painful as you know already, 
otherwise the user space for perf would still be maintained separately 
from the kernel, right?

Now, when there is a bug in one of the clock settings, or one clock 
is missing for that new kernel driver to work properly, the 
bootloader would have to be fixed, revalidated, and the fix deployed 
separately, on top of the kernel update.  This adds to the pain, so 
much so that what people do in those cases is simply hack the driver 
code in the kernel.  Instead, the OMAP folks created a table to 
abstract those clocks into something more manageable.

And here's the final catch.  Most of those clocks are often derived from 
each other in a tree structure inside the SOC.  And for power saving 
reasons, some crazy people want to dynamically change the config for 
those clocks at run time according to the required frequency for given 
loads, turn them off when possible, and of course turn the parent clock 
off as well if all the children clocks are themselves turned off.  So 
the kernel has NO CHOICE but to be fully aware of them.
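
To illustrate the parent/child gating described above, here is a minimal 
userspace sketch in C.  It is not the actual OMAP clock code: the struct 
layout and the names demo_clk_enable/demo_clk_disable are hypothetical, 
and the hw_enabled field merely stands in for a real gate register.  The 
only point is that a parent may be turned off once its last child is.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified clock node; not the real OMAP structures. */
struct demo_clk {
    const char *name;
    struct demo_clk *parent;  /* NULL for a root clock */
    int usecount;             /* active users, enabled children included */
    int hw_enabled;           /* stands in for the hardware gate bit */
};

/* Enable a clock, making sure every ancestor is running first. */
static void demo_clk_enable(struct demo_clk *c)
{
    if (c == NULL)
        return;
    if (c->usecount++ == 0) {
        demo_clk_enable(c->parent);
        c->hw_enabled = 1;
    }
}

/* Drop one user; on the last one, gate the clock and release its parent. */
static void demo_clk_disable(struct demo_clk *c)
{
    if (c == NULL)
        return;
    assert(c->usecount > 0);  /* unbalanced disable is a caller bug */
    if (--c->usecount == 0) {
        c->hw_enabled = 0;
        demo_clk_disable(c->parent);
    }
}
```

With two children hanging off one parent, the parent stays on until both 
children have been disabled.  That bookkeeping has to live wherever the 
clocks are being switched at run time, which is the kernel.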

Then come power domains with the cascade of regulators and so forth, 
again all software controlled.  Add to the mix the different sleep 
states that can be derived from that, which is far more sophisticated 
than ACPI states on Intel.  And in some cases, the hardware capabilities 
are there but people still haven't found the optimal way to drive them, 
so research is still on-going software wise.  And obviously those SOC 
vendors do compete on that front since power consumption is the killer 
weapon these days.  No wonder they are so different from each other 
with all that "board crap".

> The long-term situation should be that you should be able to have ONE
> binary kernel "just work". That's where we are on x86. Really.

But x86 is peanuts.  Really.  There was one machine called the IBM PC at 
some point that everybody cloned, and the rest was totally irrelevant.  
Then came that thing called Windows that reinforced this hardware 
monoculture as it was used for the ultimate conformance testing.  It 
is damn easy in that case to produce a kernel that works virtually 
everywhere.

On ARM there is simply no such thing as a single machine design to 
clone, nor a closed source test bench to design for.

And this is orthogonal to this discussion anyway, as having in-kernel 
clock tables is not incompatible with a single kernel binary.  Dropping 
at runtime those clock tables that are irrelevant to the currently 
running hardware is not rocket science.
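
As a rough sketch of that idea, again in plain userspace C: one binary 
carries a table per SOC and picks the matching one at boot.  Everything 
here is made up for illustration (the SOC names, the clock rates, and 
select_clk_table itself); in the kernel the unused tables would 
presumably sit in init-section data that gets freed after boot.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct clk_entry {
    const char *name;
    unsigned long rate_hz;      /* placeholder values, not real rates */
};

struct clk_table {
    const char *soc;            /* hypothetical SOC identifier */
    const struct clk_entry *entries;
    size_t count;
};

static const struct clk_entry soc_a_clks[] = {
    { "uart", 48000000UL },
    { "mmc",  96000000UL },
};

static const struct clk_entry soc_b_clks[] = {
    { "uart", 24000000UL },
};

/* Every table is built in; only one is relevant on a given machine. */
static const struct clk_table all_tables[] = {
    { "soc-a", soc_a_clks, sizeof(soc_a_clks) / sizeof(soc_a_clks[0]) },
    { "soc-b", soc_b_clks, sizeof(soc_b_clks) / sizeof(soc_b_clks[0]) },
};

/* Return the table for the detected SOC, or NULL if none matches. */
static const struct clk_table *select_clk_table(const char *soc)
{
    size_t i;

    assert(soc != NULL);
    for (i = 0; i < sizeof(all_tables) / sizeof(all_tables[0]); i++)
        if (strcmp(all_tables[i].soc, soc) == 0)
            return &all_tables[i];
    return NULL;
}
```

Once the matching table is known, the others are dead weight and can be 
dropped; nothing about this prevents shipping a single kernel binary.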

> Without that kind of long-term view, where do you think ARM is going
> to be in five years?

ARM is going to still be relevant simply because they now have Linux 
that they can modify to suit their latest changes.  That's one thing 
with Open Source which can be good or bad: full hardware compatibility 
is no longer an issue since the software can be adapted at will.

Still... there are on-going efforts to consolidate things amongst all 
the ARM vendors.  The ARM architecture is standardizing more and more 
stuff in the whole stack in every revision.  But they won't standardize 
everything, otherwise they'll kill that competing ecosystem.

> >> almost *SIXTY* percent of all arch updates were to ARM code.
> >
> > Absolutely not!  You have 14% going to OMAP code which happens to be
> > under arch/arm/ but there is nothing ARM specific in there.  If OMAP was
> > using a PPC or a MIPS core then you'd have the same result except under
> > arch/powerpc or arch/mips.  There is very little in terms of ARM
> > specific peculiarities under arch/arm/mach-omap2/ in fact.
> 
> But that's my point - the problem is all the crazy board crap.
> 
> I've never claimed that this is about the ARM cpu (which has it's own
> issues, but that's a separate rant entirely). It's about the broken
> infrastructure.

Let's see how we can fix it then.  Trying to shovel the problem away 
won't help the situation.  Those ARM vendors are crazy for sure.  But 
it's not relatively few merge conflicts, compared to the volume of 
changes, that will make us flinch, right?

> Now, some of it is quite understandable - ie real drivers for real
> hardware. But a _lot_ of it seems to be just descriptor tables, and
> I'm getting the very strong feeling that ARM people aren't even
> _trying_ to make it sane, and trying to standardize things, or trying
> to aim for the whole notion of "one kernel image, with much more hw
> description done elsewhere".

That work is happening.  It is not ready.  I'm not against it but I 
remain sceptical.  I still think that a self contained kernel is more 
maintainable.

Still, because ARM is just a CPU architecture, those SOC vendors will 
always have something new to differentiate themselves from the other SOC 
vendors.  And that cannot be described in a table alone.  The power 
management hardware from TI will still require separate _executable_ 
code from the Freescale one, or the Samsung one, or the Nvidia one, or 
the Qualcomm one, or the Marvell one, yada yada.  And I really don't 
want to see that code turned into some vendor provided buggy ACPI 
bytecode or similar.

> arch/arm is already about 3x the size of arch/x86. And it's pretty
> much all the crazy infrastructure afaik. timer chips, irq chips, gpio
> differences - crap like that.

Indeed.  And I expect it to grow even bigger.  Be warned.

> And the fact that you don't even seem to UNDERSTAND the problem, and
> think that it's ok, and that continued future explosion of this is all
> fine makes me even more nervous.

I do understand the problem.  And so far, the way we scaled is to have 
TI people care about the OMAP code, Freescale people care about the iMX 
code, and so on.  If one of them produces crap code then so be it, and 
the other vendors are totally unaffected, which is why I'm not too 
nervous.  Blaming a merge conflict on the entire ARM ecosystem just 
because one team was large enough to have separate people doing 
different things that intersected into the clock table is blowing things 
totally out of proportion.

And if those hardware vendors are still in business in the future, and 
apparently new ones are joining in, then the arch/arm/ directory will 
continue to gain weight.  And on ARM, Linux is very, very successful, 
that's all.

Nicolas