Hi Bjorn, all,
On Thu, Jan 28, 2021 at 6:31 PM Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:
> On Tue, Jan 26, 2021 at 10:46:04AM -0600, Jeremy Linton wrote:
> > Does that mean its open season for ECAM quirks, and we can expect
> > them to start being merged now?
>
> "Open season" makes me cringe because it suggests we have a license to
> use quirks indiscriminately forever, and I hope that's not the case.
> Lorenzo is closer to this issue than I am and has much better insight
> into the mess this could turn into. From my point of view, it's
> shocking how much of a hassle this is compared to x86. There just
> aren't ECAM quirks, in-kernel clock management, or any of that crap.
> I don't know how they do it on x86 and I don't have to care. Whatever
> they need to do, they apparently do in AML. Eventually ARM64 has to
> get there as well if vendors want distro support.
>
> I don't want to be in the position of enforcing a draconian "no more
> quirks ever" policy. The intent -- to encourage/force vendors to
> develop spec-compliant machines -- is good, but it seems like the
> reward of having compliant machines "just work" vs the penalty of
> having to write quirks and shepherd them upstream and into distros
> will probably be more effective and not much slower.

The problem is that the third party IP vendors (still) make too much junk. For
years, there wasn't a compliance program (e.g. SystemReady with some of the
meat behind PCI-SIG compliance) and even when there was the third party IP
vendors building "root ports" (not even RCs) would make some junk with a hacked
up Linux kernel booting on a model and demo that as "PCI". There wasn't the
kind of adult supervision that was required. It is (slowly) happening now, but
it's years and years late. It's just embarrassing to see the lack of ECAM that
works. In many cases, it's because the IP being used was baked years ago or
made for some "non server" (as if there is such a thing) use case, etc. But in
others, there was a chance to do it right, and it still happens. Some of us
have lost what hair we had over the years getting third party IP vendors to
wake up and start caring about this.
So there's no excuse. None at all. However, this is where we are. And it /is/
improving. But it's still too slow, and we have platforms still coming to
market that need to boot and run. Based on this, and the need to have something
more flexible than just solving for ECAM deficiencies (which are really just a
symptom), I can see the allure of an SMC. I don't like it, but if that's where
folks want to go, and if we can find a way to constrain the enthusiasm for it,
then perhaps it is a path forward. But if we are to go down that path, it needs
to come with a giant warning from the kernel that the system being booted is
relying on that. Something that will cause an OS certification program to fail
without a waiver, or will cause customers to phone up for support wondering why
the hw is broken. It *must* not be a silent thing. It needs to be "this
hardware is broken and non-standard, get the next version fixed".