On Thu, Mar 25, 2021 at 3:59 PM Len Brown <lenb@xxxxxxxxxx> wrote:
>
> On Sat, Mar 20, 2021 at 4:57 PM Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
>
> > We won't enable features which are unknown ever. Keep that presilicon
> > test gunk where it belongs: In the Intel poison cabinet along with the
> > rest of the code which nobody ever wants to see.
>
> I agree, it would be irresponsible to enable unvalidated features by default,
> and pre-silicon "test gunk" should be kept out of the upstream kernel.
>
> This patch series is intended solely to enable fully validated
> hardware features, with product quality kernel support.
>
> The reason that the actual AMX feature isn't mentioned until the 16th
> patch in this series is because all of the patches before it are
> generic state save/restore patches, that are not actually specific to AMX.
>
> We call AMX a "simple state feature" -- it actually requires NO KERNEL ENABLING
> above the generic state save/restore to fully support userspace AMX
> applications.

Regardless of what you call AMX, AMX requires kernel enabling.
Specifically, it appears that leaving AMX in use in the XINUSE sense
degrades system performance and/or power. And the way to handle that
in kernel (TILERELEASE) cannot possibly be construed as generic.

Here's a little summary of XSTATE features that have failed to be simple:

- XMM: seemed simple, but the performance issues switching between
  legacy and VEX are still unresolved. And they affect the kernel, and
  people have noticed and complained.
- ZMM and the high parts of X/YMM: Intel *still* hasn't documented the
  actual performance rules. Reports from people trying to reverse
  engineer it suggest that it's horrible on all but the very newest
  chips. For some reason, glibc uses it. And it broke sigaltstack. I
  have NAKked in-kernel AVX-512 usage until Intel answers a long list of
  questions. No progress yet.
- PKRU: makes no sense as an XSAVE feature.
- AMX: XFD, as I understand it, has virtualization problems. And the
  TILERELEASE issue is unresolved.

Intel's track record here is poor. If you want the kernel to trust
Intel going forward, Intel needs to build trust first.

> So after the generic state management support, the kernel enabling of AMX
> is not actually required to run applications. Just like when a new instruction
> is added that re-uses existing state -- the application or library can check
> CPUID and just use it. It is a formality (perhaps an obsolete one), that
> we add every feature flag to /proc/cpuid for the "benefit" of userspace.

Even this isn't true. AVX-512 already Broke ABI (tm). Sorry for the
big evil words, but existing programs that worked on Linux stopped
working due to kernel enablement of AVX-512. AMX has the same
problem, except more than an order of magnitude worse. No credible
resolution has shown up, and the only remotely credible idea anyone
has mentioned is to actually mask AMX in XCR0 until an application
opts in to an as-yet-undetermined new ABI.

--Andy
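
For reference, the "check CPUID and just use it" test quoted above would look roughly like the sketch below. It is only an illustration: the helper name amx_tile_usable() is hypothetical and comes from neither the patch series nor any library. It takes raw register values as arguments, so it has no inline asm and the caller must fetch them with CPUID and XGETBV. The bit positions are the architectural ones from the SDM. If the kernel kept AMX masked in XCR0 until an opt-in, the last check is the one that would fail for a program that has not opted in.

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural bit positions (Intel SDM). */
#define CPUID1_ECX_OSXSAVE   (1u << 27)   /* CPUID.1:ECX, OS enabled XSAVE */
#define CPUID7_EDX_AMX_TILE  (1u << 24)   /* CPUID.(EAX=7,ECX=0):EDX */
#define XCR0_XTILECFG        (1ull << 17) /* tile configuration state */
#define XCR0_XTILEDATA       (1ull << 18) /* tile data state */

/*
 * Hypothetical helper, for illustration only.
 *
 * cpuid1_ecx: ECX from CPUID leaf 1
 * cpuid7_edx: EDX from CPUID leaf 7, subleaf 0
 * xcr0:       value of XCR0 as read by XGETBV with ECX=0
 *             (only meaningful when OSXSAVE is set)
 */
static bool amx_tile_usable(uint32_t cpuid1_ecx, uint32_t cpuid7_edx,
                            uint64_t xcr0)
{
    const uint64_t amx_state = XCR0_XTILECFG | XCR0_XTILEDATA;

    /* Without OSXSAVE, XGETBV is not available to userspace at all. */
    if (!(cpuid1_ecx & CPUID1_ECX_OSXSAVE))
        return false;

    /* The CPU must enumerate the AMX tile architecture. */
    if (!(cpuid7_edx & CPUID7_EDX_AMX_TILE))
        return false;

    /* Both tile state components must be enabled in XCR0. */
    return (xcr0 & amx_state) == amx_state;
}
```

Note that this only answers "can I execute tile instructions right now". It says nothing about signal frame size, sigaltstack, or the XINUSE power cost, which is exactly the part that is not generic.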