On Thursday, 8 July 2021 00:08:16 PDT Florian Weimer wrote:
> > The first problem is the cross-platformness need. Because we library and
> > application developers need to support other OSes, we'll need to deploy
> > our own CPUID-based detection. It's far better to use common code
> > everywhere, where one developer working on Linux can fix bugs in FreeBSD,
> > macOS or Windows or any of the permutations. Every platform-specific
> > deviation adds to maintenance requirements and is a source of potential
> > latent bugs, now or in the future due to refactoring. That is why doing
> > everything in the form of instructions would be far better and easier,
> > rather than system calls.
>
> I must say this is a rather application-specific view. Sure, you get
> consistency within the application across different targets, but for
> those who work on multiple applications (but perhaps on a single
> distribution/OS), things are very inconsistent.

Why would they be inconsistent, if the library is cross-platform?

> And the reason why I started this is that CPUID-based feature detection
> is dead anyway (assuming the kernel developers do not implement lazy
> initialization of the AMX state). CPUID (and ancillary data such as
> XCR0) will say that AMX support is there, but it will not work unless
> some (yet to be decided) steps are executed by the userspace thread.
>
> While I consider the CPUID-based model a success (and the cross-OS
> consistency may have contributed to that), its days seem to be over.

Well, we need to design the API of this library so that it can accommodate the various possibilities. For every CPU feature, the library needs to be able to report its state of support as one of "already enabled", "possible but not enabled", or "impossible", and provide a call to enable the feature. The enabling call should be supported at least for the AVX512 and AMX states. On Linux, only AMX will be tri-state, but on macOS we need the tri-state for AVX512 too.
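A minimal sketch of the tri-state query described above, in C. It assumes the library has already collected the raw facts (CPUID bits, OSXSAVE, the XGETBV(0) value, and whether the OS still requires a per-thread step). The type, field, and function names are hypothetical, and the enabling call itself is left out because, at the time of writing, the Linux mechanism for it was not yet decided. The XCR0 bit positions are the architectural XSAVE state-component numbers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* XCR0 state-component bits (XSAVE feature set). */
#define XCR0_SSE        (UINT64_C(1) << 1)
#define XCR0_AVX        (UINT64_C(1) << 2)
#define XCR0_OPMASK     (UINT64_C(1) << 5)
#define XCR0_ZMM_HI256  (UINT64_C(1) << 6)
#define XCR0_HI16_ZMM   (UINT64_C(1) << 7)
#define XCR0_XTILECFG   (UINT64_C(1) << 17)
#define XCR0_XTILEDATA  (UINT64_C(1) << 18)

#define XCR0_AVX512_MASK (XCR0_SSE | XCR0_AVX | XCR0_OPMASK | \
                          XCR0_ZMM_HI256 | XCR0_HI16_ZMM)
#define XCR0_AMX_MASK    (XCR0_XTILECFG | XCR0_XTILEDATA)

/* The three states from the text (hypothetical names). */
typedef enum {
    FEATURE_IMPOSSIBLE,
    FEATURE_POSSIBLE_NOT_ENABLED,
    FEATURE_ALREADY_ENABLED
} feature_state;

/* Raw facts a real library would gather from CPUID, XGETBV and the OS
 * (hypothetical structure, for illustration only). */
typedef struct {
    bool     cpu_supports;     /* CPUID reports the feature */
    bool     osxsave;          /* CPUID.1:ECX.OSXSAVE (bit 27) */
    uint64_t xcr0;             /* XGETBV(0); only meaningful if osxsave */
    uint64_t xcr0_needed;      /* state components the feature uses */
    bool     thread_permitted; /* no per-thread OS step still pending */
    bool     os_can_enable;    /* OS offers a way to turn the feature on */
} feature_facts;

static feature_state classify_feature(const feature_facts *f)
{
    assert(f != NULL);
    /* Without CPU support or OS-managed XSAVE state, nothing can help. */
    if (!f->cpu_supports || !f->osxsave)
        return FEATURE_IMPOSSIBLE;

    bool xcr0_ok = (f->xcr0 & f->xcr0_needed) == f->xcr0_needed;
    if (xcr0_ok && f->thread_permitted)
        return FEATURE_ALREADY_ENABLED;

    /* The CPU has it, but XCR0 or a per-thread permission still gates it. */
    return f->os_can_enable ? FEATURE_POSSIBLE_NOT_ENABLED
                            : FEATURE_IMPOSSIBLE;
}
```

The separate `thread_permitted` input reflects Florian's point above: XCR0 can report the AMX state components as present while the thread still cannot use them. An enabling call would only be offered for features classified as "possible but not enabled".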
This library would then wrap all the necessary checks for OSXSAVE and XCR0, so the user doesn't need to worry about them or about how the OS enables them, only about the features they're interested in.

Additionally, I'd like the library to have constant-expression paths that evaluate to constant true if the feature was already enabled at compile time (e.g., -march=x86-64-v3 sets __AVX2__ and __FMA__, so you can always run AVX2 and FMA code without checking). But that's just icing on the cake.

(It won't come as a surprise that I already have code for most of this.)

--
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel DPG Cloud Engineering