On 10/20/22 10:59, Borislav Petkov wrote:
> On Wed, Sep 28, 2022 at 01:49:34PM +0300, Maxim Levitsky wrote:
>> Patch 5 is the main fix - it makes the kernel tolerant of a
>> broken CPUID config (hopefully coming from a hypervisor), where you
>> have a feature (AVX2 in my case) but not the feature on which it
>> depends (AVX).
>
> I really really don't like it when people are fixing the wrong thing.
> Why does the kernel need to get fixed when something else can't get its
> CPUID dependencies straight? I don't even want to know why something
> would set AVX2 without AVX?!?!

Users do so because they just "disable AVX" (e.g. with QEMU's -cpu
host,-avx), and that removes only the AVX bit. Userspace didn't bother
to implement the whole set of CPUID bit dependencies for AVX because:

1) Intel is adding AVX features every other week, and probably half the
time people would forget to add the dependency;

2) you absolutely need to check XCR0 before using AVX anyway, which the
kernel does with cpu_has_xfeatures(XFEATURE_MASK_YMM), and userspace
*does* remove the XSAVE state from CPUID leaf 0Dh if you remove AVX.
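
For illustration, the kind of dependency tracking that userspace skipped
could look roughly like the sketch below. The enum, the table and
clear_feature() are hypothetical names made up for this example; they are
not QEMU's or the kernel's actual data structures, and the bit numbers
are not real CPUID positions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical feature numbering, for illustration only. */
enum feat { FEAT_XSAVE, FEAT_AVX, FEAT_AVX2, FEAT_FMA };

struct feat_dep {
	enum feat feature;	/* this feature ... */
	enum feat depends_on;	/* ... is only valid if this one is set */
};

/* Illustrative subset of dependencies; every new AVX-based feature
 * would need a new entry here, which is the maintenance cost in (1). */
static const struct feat_dep deps[] = {
	{ FEAT_AVX,  FEAT_XSAVE },
	{ FEAT_AVX2, FEAT_AVX   },
	{ FEAT_FMA,  FEAT_AVX   },
};

/* Clear feature f from mask, then repeatedly clear every feature whose
 * dependency is missing, until nothing changes any more. */
uint32_t clear_feature(uint32_t mask, enum feat f)
{
	int changed;

	mask &= ~(UINT32_C(1) << f);
	do {
		changed = 0;
		for (size_t i = 0; i < sizeof(deps) / sizeof(deps[0]); i++) {
			uint32_t bit  = UINT32_C(1) << deps[i].feature;
			uint32_t need = UINT32_C(1) << deps[i].depends_on;

			if ((mask & bit) && !(mask & need)) {
				mask &= ~bit;
				changed = 1;
			}
		}
	} while (changed);

	return mask;
}
```

With such a table, clearing FEAT_AVX from a mask that has all four
features set also drops FEAT_AVX2 and FEAT_FMA, leaving only FEAT_XSAVE.
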
(2) in particular holds even on bare metal. The kernel bug here is that
X86_FEATURE_AVX only tells you if the instructions are _present_, not if
they are _usable_. Indeed, the XCR0 check is present in all the other
files in arch/x86/crypto, either instead of or in addition to
boot_cpu_has(X86_FEATURE_AVX).
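
To make the present-vs-usable distinction concrete, here is a minimal
userspace-style sketch of the combined check. It is not the kernel's
code: avx_usable() and the macro names are made up for this example, and
the function takes the raw CPUID.1:ECX and XCR0 values as parameters so
that the logic stands on its own.

```c
#include <stdbool.h>
#include <stdint.h>

/* CPUID leaf 1, ECX register */
#define CPUID1_ECX_OSXSAVE	(UINT32_C(1) << 27)	/* OS enabled XSAVE/XGETBV */
#define CPUID1_ECX_AVX		(UINT32_C(1) << 28)	/* AVX instructions present */

/* XCR0 state-component bits */
#define XCR0_SSE		(UINT64_C(1) << 1)	/* XMM state enabled */
#define XCR0_YMM		(UINT64_C(1) << 2)	/* upper YMM state enabled */

/* AVX is usable only if the instructions are present, XCR0 can be
 * read at all, and the OS has enabled both SSE and YMM state. */
bool avx_usable(uint32_t cpuid1_ecx, uint64_t xcr0)
{
	const uint64_t need = XCR0_SSE | XCR0_YMM;

	if (!(cpuid1_ecx & CPUID1_ECX_AVX))
		return false;	/* not present */
	if (!(cpuid1_ecx & CPUID1_ECX_OSXSAVE))
		return false;	/* XCR0 is not readable */

	return (xcr0 & need) == need;	/* present and enabled */
}
```

On a real x86 machine the two inputs would come from something like
__get_cpuid() in GCC/Clang's <cpuid.h> and _xgetbv(0) in <immintrin.h>;
the second must only be called once OSXSAVE has been confirmed.
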

Maxim sent a patch about a year ago to add that check in
aesni-intel-glue.c, but Dave told him to fix the dependencies instead
(https://lore.kernel.org/all/20211103124614.499580-1-mlevitsk@xxxxxxxxxx/).
What do you think of applying that patch instead?

Thanks,
Paolo