Hi Mark, On 1/21/25 11:00 AM, Mark Rutland wrote: > There is a period of time after returning from a KVM_RUN ioctl where > userspace may use SVE without trapping, but the kernel can unexpectedly > discard the live SVE state. Eric Auger has observed this causing QEMU > crashes where SVE is used by memmove(): > > https://issues.redhat.com/browse/RHEL-68997 > > The only state discarded is the user SVE state of the task which issued > the KVM_RUN ioctl. Other tasks are unaffected, plain FPSIMD state is > unaffected, and kernel state is unaffected. > > This happens because fpsimd_kvm_prepare() incorrectly manipulates the > FPSIMD/SVE state. When the vCPU is loaded, fpsimd_kvm_prepare() > unconditionally clears TIF_SVE but does not reconfigure CPACR_EL1.ZEN to > trap userspace SVE usage. If the vCPU does not use FPSIMD/SVE and hyp > does not save the host's FPSIMD/SVE state, the kernel may return to > userspace with TIF_SVE clear while SVE is still enabled in > CPACR_EL1.ZEN. Subsequent userspace usage of SVE will not be trapped, > and the next save of userspace FPSIMD/SVE state will only store the > FPSIMD portion due to TIF_SVE being clear, discarding any SVE state. > > The broken logic was originally introduced in commit: > > 93ae6b01bafee8fa ("KVM: arm64: Discard any SVE state when entering KVM guests") > > ... though at the time fp_user_discard() would reconfigure CPACR_EL1.ZEN > to trap subsequent SVE usage, masking the issue until that logic was > removed in commit: > > 8c845e2731041f0f ("arm64/sve: Leave SVE enabled on syscall if we don't context switch") > > Avoid this issue by reconfiguring CPACR_EL1.ZEN when clearing > TIF_SVE. At the same time, add a comment to explain why > current->thread.fp_type must be set even though the FPSIMD state is not > foreign. A similar issue exists when SME is enabled, and will require > further rework. As SME currently depends on BROKEN, a BUILD_BUG() and > comment are added for now, and this issue will need to be fixed properly > in a follow-up patch. > > Commit 93ae6b01bafee8fa also introduced an unintended ptrace ABI change. > Unconditionally clearing TIF_SVE regardless of whether the state is > foreign discards saved SVE state created by ptrace after syscall entry. > Avoid this by only clearing TIF_SVE when the FPSIMD/SVE state is not > foreign. When the state is foreign, KVM hyp code does not need to save > any host state, and so this will not affect KVM. > > There appear to be further issues with unintentional SVE state > discarding, largely impacting ptrace and signal handling, which will > need to be addressed in separate patches. > > Reported-by: Eric Auger <eauger@xxxxxxxxxx> > Reported-by: Wilco Dijkstra <wilco.dijkstra@xxxxxxx> > Cc: stable@xxxxxxxxxxxxxxx > Cc: Catalin Marinas <catalin.marinas@xxxxxxx> > Cc: Florian Weimer <fweimer@xxxxxxxxxx> > Cc: Jeremy Linton <jeremy.linton@xxxxxxx> > Cc: Marc Zyngier <maz@xxxxxxxxxx> > Cc: Mark Brown <broonie@xxxxxxxxxx> > Cc: Oliver Upton <oliver.upton@xxxxxxxxx> > Cc: Paolo Bonzini <pbonzini@xxxxxxxxxx> > Cc: Will Deacon <will@xxxxxxxxxx> > Signed-off-by: Mark Rutland <mark.rutland@xxxxxxx> Tested-by: Eric Auger <eric.auger@xxxxxxxxxx> Thanks! Eric > --- > arch/arm64/kernel/fpsimd.c | 20 ++++++++++++++++++-- > 1 file changed, 18 insertions(+), 2 deletions(-) > > I believe there are some other issues in this area, but I'm sending this > out on its own because I beleive the other issues are more complex while > this is self-contained, and people are actively hitting this case in > production. > > I intend to follow-up with fixes for the other cases I mention in the > commit message, and for the SME case with the BUILD_BUG_ON(). > > Mark. > > diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c > index 8c4c1a2186cc5..e4053a90ed240 100644 > --- a/arch/arm64/kernel/fpsimd.c > +++ b/arch/arm64/kernel/fpsimd.c > @@ -1711,8 +1711,24 @@ void fpsimd_kvm_prepare(void) > */ > get_cpu_fpsimd_context(); > > - if (test_and_clear_thread_flag(TIF_SVE)) { > - sve_to_fpsimd(current); > + if (!test_thread_flag(TIF_FOREIGN_FPSTATE) && > + test_and_clear_thread_flag(TIF_SVE)) { > + sve_user_disable(); > + > + /* > + * The KVM hyp code doesn't set fp_type when saving the host's > + * FPSIMD state. Set fp_type here in case the hyp code saves > + * the host state. > + * > + * If hyp code does not save the host state, then the host > + * state remains live on the CPU and saved fp_type is > + * irrelevant until it is overwritten by a later call to > + * fpsimd_save_user_state(). > + * > + * This is *NOT* sufficient when CONFIG_ARM64_SME=y, where > + * fp_type can be FP_STATE_SVE regardless of TIF_SVE. > + */ > + BUILD_BUG_ON(IS_ENABLED(CONFIG_ARM64_SME)); > current->thread.fp_type = FP_STATE_FPSIMD; > } >