On Mon, Jan 23, 2023 at 10:53 AM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
>
> On Mon, Jan 23, 2023, Maciej S. Szmigiero wrote:
> > On 23.01.2023 19:30, Erdem Aktas wrote:
> > > On Fri, Jan 20, 2023 at 4:28 PM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
> > > >
> > > > On Sat, Jan 21, 2023, Ackerley Tng wrote:
> > > > > Some SSE instructions assume a 16-byte aligned stack, and GCC compiles
> > > > > assuming the stack is aligned:
> > > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=40838. This combination
> > > > > results in a #GP in guests.
> > > > >
> > > > > Adding this compiler flag will generate an alternate prologue and
> > > > > epilogue to realign the runtime stack, which makes selftest code
> > > > > slower and bigger, but this is okay since we do not need selftest code
> > > > > to be extremely performant.
> > > >
> > > > Huh, I had completely forgotten that this is why SSE is problematic. I ran into
> > > > this with the base UPM selftests and just disabled SSE. /facepalm.
> > > >
> > > > We should figure out exactly what is causing a misaligned stack. As you've noted,
> > > > the x86-64 ABI requires a 16-byte aligned RSP. Unless I'm misreading vm_arch_vcpu_add(),
> > > > the starting stack should be page aligned, which means something is causing the
> > > > stack to become unaligned at runtime. I'd rather hunt down that something than
> > > > paper over it by having the compiler force realignment.
> > >
> > > Is not it due to the 32bit execution part of the guest code at boot
> > > time. Any push/pop of 32bit registers might make it a 16-byte
> > > unaligned stack.
> >
> > 32-bit stack needs to be 16-byte aligned, too (at function call boundaries) -
> > see [1] chapter 2.2.2 "The Stack Frame"
>
> And this showing up in the non-TDX selftests rules that out as the sole problem;
> the selftests stuff 64-bit mode, i.e. don't have 32-bit boot code.

Thanks Maciej and Sean for the clarification.
I had suspected the hand-coded assembly we have for the TDX tests, but the fact that this also happens in the non-TDX selftests rules that out.