On Wed, Feb 15, 2023, Ackerley Tng wrote:
> I figured it out!
>
> GCC assumes that the stack is 16-byte aligned **before** the call
> instruction. Since call pushes rip to the stack, GCC will compile code
> assuming that on entrance to the function, the stack is -8 from a
> 16-byte aligned address.
>
> Since for TDs we do a ljmp to guest code, providing a function's
> address, the stack was not modified by a call instruction pushing rip to
> the stack, so the stack is 16-byte aligned when the guest code starts
> running, instead of 16-byte aligned -8 that GCC expects.
>
> For VMs, we set rip to a function pointer, and the VM starts running
> with a 16-byte aligned stack too.
>
> To fix this, I propose that in vm_arch_vcpu_add(), we align the
> allocated stack address and then subtract 8 from that:
>
> @@ -573,10 +573,13 @@ struct kvm_vcpu *vm_arch_vcpu_add(struct kvm_vm *vm,
> uint32_t vcpu_id,
>         vcpu_init_cpuid(vcpu, kvm_get_supported_cpuid());
>         vcpu_setup(vm, vcpu);
>
> +       stack_vaddr += (DEFAULT_STACK_PGS * getpagesize());
> +       stack_vaddr = ALIGN_DOWN(stack_vaddr, 16) - 8;

The ALIGN_DOWN should be unnecessary; we've got larger issues if getpagesize()
isn't a multiple of 16 and/or if __vm_vaddr_alloc() returns anything but a
page-aligned address.  Maybe add a TEST_ASSERT() sanity check that stack_vaddr
is page-aligned at this point?

And in addition to the comment suggested by Maciej, can you also add a comment
explaining the -8 adjustment?  Yeah, someone can go read the changelog, but I
think this is worth explicitly documenting in code.

Lastly, can you post it as a standalone patch?

Many thanks!
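
For illustration only, here is a minimal standalone sketch of the computation
being discussed: take the top of a page-aligned stack allocation, sanity check
its alignment, then subtract 8 to mimic the return address that a call would
have pushed.  The names sketch_initial_rsp(), SKETCH_PAGE_SIZE and
SKETCH_STACK_PGS are hypothetical stand-ins, not the selftests' helpers or
values, and a plain assert() stands in for TEST_ASSERT().  The real patch would
of course use stack_vaddr, getpagesize() and DEFAULT_STACK_PGS.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration; not the selftests' definitions. */
#define SKETCH_PAGE_SIZE	4096ULL
#define SKETCH_STACK_PGS	5ULL

/*
 * Compute the initial guest RSP from the base address of a stack allocation
 * that is assumed to be page-aligned.
 */
static uint64_t sketch_initial_rsp(uint64_t stack_base)
{
	uint64_t stack_top = stack_base + SKETCH_STACK_PGS * SKETCH_PAGE_SIZE;

	/*
	 * Sanity check: a page-aligned address is necessarily 16-byte
	 * aligned, so no explicit ALIGN_DOWN() is needed.
	 */
	assert((stack_top & (SKETCH_PAGE_SIZE - 1)) == 0);

	/*
	 * The x86-64 ABI requires a 16-byte aligned stack at the point of a
	 * CALL.  CALL pushes the 8-byte return RIP, so compiled code expects
	 * RSP % 16 == 8 on function entry.  The guest is entered by pointing
	 * RIP at a function rather than via CALL, so mimic the push here.
	 */
	return stack_top - 8;
}
```

With a base of 0x10000, the sketch yields a stack top of 0x15000 and an initial
RSP of 0x14ff8, i.e. 8 bytes below a 16-byte boundary, which is what the
compiled guest code expects on entry.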