Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> writes:

> On Mon, Jul 15, 2024 at 9:32 AM Puranjay Mohan <puranjay@xxxxxxxxxx> wrote:
>>
>> Hi Daniel, Manu
>> I was able to reproduce this issue on KVM and found the root cause for
>> this hang! The other issue that we fixed is unrelated to this hang and
>> doesn't occur on self hosted github runners as they use 48-bit VAs.
>>
>> The userspace test code has:
>>
>>     #define STACK_SIZE (1024 * 1024)
>>     static char child_stack[STACK_SIZE];
>>
>>     cpid = clone(do_sleep, child_stack + STACK_SIZE, CLONE_FILES | SIGCHLD, fexit_skel);
>>
>> arm64 requires the stack pointer to be 16 byte aligned otherwise
>> SPAlignmentFault occurs, this appears as Bus error in the userspace.
>>
>> The stack provided to the clone system call is not guaranteed to be
>> aligned properly in this selftest.
>>
>> The test hangs on the following line:
>>
>>     while (READ_ONCE(fexit_skel->bss->fentry_cnt) != 2);
>>
>> Because the child process is killed due to SPAlignmentFault, the
>> fentry_cnt remains at 0!
>>
>> Reading the man page of clone system call, the correct way to allocate
>> stack for this call is using mmap like this:
>>
>>     stack = mmap(NULL, STACK_SIZE, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
>>
>> This fixes the issue, I will send a patch to use this and once again
>> remove this test from DENYLIST and I hope this time it fixes it for good.
>
> Wow. Great find. Good to know.
> prog_tests/ns_current_pid_tgid.c has the same issue probably.

Yes, I checked that test as well using gdb and fortunately it gets a 16
byte aligned stack pointer, but this is just luck, so I will send a
patch to fix that test as well.

Thanks,
Puranjay
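
For reference, the mmap-based allocation described above could look roughly like the sketch below. This is not the patch itself: child_fn and run_child_on_mmap_stack are hypothetical names used only for illustration, and the child here simply returns instead of running do_sleep. The sketch relies on mmap returning a page-aligned region, so with STACK_SIZE a multiple of 16 the top of the region (what clone receives, since the stack grows down) is 16 byte aligned.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <signal.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>

#define STACK_SIZE (1024 * 1024)

/* The top of the mapping is only 16 byte aligned if the size is too. */
static_assert(STACK_SIZE % 16 == 0, "STACK_SIZE must be a multiple of 16");

/* Hypothetical child body; the selftest passes do_sleep here instead. */
static int child_fn(void *arg)
{
	(void)arg;
	return 0; /* becomes the exit status of the child */
}

/* Returns 0 if the child ran and exited cleanly, -1 otherwise. */
int run_child_on_mmap_stack(void)
{
	int status = 0, ret = -1;
	pid_t cpid;
	char *stack;

	/* mmap hands back a page-aligned region, unlike a static char array. */
	stack = mmap(NULL, STACK_SIZE, PROT_READ | PROT_WRITE,
		     MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
	if (stack == MAP_FAILED)
		return -1;

	/* Sanity check: the initial stack pointer must be 16 byte aligned. */
	if (((uintptr_t)(stack + STACK_SIZE) & 15) != 0)
		goto out;

	/* The stack grows down, so clone gets the top of the mapping. */
	cpid = clone(child_fn, stack + STACK_SIZE, CLONE_FILES | SIGCHLD, NULL);
	if (cpid == -1)
		goto out;

	/* A child killed by a fault would fail the WIFEXITED check. */
	if (waitpid(cpid, &status, 0) == cpid &&
	    WIFEXITED(status) && WEXITSTATUS(status) == 0)
		ret = 0;
out:
	munmap(stack, STACK_SIZE);
	return ret;
}
```

Checking the child's wait status like this also means a faulting child is reported as a failure rather than leaving the parent waiting.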