On 10.02.2024 at 20:56, John Paul Adrian Glaubitz wrote:
> After a lot of debugging, Michael Karcher (CC'ed) has managed to write a reproducer which I am attaching to this mail.

That reproducer did not work reliably under all circumstances, because it guessed the stack limit to be 8K to 12K from the current stack pointer, which is not always correct. The size of the stack at the start of main depends on the size of the environment. Please find attached a more robust reproducer.
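To illustrate the remark that the stack size at the start of main depends on the environment: on Linux the environment strings and the pointer array referring to them are placed at the top of the initial stack, so a larger environment moves the starting stack pointer further down. The following is only a rough sketch of that footprint. The helper name `env_footprint` is hypothetical, and the estimate ignores argv, the auxiliary vector, and alignment padding.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: rough number of bytes taken up by an environment
 * block, counting each NUL-terminated string plus one pointer slot per
 * entry and one slot for the terminating NULL pointer.
 * This is an estimate only; argv, auxv and padding are not counted. */
size_t env_footprint(char *const *envp)
{
    size_t bytes = sizeof(char *);               /* terminating NULL slot */

    for (; *envp != NULL; ++envp)
        bytes += strlen(*envp) + 1 + sizeof(char *);

    return bytes;
}
```

Calling `env_footprint(environ)` (with `extern char **environ;` declared) under differently sized environments should show why a fixed 8K to 12K guess cannot hold in general.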
Kind regards,
Michael Karcher
// SPARC64 clone problem demonstration
//
// the sparc64 Linux kernel fails to execute clone if %sp points into uncommitted memory (e.g. due to lazy
// stack committing). This program uses a variable length array on the stack to position the stack pointer when
// invoking the library function clone just at a page boundary. The library function clone allocates a stack frame
// that is completely in uncommitted memory before entering the kernel call clone.
// to probe for the correct size of the VLA, a test function is called first. This function records the %fp value it
// receives (which will be the %fp value in the library function clone, too, if the VLA size is equal)
// (c) Michael Karcher (kernel@xxxxxxxxxxxxxxxxxxxxxxxxxxxx) , 2024, GPLv2 or later

#define _GNU_SOURCE
#include <sys/mman.h>
#include <sys/wait.h>
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

#define SPARC64_STACK_BIAS 0x7FF

typedef int fn_t(void*);
typedef pid_t clone_t(fn_t* entry, void* stack, int flags, void* arg, ...);

// very simple function invoked using clone
int nop(void* bar)
{
    return 0;
}

// clone substitute that records %fp
uint64_t call_clone_sp;
pid_t dummy_clone(fn_t* entry, void* stack, int flags, void* arg, ...)
{
    register uint64_t frameptr asm("fp");
    call_clone_sp = frameptr + SPARC64_STACK_BIAS;   // sp in call_clone is fp in dummy_clone / clone
    return -1;
}

// function to invoke clone with (im)properly aligned stack
void* child_stack;
int call_clone(int waste_qwords, clone_t* clonefn)
{
    void* volatile waste[waste_qwords+2];   // volatile to not optimize the array away
    waste[waste_qwords+1] = NULL;
    pid_t child_pid = clonefn(nop, child_stack, CLONE_VM | SIGCHLD, 0);
    if (child_pid > 0) {
        pid_t waitresult = waitpid(child_pid, NULL, 0);
        // before fork-bombing anything if this doesn't go to plan, exit
        if (waitresult != child_pid)
            abort();
        return 0;
    } else {
        return -1;
    }
}

int main(void)
{
    int wasteamount;

    // general setup
    child_stack = mmap(NULL, 16384, PROT_READ | PROT_WRITE, MAP_ANON | MAP_PRIVATE, -1, 0);
    child_stack = (void*)((char*)child_stack + 16000);
    clone(NULL, NULL, 0, 0);   // fails, but resolves "clone"

    // detecting stack layout
    call_clone(0, dummy_clone);
    // cast so the argument matches %llx even where uint64_t is unsigned long
    printf("effective FP in clone() with waste 0 = %llx\n", (unsigned long long)call_clone_sp);
    wasteamount = (call_clone_sp & 0x1FFF) / 8;
    printf("this is %d 64-bit words above the next page boundary\n", wasteamount);

    for (; wasteamount < 51200; wasteamount += 1024) {
        // fails for wasteamount-22 to wasteamount+22 (only even values tested)
        if (call_clone(wasteamount, clone) < 0) {
            perror("clone");
            printf("Problem detected at %d pages distance\n", wasteamount / 1024);
            return 1;
        }
    }
    puts("No problems found");
    return 0;
}
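The page-boundary arithmetic in main can be tried out separately from the SPARC-specific parts. The sketch below is not part of the reproducer: the function names are hypothetical, and it assumes 8 KiB pages (matching the 0x1FFF mask) and ignores the two extra VLA slots and any alignment padding the compiler adds.

```c
#include <stdint.h>

/* Assumption: 8 KiB pages, as implied by the 0x1FFF mask in the reproducer. */
#define SKETCH_PAGE_SIZE 0x2000u

/* Number of 64-bit words between addr and the start of the page containing
 * it; this mirrors (call_clone_sp & 0x1FFF) / 8 in the reproducer. */
unsigned words_above_page_start(uint64_t addr)
{
    return (unsigned)((addr & (SKETCH_PAGE_SIZE - 1)) / sizeof(uint64_t));
}

/* Address obtained by moving addr down by `words` 64-bit slots, which is
 * roughly what a VLA of that many pointers does to the stack pointer. */
uint64_t lower_by_words(uint64_t addr, unsigned words)
{
    return addr - (uint64_t)words * sizeof(uint64_t);
}
```

With a made-up frame pointer of 0x103a0, words_above_page_start gives 116, and lowering the address by 116 words lands on the page start 0x10000. Each further 1024 words (one 8 KiB page) moves to the next lower page boundary, which corresponds to the step used by the loop in main.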