On Thu, Jul 3, 2014 at 1:12 PM, Paul Burton <paul.burton@xxxxxxxxxx> wrote:
> On Thu, Jul 03, 2014 at 10:56:10AM -0700, Ed Swierk wrote:
>> Now that Linux makes user stacks
>> non-executable by default, the current FP emulation approach is simply
>> broken.
>
> Really? I wasn't aware of any change to the default attributes of the
> stack. Do you know what changed? From a quick look at fs/binfmt_elf.c &
> arch/mips/include/asm/elf.h I can't see anything relevant having
> changed - the stack should be executable unless a non-executable
> PT_GNU_STACK header is present in the ELF. I don't suppose the issue
> is simply that such a PT_GNU_STACK header is present in your binaries?

Actually, that was a completely unsupported assertion on my part. I
have no reason to believe there was a change in behavior in either the
kernel or the toolchain (gcc 4.9.0, x86_64 host, mips target; binutils
2.24.51.20140425).

What I do notice is that mips-linux-gnu-gcc generates no
.note.GNU-stack section, while x86_64-linux-gnu-gcc does. In turn, ld
produces no GNU_STACK program header for the mips executable, while for
x86_64 it produces GNU_STACK with RW (no E) flags.

The toolchain behavior is the same for gccgo as for gcc. But I get a
segv on the Octeon2 target only when running a gccgo-generated
executable. A C program compiled with gcc that performs the same FP
operations works fine. And when I add the following hack to
arch/mips/include/asm/elf.h in the kernel, the segv goes away:

  #define elf_read_implies_exec(ex, have_pt_gnu_stack) 1

So I assume gccgo or libgo is doing some extra magic that makes the
stack non-executable, at least on mips.

>> I'm wondering if instead of trying to free the page
>> for the FP branch delay emuframe immediately, it would be simpler to
>> leave it around until the thread is destroyed.
>
> It's not really an issue of freeing a page - my patch mapped one page
> per-mm (per-process) and that page was left intact for the life of that
> mm (process).

Ah, I see.
What if we allocated a page per thread rather than per process? The
bookkeeping would become a lot simpler, since there could be only a
single emuframe in the page at any one time, and we could defer freeing
the page until the thread exits. Assuming we can tolerate the overhead
of an entire page for a puny little emuframe, do you think the approach
would work?

--Ed
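A minimal userspace sketch of the per-thread bookkeeping proposed above, assuming one lazily allocated page per thread that holds at most one emuframe. Everything here is hypothetical: `struct thread_emu`, `emu_page_get()`, `emu_frame_write()` and `emu_page_release()` are illustrative names, not kernel APIs; mmap()/munmap() stand in for however the kernel would actually map the page into the user address space; and the two-word frame layout is an assumption for illustration only.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS under strict -std= modes */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical per-thread state: at most one emuframe is live at a time. */
struct thread_emu {
    void *page; /* allocated on first use, freed only at thread exit */
};

/* Return the thread's emuframe page, allocating it on first use. */
static void *emu_page_get(struct thread_emu *t)
{
    if (t->page == NULL) {
        long sz = sysconf(_SC_PAGESIZE);
        if (sz <= 0)
            return NULL;
        /* Mapped read/write only in this sketch; a real emuframe would
         * also have to be executable from user mode. */
        void *p = mmap(NULL, (size_t)sz, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED)
            return NULL;
        t->page = p;
    }
    return t->page;
}

/* Write a frame at offset 0. Since only one frame is live per thread,
 * there is no slot allocation: a new frame overwrites the previous one.
 * Layout (hypothetical): delay-slot instruction, then a trap word. */
static uint32_t *emu_frame_write(struct thread_emu *t, uint32_t insn,
                                 uint32_t trap)
{
    assert(t != NULL);
    uint32_t *frame = emu_page_get(t);
    if (frame == NULL)
        return NULL;
    frame[0] = insn; /* instruction copied from the branch delay slot */
    frame[1] = trap; /* instruction that returns control to the emulator */
    return frame;
}

/* Called once at thread exit: the only place the page is freed. */
static void emu_page_release(struct thread_emu *t)
{
    if (t->page != NULL) {
        munmap(t->page, (size_t)sysconf(_SC_PAGESIZE));
        t->page = NULL;
    }
}
```

The design point the sketch tries to show is that, with a single frame per thread, both the free-slot tracking and the "when is it safe to free" question collapse into one allocation on first use and one release at exit; the cost is a whole page per thread that ever hits FP branch delay emulation.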