Re: [PATCH v2 5/6] mips: use per-mm page to execute FP branch delay slots

Ed Swierk <eswierk@xxxxxxxxxxxxxxxxxx> · Thu, 3 Jul 2014 15:31:48 -0700

On Thu, Jul 3, 2014 at 1:12 PM, Paul Burton <paul.burton@xxxxxxxxxx> wrote:
> On Thu, Jul 03, 2014 at 10:56:10AM -0700, Ed Swierk wrote:
>> Now that Linux makes user stacks non-executable by default, the
>> current FP emulation approach is simply broken.
>
> Really? I wasn't aware of any change to the default attributes of the
> stack. Do you know what changed? From a quick look at fs/binfmt_elf.c &
> arch/mips/include/asm/elf.h I can't see anything relevant having
> changed - the stack should be executable unless a non-executable
> PT_GNU_STACK header is present in the ELF. I don't suppose the issue
> is simply that such a PT_GNU_STACK header is present in your binaries?

Actually that was a completely unsupported assertion on my part. I
have no reason to believe there was a change in behavior in the kernel
or the toolchain (gcc 4.9.0, x86_64 host, mips target; binutils
2.24.51.20140425).

What I do notice is that mips-linux-gnu-gcc generates no
.note.GNU-stack section, while x86_64-linux-gnu-gcc does. In turn, ld
produces no GNU_STACK program header on the mips executable, while for
x86_64 it produces GNU_STACK with RW (no E) flags.
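
For anyone who wants to check their own binaries without readelf, here
is a minimal sketch of the lookup. It assumes the whole ELF file has
been read into memory; gnu_stack_flags() and rd() are names I made up
for illustration, not anything from the kernel or binutils. It handles
both ELF32/ELF64 and either byte order, since the mips target here is
big-endian and the host is not.

```c
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read an n-byte unsigned integer in the file's own byte order. */
static uint64_t rd(const unsigned char *p, int n, int big_endian)
{
    uint64_t v = 0;
    for (int i = 0; i < n; i++)
        v = (v << 8) | p[big_endian ? i : n - 1 - i];
    return v;
}

/*
 * Hypothetical helper: look for a PT_GNU_STACK program header in an
 * in-memory ELF image.
 * Returns  1 and stores p_flags in *flags if the header is present,
 *          0 if the file has no PT_GNU_STACK header,
 *         -1 if the image is not ELF or is truncated.
 * Any EI_CLASS other than ELFCLASS64 is treated as 32-bit.
 */
int gnu_stack_flags(const unsigned char *img, size_t len, uint32_t *flags)
{
    if (len < EI_NIDENT || memcmp(img, ELFMAG, SELFMAG) != 0)
        return -1;

    int is64 = (img[EI_CLASS] == ELFCLASS64);
    int be = (img[EI_DATA] == ELFDATA2MSB);
    if (len < (is64 ? sizeof(Elf64_Ehdr) : sizeof(Elf32_Ehdr)))
        return -1;

    uint64_t phoff = is64
        ? rd(img + offsetof(Elf64_Ehdr, e_phoff), 8, be)
        : rd(img + offsetof(Elf32_Ehdr, e_phoff), 4, be);
    uint64_t phentsize = rd(img + (is64 ? offsetof(Elf64_Ehdr, e_phentsize)
                                        : offsetof(Elf32_Ehdr, e_phentsize)),
                            2, be);
    uint64_t phnum = rd(img + (is64 ? offsetof(Elf64_Ehdr, e_phnum)
                                    : offsetof(Elf32_Ehdr, e_phnum)),
                        2, be);

    size_t min_ph = is64 ? sizeof(Elf64_Phdr) : sizeof(Elf32_Phdr);
    size_t flags_off = is64 ? offsetof(Elf64_Phdr, p_flags)
                            : offsetof(Elf32_Phdr, p_flags);

    for (uint64_t i = 0; i < phnum; i++) {
        uint64_t off = phoff + i * phentsize;

        /* Reject program headers that run past the end of the image. */
        if (phentsize < min_ph || off > len || len - off < min_ph)
            return -1;

        /* p_type is the first 32-bit field in both ELF classes. */
        if (rd(img + off, 4, be) == PT_GNU_STACK) {
            *flags = (uint32_t)rd(img + off + flags_off, 4, be);
            return 1;
        }
    }
    return 0;
}
```

A return of 0 corresponds to what I see on the mips executable, and a
return of 1 with (flags & PF_X) == 0 corresponds to the x86_64 one.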

The toolchain behavior is the same for gccgo as for gcc. But I get a
SIGSEGV on the Octeon2 target only when running a gccgo-generated
executable. A C program compiled with gcc and performing the same FP
operations works fine.

And when I add the following hack to arch/mips/include/asm/elf.h in
the kernel, the segv goes away:

   #define elf_read_implies_exec(ex, have_pt_gnu_stack) 1

So I assume gccgo or libgo is doing some extra magic that makes the
stack non-executable, on mips at least.

>> I'm wondering if instead of trying to free the page
>> for the FP branch delay emuframe immediately, it would be simpler to
>> leave it around until the thread is destroyed.
>
> It's not really an issue of freeing a page - my patch mapped one page
> per-mm (per-process) and that page was left intact for the life of that
> mm (process).

Ah, I see. What if we allocate a page per thread rather than per
process? Then the bookkeeping becomes a lot simpler, as there can be
only a single emuframe in the page at one time. And we can defer
freeing the page until the thread exits.

Assuming we could tolerate the overhead of an entire page for a puny
little emuframe, do you think the approach would work?

--Ed