Re: [PATCH v2 5/6] mips: use per-mm page to execute FP branch delay slots

Paul Burton <paul.burton@xxxxxxxxxx> · Fri, 4 Jul 2014 09:06:41 +0100

On Thu, Jul 03, 2014 at 03:31:48PM -0700, Ed Swierk wrote:
> On Thu, Jul 3, 2014 at 1:12 PM, Paul Burton <paul.burton@xxxxxxxxxx> wrote:
> > On Thu, Jul 03, 2014 at 10:56:10AM -0700, Ed Swierk wrote:
> >> Now that Linux makes user stacks
> >> non-executable by default, the current FP emulation approach is simply
> >> broken.
> >
> > Really? I wasn't aware of any change to the default attributes of the
> > stack. Do you know what changed? From a quick look at fs/binfmt_elf.c &
> > arch/mips/include/asm/elf.h I can't see anything relevant having
> > changed - the stack should be executable unless a non-executable
> > PT_GNU_STACK header is present in the ELF. I don't suppose the issue
> > is simply that such a PT_GNU_STACK header is present in your binaries?
> 
> Actually that was a completely unsupported assertion on my part. I
> have no reason to believe there was a change in behavior in the kernel
> or the toolchain (gcc 4.9.0, x86_64 host, mips target; binutils
> 2.24.51.20140425).
> 
> What I do notice is that mips-linux-gnu-gcc generates no
> .note.GNU-stack section, while x86_64-linux-gnu-gcc does. In turn, ld
> produces no GNU_STACK program header on the mips executable, while for
> x86_64 it produces GNU_STACK with RW (no E) flags.
> 
> The toolchain behavior is the same for gccgo as for gcc. But I get a
> segv on the Octeon2 target only when running a gccgo-generated
> executable. A C program compiled with gcc works fine performing the
> same FP operations.
> 
> And when I add the following hack to mips/include/asm/elf.h in the
> kernel, the segv goes away:
> 
>    #define elf_read_implies_exec(ex, have_pt_gnu_stack) 1
> 
> So I assume gccgo or libgo is doing some extra magic that makes the
> stack non-executable on mips at least.

Ah, interesting :)

I haven't tried running any Go executables before, but Go and Rust are
two languages I've been curious about for a while.
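
If you want to double-check what a given binary carries, `readelf -l`
lists a GNU_STACK program header when one is present. The same check can
be sketched in C as below. This is only an illustration and not part of
the patch: gnu_stack_flags() and rd() are names made up for the example,
it assumes glibc's <elf.h> is available, and it reads the image from a
buffer in the file's own byte order so that a big-endian mips binary can
be inspected on an x86_64 host.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <stdint.h>

/* Read an n-byte unsigned integer stored in the ELF file's byte order. */
static uint64_t rd(const unsigned char *p, int n, int big_endian)
{
	uint64_t v = 0;

	for (int i = 0; i < n; i++)
		v = (v << 8) | p[big_endian ? i : n - 1 - i];
	return v;
}

/*
 * Hypothetical helper: return p_flags of the PT_GNU_STACK program header,
 * or -1 if the image has no such header or is not a well-formed ELF image.
 */
long gnu_stack_flags(const unsigned char *img, size_t len)
{
	assert(img != NULL);

	if (len < EI_NIDENT || img[EI_MAG0] != ELFMAG0 ||
	    img[EI_MAG1] != ELFMAG1 || img[EI_MAG2] != ELFMAG2 ||
	    img[EI_MAG3] != ELFMAG3)
		return -1;

	int is64 = img[EI_CLASS] == ELFCLASS64;
	int be = img[EI_DATA] == ELFDATA2MSB;

	if (!is64 && img[EI_CLASS] != ELFCLASS32)
		return -1;
	if (len < (is64 ? sizeof(Elf64_Ehdr) : sizeof(Elf32_Ehdr)))
		return -1;

	/* Field offsets and widths differ between ELF32 and ELF64. */
	uint64_t phoff = is64 ?
		rd(img + offsetof(Elf64_Ehdr, e_phoff), 8, be) :
		rd(img + offsetof(Elf32_Ehdr, e_phoff), 4, be);
	uint64_t phentsize = rd(img + (is64 ?
		offsetof(Elf64_Ehdr, e_phentsize) :
		offsetof(Elf32_Ehdr, e_phentsize)), 2, be);
	uint64_t phnum = rd(img + (is64 ?
		offsetof(Elf64_Ehdr, e_phnum) :
		offsetof(Elf32_Ehdr, e_phnum)), 2, be);
	size_t phsize = is64 ? sizeof(Elf64_Phdr) : sizeof(Elf32_Phdr);
	size_t type_off = is64 ? offsetof(Elf64_Phdr, p_type) :
				 offsetof(Elf32_Phdr, p_type);
	size_t flags_off = is64 ? offsetof(Elf64_Phdr, p_flags) :
				  offsetof(Elf32_Phdr, p_flags);

	if (phoff > len || phentsize < phsize)
		return -1;

	for (uint64_t i = 0; i < phnum; i++) {
		uint64_t off = phoff + i * phentsize;

		/* Stop if the header table runs past the end of the image. */
		if (off > len || len - off < phsize)
			return -1;
		if (rd(img + off + type_off, 4, be) == PT_GNU_STACK)
			return (long)rd(img + off + flags_off, 4, be);
	}
	return -1;
}
```

A return of -1 corresponds to the case you describe for the mips
executable (no GNU_STACK header at all), whereas a result with PF_X
clear corresponds to the RW header you see on x86_64.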

> >> I'm wondering if instead of trying to free the page
> >> for the FP branch delay emuframe immediately, it would be simpler to
> >> leave it around until the thread is destroyed.
> >
> > It's not really an issue of freeing a page - my patch mapped one page
> > per-mm (per-process) and that page was left intact for the life of that
> > mm (process).
> 
> Ah, I see. What if we allocate a page per thread rather than per
> process? Then the bookkeeping becomes a lot simpler, as there can be
> only a single emuframe in the page at one time. And we can defer
> freeing the page until the thread exits.
> 
> Assuming we could tolerate the overhead of an entire page for a puny
> little emuframe, do you think the approach would work?

Yes, I think it would. The reason I went with the per-mm approach,
though, was to avoid that much overhead. We could allocate the page on
demand so that threads which don't use FP don't pay for it, and perhaps
use the shrinker interface to free the page if we run low on memory and
aren't currently executing from it. That would mean the FP branch delay
"emulation" could fail if memory is tight, but I suppose that's no worse
than now, where it could blow the (user) stack.
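
To make the on-demand idea concrete, here is a rough userspace model of
it. It is not kernel code and not what v3 will necessarily look like:
emupage_get() and emupage_put() are hypothetical names, a thread-local
pointer stands in for per-thread kernel state, and the page is mapped
read/write here whereas the real page would have to be executable.

```c
#define _DEFAULT_SOURCE		/* for MAP_ANONYMOUS under strict -std= */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical per-thread slot; NULL until the thread first needs it. */
static _Thread_local void *emupage;

/*
 * Map the page on first use. Returns NULL if the mapping fails, which is
 * the "emulation could fail if memory is tight" case.
 */
void *emupage_get(void)
{
	if (!emupage) {
		long sz = sysconf(_SC_PAGESIZE);
		void *p;

		assert(sz > 0);
		p = mmap(NULL, (size_t)sz, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
		if (p == MAP_FAILED)
			return NULL;
		emupage = p;
	}
	return emupage;
}

/*
 * Release the page: models thread exit, or reclaim under memory pressure
 * while the thread is not executing from the page.
 */
void emupage_put(void)
{
	if (emupage) {
		munmap(emupage, (size_t)sysconf(_SC_PAGESIZE));
		emupage = NULL;
	}
}
```

A thread that never calls emupage_get() never pays for a page, and a
page dropped by emupage_put() is simply mapped again on the next use.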

I'll try to get a v3 out at some point soon.

Paul