Re: Endless loop on execution attempt on non-executable page

David Daney <ddaney.cavm@xxxxxxxxx> · Thu, 12 May 2016 08:57:12 -0700

On 05/12/2016 07:23 AM, Ralf Baechle wrote:
On Thu, May 12, 2016 at 03:07:51PM +0200, Florian Weimer wrote:

On 05/12/2016 02:53 PM, Ralf Baechle wrote:
On Thu, May 12, 2016 at 12:46:37PM +0200, Florian Weimer wrote:

The GCC compile farm has a big-endian 64-bit MIPS box.  The kernel is:

Linux erpro8-fsf1 3.14.10-er8mod-00013-ge0fe977 #1 SMP PREEMPT Wed Jan
14 12:33:22 PST 2015 mips64 GNU/Linux

Which is a vendor kernel for the EdgeRouter Pro-8.

/proc/cpuinfo reports:

system type             : UBNT_E200 (CN6120p1.1-1000-NSP)
machine                 : Unknown
processor               : 0
cpu model               : Cavium Octeon II V0.1

While testing W^X (execmod, DEP, NX) stack enforcement, I noticed that once
I try to execute code off a non-executable page, I do not get a signal, but
the code appears to enter an infinite loop.  The generated function starts
with a jump instruction to return to the caller, but instead, the program
counter does not seem to change at all.
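
[Illustration, not part of the linked test suite.] The expected behaviour described above can be sketched roughly as follows: a child process calls into a page mapped without PROT_EXEC, and the parent reports which signal ended it. The helper name signal_from_nx_call and the alarm() timeout are assumptions made for this sketch; the timeout only keeps the check from hanging on a system that loops the way the report describes.

```c
#define _DEFAULT_SOURCE
#include <signal.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper (not taken from execmod-tests).
 * Forks a child that jumps into a readable/writable but non-executable
 * page. Returns the number of the signal that terminated the child,
 * 0 if the call unexpectedly returned, or -1 on setup failure.
 */
int signal_from_nx_call(unsigned timeout_seconds)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        long page = sysconf(_SC_PAGESIZE);
        void *p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED)
            _exit(2);

        /* If the fault is never delivered, SIGALRM ends the child. */
        alarm(timeout_seconds);

        /* Converting a data pointer to a function pointer is not strict
         * ISO C, but it is the usual idiom on POSIX systems. */
        void (*fn)(void) = (void (*)(void))p;
        fn();
        _exit(0);
    }

    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    if (WIFSIGNALED(status))
        return WTERMSIG(status);
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

On a system that enforces non-executable pages one would expect SIGSEGV (or SIGBUS on some platforms) here. A result of SIGALRM would correspond to the endless loop reported in this thread.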

“si” in GDB also hangs (but can be interrupted with ^C).

My test code is here:

   https://pagure.io/execmod-tests

Is this a kernel bug or an issue with the silicon?

I see the test case uses mprotect to add PROT_EXEC after writing the code
to memory.  However, I don't think mprotect gives any guarantee that this
will make the I-cache coherent with the D-cache, that is, that the CPU will
actually fetch and execute the instructions that were just written to memory.
For that you have to do something architecture specific such as dancing
around a fire waving a dead chicken.  Or on MIPS call cacheflush(), see
the man page for details.

There is a fork between the write and the execute.  It is somewhat unlikely
that that's not a barrier, but it did happen on POWER.

However, I can successfully execute code without the barrier, so this whole
thing goes in the wrong direction. :)

I have added it, just to be on the safe side.

For the sake of portability to some broken processors, you should also
ensure that a 32-byte cache line is entirely filled with valid instructions
by padding the two test instructions with another six no-ops (opcode 0).
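
[Illustration.] The padding rule can be sketched as below: eight 32-bit MIPS instruction words make up one 32-byte line, and every slot after the real instructions is set to the all-zero no-op. The helper name fill_cache_line is an assumption, as is the use of jr $ra (encoding 0x03e00008) as the return jump; the thread only says the test emits two instructions.

```c
#include <stddef.h>
#include <stdint.h>

enum {
    CACHE_LINE_BYTES = 32,
    CACHE_LINE_WORDS = CACHE_LINE_BYTES / 4   /* 8 MIPS instructions */
};

#define MIPS_JR_RA 0x03e00008u  /* jr $ra: return to caller */
#define MIPS_NOP   0x00000000u  /* opcode 0: no-op */

/*
 * Hypothetical helper: copy `count` instruction words into `line`
 * and pad the rest of the 32-byte line with no-ops.
 * Returns 0 on success, -1 if the instructions do not fit.
 */
int fill_cache_line(uint32_t line[CACHE_LINE_WORDS],
                    const uint32_t *insns, size_t count)
{
    if (count > CACHE_LINE_WORDS)
        return -1;

    size_t i = 0;
    for (; i < count; i++)
        line[i] = insns[i];
    for (; i < CACHE_LINE_WORDS; i++)
        line[i] = MIPS_NOP;
    return 0;
}
```

For the padding to cover exactly one cache line, the destination buffer is assumed to be aligned to 32 bytes, which a freshly mapped page is.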

Added as well.

The test case as it is guarantees this implicitly by using a freshly
allocated page but I thought I should mention it.

There are some tests that don't (the stack variable might be clobbered, for
example).

Anyway, neither change fixed things for me.  Given the peculiar “si”
behavior in GDB, that's not entirely unexpected ...

Thanks for fixing and testing these obvious things.  Now let's look one
or two levels deeper ...

This is something that would be easy to diagnose on the OCTEON simulator...

Before spending time doing that, has anyone tried this on a current
kernel rather than the 3.14 indicated above?

It might also be interesting to know whether it still happens when booting
with only a single CPU, rather than with all available CPUs, which I assume
is the default on this platform.

David Daney