Re: 2GB userspace limitation in ABI N32

David Daney <ddaney.cavm@xxxxxxxxx> · Wed, 10 Oct 2012 09:56:56 -0700

On 10/10/2012 05:57 AM, Rich Felker wrote:
On Wed, Oct 10, 2012 at 10:07:56AM +0200, Ralf Baechle wrote:
On Wed, Oct 10, 2012 at 08:32:47AM +0200, Ronny Meeus wrote:

I have a legacy application that we want to port to a MIPS (Cavium)
architecture from a PPC based one.
The board has 4GB memory of which we actually need almost 3GB in
application space. On the PPC this is no issue since the split
user/kernel is 3GB/1GB.
We have to use the N32 ABI. Initial tests on MIPS showed me the
user-space limit of 2GB.
We do not want to port the application to a 64bit

Now the question is: are there any existing workarounds or tricks to get
around this limitation?
I found some mailthreads on this subject (n32-big ABI -
http://gcc.gnu.org/ml/gcc/2011-02/msg00278.html,
http://elinux.org/images/1/1f/New-tricks-mips-linux.pdf) but it looks
like this is not accepted by the community. Is there any progress
planned or made in this area?

I think limited time and gain killed the proposed ABI rather than
theoretical issues raised.

Ralf, I and others have put some thought into doing this in the past. 
This is a rough plan for how it would be done:

1) Define a special ELF section/program header similar to GNU_STACK that 
would be used to mark binaries that could use the 4GB n32 extension. 
Modify GNU gas and ld to mark the binaries and properly propagate the 
markers.

2) Add an n32-4GB option to GCC.  In this mode pointers would be zero 
extended when loaded into registers.  I have a prototype of this 
implemented, but it is currently broken.

3) Modify the Linux kernel.
3a) Add a thread_info flag to mark threads that use 4GB of address 
space.  TASK_SIZE would then depend on this as well as the other TIF_* 
flags that it currently uses.

3b) Fix up the ELF loader to set the 4GB flag based on the program 
header from #1.

3c) Audit n32 system call entry points for places where pointers are 
sign extended.  Change them to zero extend.  There are not many of these.

4) Rebuild all system libraries to support n32-4GB.
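
To make steps 3a and 3b a bit more concrete, here is a small user-space 
model of the idea.  It is only a sketch: the flag names, the limit values 
and both function names are made up for illustration and are not real 
kernel symbols, and a real kernel would keep some space reserved below 
these limits.  It also assumes the 4GB marker is simply ignored on 
binaries that are not n32.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-thread flags. The names are illustrative only and are
 * not real kernel symbols. */
enum {
    FLAG_32BIT_ADDR = 1u << 0, /* thread runs a 32-bit-pointer ABI (n32) */
    FLAG_N32_4GB    = 1u << 1  /* hypothetical: binary carried the 4GB marker */
};

/* Illustrative limits, not the values a real kernel uses. */
#define LIMIT_2GB UINT64_C(0x80000000)
#define LIMIT_4GB UINT64_C(0x100000000)
#define LIMIT_N64 (UINT64_C(1) << 40)

/* Step 3b: the ELF loader derives the flags from what it found in the
 * program headers. Assumption: the marker only matters for n32. */
unsigned flags_from_elf(bool is_n32, bool has_4gb_marker)
{
    unsigned flags = 0;

    if (is_n32) {
        flags |= FLAG_32BIT_ADDR;
        if (has_4gb_marker)
            flags |= FLAG_N32_4GB;
    }
    return flags;
}

/* Step 3a: the user address-space limit depends on the thread flags. */
uint64_t task_size_for(unsigned flags)
{
    if (!(flags & FLAG_32BIT_ADDR))
        return LIMIT_N64;
    return (flags & FLAG_N32_4GB) ? LIMIT_4GB : LIMIT_2GB;
}
```

An unmarked n32 binary keeps the 2GB limit in this model, so existing 
binaries would see no change in behavior.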

The only disadvantage of doing this is that the code will be slightly 
larger/slower as it takes three instructions to load a zero extended 
32-bit pointer versus two for n32-2GB.
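
The difference between the two extension modes can be modeled in plain C. 
This is a sketch of the arithmetic only, not of the instruction sequences 
the compiler would emit, and the function names are made up.  The cast 
from uint32_t to int32_t is implementation-defined in older C standards; 
the code assumes the usual two's complement behavior.

```c
#include <stdint.h>

/* n32-2GB style: a 32-bit pointer is sign extended into the 64-bit
 * register. Assumes two's complement conversion to int32_t. */
uint64_t sign_extend32(uint32_t p)
{
    return (uint64_t)(int64_t)(int32_t)p;
}

/* n32-4GB style: the upper 32 bits of the register are cleared. */
uint64_t zero_extend32(uint32_t p)
{
    return (uint64_t)p;
}
```

Below 2GB both functions give the same result.  For 0x80000000, 
sign_extend32() yields 0xffffffff80000000 while zero_extend32() yields 
0x0000000080000000, which is why only the zero extended form can 
address user memory between 2GB and 4GB.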

 Other architectures such as i386 - well,
IIRC any 32-bit ABI with more than 2GB userspace and a signed
ptrdiff_t - are suffering from them as well.

There's no issue with ptrdiff_t being signed 32-bit as long as the
implementation does not allow individual objects larger than 2GB.
Taking differences between pointers into different objects is UB.

No problem here.  We can just keep loading the VDSO at the 2GB point in 
the address space.  That will break things up so that all possible 
objects are smaller than 2GB.
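
A rough model of both points, as a sketch only.  The function names and 
the parameters (a reserved range at the bottom of the address space and 
the size of the VDSO mapping) are assumptions for illustration.  The 
conversion to int32_t again assumes two's complement behavior.

```c
#include <stdint.h>

#define SPACE_4GB UINT64_C(0x100000000)

/* Model of a signed 32-bit ptrdiff_t: subtract modulo 2^32, then
 * reinterpret the result as signed. */
int32_t ptr_diff32(uint32_t a, uint32_t b)
{
    return (int32_t)(uint32_t)(a - b);
}

/* Largest contiguous free region in a 4GB space when the range
 * [0, low_reserved) is unusable and a fixed mapping occupies
 * [hole_start, hole_start + hole_len). */
uint64_t largest_region(uint64_t low_reserved, uint64_t hole_start,
                        uint64_t hole_len)
{
    uint64_t below = hole_start - low_reserved;
    uint64_t above = SPACE_4GB - (hole_start + hole_len);

    return below > above ? below : above;
}
```

Within one object above 2GB, ptr_diff32(0xc0100000, 0xc0000000) gives 
0x100000 as expected.  Across unrelated objects, 
ptr_diff32(0xf0000000, 0x10000000) comes out negative, but that 
subtraction is undefined in C anyway.  With a 4KB reserved range at the 
bottom and a 4KB mapping at 2GB, largest_region() returns 0x7ffff000, 
so no single object can reach 2GB.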