> > W^X is more than just stack protection.  It means that all pages
> > that are writeable are also marked as not executable.  At least, it
> > means this is how the system by default operates, until some process
> > asks for something that has both write and execute permission.
> >
> > On some architectures W^X is easy, since the native architecture has
> > an execute-permitted bit per page (sparc, sparc64, alpha, hppa, m88k).
> > On other architectures, it is difficult and various hacks have to be
> > done to make it work (i386, powerpc).
>
> It's not difficult at all on x86, but having non-overlapping Segments
> for Code and Data/Stack would limit the virtual address space.

I am not sure if you have heard of this neat technology called "shared
libraries".  Either you have never heard of them, or you are unaware of
how they work on an x86.

Let me be completely blunt.  What you are suggesting is infeasible.
Please go do some learning before making any more utterly ridiculous
proposals.

> This doesn't matter if your machine is equipped with 2 GB (RAM+Pagefile)
> or less, because all pages of those 2 GB can completely be mapped to
> linear addresses in either the code or data/stack segment.  As soon as
> there's more memory available, you have to decide how large the code
> and data/stack segment should be.

Ridiculous.

> Addressing more than 4 GB on x86 is an ugly hack anyways - PSE as well
> as PAE.

Yet more dribble which is unrelated to the issue at hand.

Anyways, on an i386 you can do W^X somewhat.  Not as perfectly as you
can on cpus that have a per-page X bit...  Let me try to summarize the
options.

1) Configure the i386 CS code segment limit register so that it cannot
   reach into the stack area at the end of memory.  Hence, you can have
   code below, and your stack above.  This only protects your stack.
   As many have pointed out, doing so is useless unless other protection
   technologies such as ProPolice are used to supplement it.

2) Furthermore, try to make the CS code segment limit register reach
   only to the end of the data segment.  But then a problem shows up.
   When you use shared libraries, you end up with code followed by data
   followed by code followed by data, etc.  Since you only have one line
   you can draw in the address space, clearly you can't make this work!

3) To resolve this, we made modifications to ld.so and to the base ELF
   binaries and shared library files that are produced.  The idea is to
   map all CODE from the program, ld.so, and from each of the shared
   libraries low in memory, and then to map their respective DATA
   segments HIGHER in memory.

   We must remember one thing.  Each ELF module is internally
   pre-linked.  This means that the code of a module uses relative
   addressing to access its data.  Or, put another way, the code and
   data must remain a FIXED distance from each other in memory; that
   distance is determined at link time.  You cannot change it at run
   time without significant performance problems and other difficulties.

4) So, now that all the code is down below, and all the data is above,
   we have something like this:

            stack
            gap gap gap
            libm data
            libc data
            ld.so data
            program data
            gap gap gap
   <----    libm code
            libc code
            ld.so code
   0:       program code

5) If we are clever, we can now change our kernel to put the CS limit
   register where the arrow is.  If objects with X permission are mapped
   into or unmapped from the address space, the CS limit register can
   move up or down.  No objects above that line can be executed (see the
   sketch below).

In OpenBSD, we've done steps up to 4.
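To make step 5 concrete, here is a toy userland sketch of the
bookkeeping involved.  This is not our kernel code, and every name and
address in it is invented; it just shows that the line is the end of
the highest mapping with X permission, and that it moves as executable
objects are mapped and unmapped.

/*
 * Toy model of the bookkeeping in step 5.  NOT OpenBSD kernel code;
 * the structure, function names and addresses are made up.
 */
#include <stdio.h>

struct xmap {
	unsigned long start;	/* start of an executable mapping */
	unsigned long end;	/* one byte past its end */
};

/* Where the CS limit would sit for the current set of mappings. */
static unsigned long
cs_limit(const struct xmap *maps, int n)
{
	unsigned long lim = 0;
	int i;

	for (i = 0; i < n; i++)
		if (maps[i].end > lim)
			lim = maps[i].end;
	return (lim);
}

int
main(void)
{
	/* program, ld.so, libc, libm text all mapped low; data is higher */
	struct xmap maps[] = {
		{ 0x00001000UL, 0x00040000UL },	/* program code */
		{ 0x00040000UL, 0x00060000UL },	/* ld.so code */
		{ 0x00060000UL, 0x00160000UL },	/* libc code */
		{ 0x00160000UL, 0x001a0000UL },	/* libm code */
	};
	int n = sizeof(maps) / sizeof(maps[0]);

	printf("CS limit would sit at 0x%08lx\n", cs_limit(maps, n));

	/*
	 * Mapping more text (say, via dlopen()) would raise the line;
	 * unmapping the highest text region would let it drop.  Nothing
	 * above the line is executable.
	 */
	return (0);
}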
We've not done step 5 perfectly yet (we use a "fixed" line).

Finally, another option.  As an alternative to all this complicated
stuff, it is my understanding that some 32-bit x86 cpus in PAE (64-bit
PTE) mode honour the highest bit of the PTE as an NX (non-executable)
bit.  This would give per-page execute control like we have on better
cpus.  We've not worked on this yet; it is less valuable since I think
only newer Xeons and high-end AMD cpus support it.  And we've never
found documentation for it either :)
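If that turns out to be true, then on such cpus W^X stops needing any
of the segment limit games above and becomes a single bit per page
table entry.  A minimal sketch, assuming the no-execute bit really is
the top bit (bit 63) of a 64-bit PAE PTE; the macro names are invented
and this is obviously not kernel code:

/*
 * Sketch only: per-page W^X if the top bit of a 64-bit PAE PTE is a
 * no-execute bit.  Macro names invented for illustration.
 */
#include <stdio.h>
#include <stdint.h>

#define PTE_P	UINT64_C(0x0000000000000001)	/* present */
#define PTE_W	UINT64_C(0x0000000000000002)	/* writeable */
#define PTE_NX	UINT64_C(0x8000000000000000)	/* rumoured no-execute, bit 63 */

/* Enforce W^X on one page: anything writeable loses execute permission. */
static uint64_t
wx_pte(uint64_t pte)
{
	if (pte & PTE_W)
		pte |= PTE_NX;
	return (pte);
}

int
main(void)
{
	uint64_t pte = UINT64_C(0x0000000001234000) | PTE_P | PTE_W;

	printf("before: 0x%016llx\n", (unsigned long long)pte);
	printf("after:  0x%016llx\n", (unsigned long long)wx_pte(pte));
	return (0);
}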