> > W^X is more than just stack protection.  It means that all pages
> > that are writeable are also marked as not executable.  At least, it
> > means this is how the system by default operates, until some process
> > asks for something that has both write and execute permission.
> >
> > On some architectures W^X is easy, since the native architecture has
> > an execute-permitted bit per page (sparc, sparc64, alpha, hppa, m88k).
> > On other architectures, it is difficult and various hacks have to be
> > done to make it work (i386, powerpc).
>
> It's not difficult at all on x86, but having non-overlapping Segments
> for Code and Data/Stack would limit the virtual address space.

I am not sure if you have heard of this neat technology called "shared
libraries".  Either you have never heard of them, or you are unaware of
how they work on an x86.

Let me be completely blunt.  What you are suggesting is infeasible.
Please go do some learning before making any more utterly ridiculous
proposals.

> This doesn't matter if your machine is equipped with 2 GB (RAM+Pagefile)
> or less, because all pages of those 2 GB can completely be mapped to
> linear addresses in either the code or data/stack segment.  As soon as
> there's more memory available, you have to decide how large the code
> and data/stack segment should be.

Ridiculous.

> Addressing more than 4 GB on x86 is an ugly hack anyways - PSE as well
> as PAE.

Yet more dribble which is unrelated to the issue at hand.

Anyways, on an i386 you can do W^X somewhat.  Not as perfectly as you
can on cpus that have a per-page X bit...  Let me try to summarize the
options.

1) Configure the i386 CS code segment limit register so that it cannot
   reach into the stack area at the end of memory.  Hence, you can have
   code below, and your stack above.  This only protects your stack.
   As many have pointed out, doing so is useless unless other protection
   technologies such as ProPolice are used to supplement it.

2) Furthermore, try to make the CS code segment limit register reach
   only to the end of the data segment.  But then a problem shows up.
   When you use shared libraries, you end up with code followed by data
   followed by code followed by data, etc.  Since you only have one line
   you can draw in the address space, clearly you can't make this work!

3) To resolve this, we made modifications to ld.so and to the base ELF
   binaries and shared library files that are produced.  The idea is to
   map all CODE from the program, ld.so, and from each of the shared
   libraries low in memory, and then to map their respective DATA
   segments HIGHER in memory.

   We must remember one thing.  Each ELF module is internally
   pre-linked.  This means that the code of a module uses relative
   addressing to access its data.  Or, put another way, the code and
   data must remain a FIXED distance from each other in memory; that
   distance is determined at link time.  You cannot change it at run
   time without significant performance problems and other difficulties.

4) So, now that all the code is down below, and all the data is above,
   we have something like this:

            stack
            gap gap gap
            libm data
            libc data
            ld.so data
            program data
            gap gap gap
   <----    libm code
            libc code
            ld.so code
   0:       program code

5) If we are clever, we can now change our kernel to put the CS limit
   register where the arrow is.  If objects with X permission are mapped
   into or unmapped from the address space, the CS limit register can
   move up or down.  No objects above that line can be executed (see the
   sketch below).

In OpenBSD, we've done steps up to 4.
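To make step 5 concrete, here is a toy userland sketch of the
bookkeeping involved.  This is not our kernel code, and every name and
address in it is invented; it just shows that the line is the end of
the highest mapping with X permission, and that it moves as executable
objects are mapped and unmapped.

/*
 * Toy model of the bookkeeping in step 5.  NOT OpenBSD kernel code;
 * the structure, function names and addresses are made up.
 */
#include <stdio.h>

struct xmap {
	unsigned long start;	/* start of an executable mapping */
	unsigned long end;	/* one byte past its end */
};

/* Where the CS limit would sit for the current set of mappings. */
static unsigned long
cs_limit(const struct xmap *maps, int n)
{
	unsigned long lim = 0;
	int i;

	for (i = 0; i < n; i++)
		if (maps[i].end > lim)
			lim = maps[i].end;
	return (lim);
}

int
main(void)
{
	/* program, ld.so, libc, libm text all mapped low; data is higher */
	struct xmap maps[] = {
		{ 0x00001000UL, 0x00040000UL },	/* program code */
		{ 0x00040000UL, 0x00060000UL },	/* ld.so code */
		{ 0x00060000UL, 0x00160000UL },	/* libc code */
		{ 0x00160000UL, 0x001a0000UL },	/* libm code */
	};
	int n = sizeof(maps) / sizeof(maps[0]);

	printf("CS limit would sit at 0x%08lx\n", cs_limit(maps, n));

	/*
	 * Mapping more text (say, via dlopen()) would raise the line;
	 * unmapping the highest text region would let it drop.  Nothing
	 * above the line is executable.
	 */
	return (0);
}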
We've not done step 5 perfectly yet (we use a "fixed" line).

Finally, another option.  As an alternative to all this complicated
stuff, it is my understanding that some 32-bit x86 cpus in PAE (64-bit
PTE) mode honour the highest bit of the PTE as an NX (non-executable)
bit.  This would give per-page execute control like we have on better
cpus.  We've not worked on this yet; it is less valuable since I think
only newer Xeons and high-end AMD cpus support it.  And we've never
found documentation for it either :)
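If that turns out to be true, then on such cpus W^X stops needing any
of the segment limit games above and becomes a single bit per page
table entry.  A minimal sketch, assuming the no-execute bit really is
the top bit (bit 63) of a 64-bit PAE PTE; the macro names are invented
and this is obviously not kernel code:

/*
 * Sketch only: per-page W^X if the top bit of a 64-bit PAE PTE is a
 * no-execute bit.  Macro names invented for illustration.
 */
#include <stdio.h>
#include <stdint.h>

#define PTE_P	UINT64_C(0x0000000000000001)	/* present */
#define PTE_W	UINT64_C(0x0000000000000002)	/* writeable */
#define PTE_NX	UINT64_C(0x8000000000000000)	/* rumoured no-execute, bit 63 */

/* Enforce W^X on one page: anything writeable loses execute permission. */
static uint64_t
wx_pte(uint64_t pte)
{
	if (pte & PTE_W)
		pte |= PTE_NX;
	return (pte);
}

int
main(void)
{
	uint64_t pte = UINT64_C(0x0000000001234000) | PTE_P | PTE_W;

	printf("before: 0x%016llx\n", (unsigned long long)pte);
	printf("after:  0x%016llx\n", (unsigned long long)wx_pte(pte));
	return (0);
}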