Jeff Xu <jeffxu@xxxxxxxxxxxx> wrote:

> I considered Theo's inputs from OpenBSD's perspective regarding the
> difference, and I wasn't convinced that Linux should remove these. In
> my view, those are two different kernels code, and the difference in
> Linux is not added without reasons (for MAP_SEALABLE, there is a note
> in the documentation section with details).

That note is describing a fiction.

> I would love to hear more from Linux developers on this.

I'm not sure you are capable of listening.

But I'll repeat for others to stop this train wreck:

1. When execve() maps a program's .data section, does the kernel set
   MAP_SEALABLE on that region?  Or does it not set MAP_SEALABLE?

   Does the kernel seal the .data section?  It cannot, because of
   RELRO and IFUNCS.  Do you know what those are?  As in OpenBSD, the
   kernel cannot and will *not* seal the .data section; it lets later
   code do that.

2. When execve() maps a program's .bss section, does the kernel set
   MAP_SEALABLE on that region?  Or does it not set MAP_SEALABLE?

   Does the kernel seal the .bss section?  It cannot, because of
   RELRO and IFUNCS.  Do you know what those are?  As in OpenBSD, the
   kernel cannot and will *not* seal the .bss section; it lets later
   code do that.

   In the proposed diff, the kernel does not set MAP_SEALABLE on those
   regions.  How does a userland program seal the .data and .bss regions?

   It cannot.  It is too late to set MAP_SEALABLE, because the kernel
   already decided not to do it.  So those regions cannot be sealed.

3. When execve() maps a program's stack, does the kernel set
   MAP_SEALABLE on that region?  Or does it not set MAP_SEALABLE?

   In the proposed diff, the kernel does not set MAP_SEALABLE.

   You think you can seal the stack in the kernel??  Sorry to be the
   bearer of bad news, but glibc has code which on occasion will
   mprotect the stack executable.

   But if userland decides that mprotect case won't occur -- how does a
   userland program seal its stack?  It is now too late to set MAP_SEALABLE.
   So the stack must remain unsealed.

4. What about the text segment?

5. Do you know what a text-relocation is?  They are now rare, but there
   are still compile/linker stages which will produce them, and there is
   software which requires that to work.  It means userland fixes its
   own .text, then calls mprotect.  The kernel does not know if this
   will happen.

6. When execve() maps the .text segment, will it set MAP_SEALABLE?  If it
   doesn't set it, userland cannot seal its text after it makes the
   decision to do so.

You can continue to extrapolate those same points for all other segments
of a static binary, all segments of a dynamic binary, all segments of the
shared library linker.

And then you can go further, and recognize the logic that will be needed
in the shared library linker to *make the same decisions*.

In each case, the *decision* to make a mapping happens in one piece of
code, and the decision to use and NOW SEAL THAT MAPPING, happens in a
different piece of code.

The only answer to these problems will be to always set MAP_SEALABLE.
To go through the entire Linux ecosystem, and change every call to mmap()
to use this new MAP_SEALABLE flag, and it will look something like this:

    +#ifndef MAP_SEALABLE
    +#define MAP_SEALABLE 0
    +#endif
    -       ptr = mmap(...., MAP...
    +       ptr = mmap(...., MAP_SEALABLE | MAP...

Every single one of them, and you'll need to do it in the kernel.

If you had spent a second trying to make this work in a second piece of
software, you would have realized that the ONLY way this could work
is by adding a flag with the opposite meaning:

    MAP_NOTSEALABLE

But nothing will use that.  I promise you.

> I would love to hear more from Linux developers on this.

I'm not sure you are capable of listening.
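
For readers following the argument, the two-stage problem described in
the numbered points can be sketched as a toy model in C.  This is not
kernel code and makes no real mmap() or mseal() calls: struct region,
map_region(), seal_region() and count_unsealable() are hypothetical
names invented for illustration, and the -EPERM return value is an
assumption of the sketch rather than something taken from the proposed
diff.  It only shows that when "sealable" is fixed by the code creating
a mapping, a different piece of code that later decides to seal has no
way to opt in.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one mapping; not a kernel data structure. */
struct region {
    const char *name;
    bool sealable;  /* decided once, by whoever creates the mapping */
    bool sealed;
};

/* Stage 1: the code that creates the mapping (execve(), ld.so, ...). */
static struct region map_region(const char *name, bool sealable)
{
    struct region r = { name, sealable, false };
    return r;
}

/* Stage 2: different code, running later, decides the mapping is final. */
static int seal_region(struct region *r)
{
    assert(r != NULL);
    if (!r->sealable)
        return -EPERM;  /* assumed error value; too late to opt in now */
    r->sealed = true;
    return 0;
}

/* Try to seal all n regions; return how many stage 2 could not seal. */
static size_t count_unsealable(struct region *regs, size_t n)
{
    size_t stuck = 0;
    for (size_t i = 0; i < n; i++)
        if (seal_region(&regs[i]) != 0)
            stuck++;
    return stuck;
}
```

In this model, if stage 1 creates .data, .bss and the stack with
sealable == false, count_unsealable() reports all three as stuck; only
when stage 1 always passes true can stage 2 seal whatever it chooses,
which is the "always set MAP_SEALABLE" outcome argued above.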