Re: [PATCH v8 0/4] Introduce mseal

"Theo de Raadt" <deraadt@xxxxxxxxxxx> · Wed, 31 Jan 2024 18:46:37 -0700

Jeff Xu <jeffxu@xxxxxxxxxxxx> wrote:

> I considered Theo's inputs from OpenBSD's perspective regarding the
> difference, and I wasn't convinced that Linux should remove these. In
> my view, those are two different kernels code, and the difference in
> Linux is not added without reasons (for MAP_SEALABLE, there is a note
> in the documentation section with details).

That note is describing a fiction.

> I would love to hear more from Linux developers on this.

I'm not sure you are capable of listening.

But I'll repeat for others to stop this train wreck:

1. When execve() maps a program's .data section, does the kernel set
   MAP_SEALABLE on that region?  Or does it not set MAP_SEALABLE?

   Does the kernel seal the .data section?  It cannot, because of RELRO
   and IFUNCS.  Do you know what those are?  (like in OpenBSD) the kernel
   cannot and will *not* seal the .data section, it lets later code do that.

2. When execve() maps a program's .bss section, does the kernel set
   MAP_SEALABLE on that region?  Or does it not set MAP_SEALABLE?

   Does the kernel seal the .bss section?  It cannot, because of RELRO
   and IFUNCS.  Do you know what those are?  (like in OpenBSD) the kernel
   cannot and will *not* seal the .bss section, it lets later code do that.

In the proposed diff, the kernel does not set MAP_SEALABLE on those
regions.

How does a userland program seal the .data and .bss regions?

It cannot.  It is too late to set the MAP_SEALABLE, because the kernel
already decided not to do it.

So those regions cannot be sealed.

3. When execve() maps a program's stack, does the kernel set
   MAP_SEALABLE on that region?  Or does it not set MAP_SEALABLE?

In the proposed diff, the kernel does not set MAP_SEALABLE.

You think you can seal the stack in the kernel??  Sorry to be the bearer
of bad news, but glibc has code which on occasion will mprotect the
stack executable.

But if userland decides that mprotect case won't occur -- how does a
userland program seal its stack?  It is now too late to set MAP_SEALABLE.

So the stack must remain unsealed.

4. What about the text segment?

5. Do you know what a text-relocation is?  They are now rare, but there
   are still compile/linker stages which will produce them, and there is
   software which requires that to work.  It means userland fixes its
   own .text, then calls mprotect.  The kernel does not know if this will
   happen.

6. When execve() maps the .text segment, will it set MAP_SEALABLE?

If it doesn't set it, userland cannot seal its text after it makes the
decision to do so.

You can continue to extrapolate those same points for all other segments
of a static binary, all segments of a dynamic binary, all segments of the
shared library linker.

And then you can go further, and recognize the logic that will be needed
in the shared library linker to *make the same decisions*.

In each case, the *decision* to make a mapping happens in one piece of
code, and the decision to use and NOW SEAL THAT MAPPING, happens in a
different piece of code.

The only answer to these problems will be to always set MAP_SEALABLE.
To go through the entire Linux ecosystem, and change every call to mmap()
to use this new MAP_SEALABLE flag, and it will look something like this:

+#ifndef MAP_SEALABLE
+#define MAP_SEALABLE 0
+#endif
-	ptr = mmap(...., MAP...
+	ptr = mmap(...., MAP_SEALABLE | MAP...

Every single one of them, and you'll need to do it in the kernel.

If you had spent a second trying to make this work in a second piece of
software, you would have realized that the ONLY way this could work
is by adding a flag with the opposite meaning:

   MAP_NOTSEALABLE

But nothing will use that.  I promise you.

> I would love to hear more from Linux developers on this.

I'm not sure you are capable of listening.