Re: [RFC PATCH v1 0/8] Introduce mseal() syscall

Jeff Xu <jeffxu@xxxxxxxxxx> · Tue, 17 Oct 2023 20:18:47 -0700

On Tue, Oct 17, 2023 at 4:57 PM Theo de Raadt <deraadt@xxxxxxxxxxx> wrote:
>
> Jeff Xu <jeffxu@xxxxxxxxxx> wrote:
>
> > May I ask, for BSD's implementation of mimmutable(), do you cover
> > things such as mlock() and madvise()?  Or just the protection
> > bits (WRX) + mremap() + munmap()?
>
> It only prevents removal of the mapping, placement of a replacement
> mapping, or changing the existing permissions.  If one page in the
> existing sub-region is marked immutable, the whole operation fails with
> EPERM.
>
> Those are the only user-visible aspects that an attacker cares about to
> utilize in this area.
>
> mlock() and madvise() deal with the physical memory handling underneath
> the VA.  They have nothing to do with how attack code might manipulate
> the VA address space inside a program to convert a series of dead-end
> approaches into a successful escalation strategy.
>
> [It would be a very long conversation to explain where and how this has
> been utilized to make an attack successful]
>
> > In other words:
> > Is BSD's definition of immutable equivalent to
> > MM_SEAL_MPROTECT|MM_SEAL_MUNMAP|MM_SEAL_MREMAP|MM_SEAL_MMAP in this patch set?
>
> I can't compare it to your subsystem, because I completely fail to
> understand the cause or benefit of all the complexity.
>
> I think I've explained what mimmutable() is in extremely simple terms.
>

Thanks for the explanation. Based on that, this is exactly what the
current patch set does.
In practice, libc could do the following:

#define MM_IMMUTABLE \
	(MM_SEAL_MPROTECT|MM_SEAL_MUNMAP|MM_SEAL_MREMAP|MM_SEAL_MMAP)
mseal(addr, len, MM_IMMUTABLE);

This would be equivalent to BSD's mimmutable().
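
A minimal, self-contained sketch of that libc-side composition follows.
The bit values, the mseal_fn callback type, the recording stub and the
mimmutable_compat() name are all assumptions made for illustration; the
real flag values and the syscall wrapper are whatever the patch set
defines.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bit values, for illustration only. */
#define MM_SEAL_MPROTECT (1UL << 0)
#define MM_SEAL_MUNMAP   (1UL << 1)
#define MM_SEAL_MREMAP   (1UL << 2)
#define MM_SEAL_MMAP     (1UL << 3)

/* The four seals that together approximate mimmutable(). */
#define MM_IMMUTABLE \
	(MM_SEAL_MPROTECT | MM_SEAL_MUNMAP | MM_SEAL_MREMAP | MM_SEAL_MMAP)

/* Signature assumed for the proposed mseal() entry point. */
typedef int (*mseal_fn)(void *addr, size_t len, unsigned long types);

/* Stand-in for the syscall: it only records which seals were requested. */
static unsigned long last_types;

static int mseal_stub(void *addr, size_t len, unsigned long types)
{
	(void)addr;
	(void)len;
	last_types = types;
	return 0;
}

/* Hypothetical libc helper: one call that requests all four seals. */
static int mimmutable_compat(mseal_fn seal, void *addr, size_t len)
{
	assert(seal != NULL && len > 0);
	return seal(addr, len, MM_IMMUTABLE);
}
```

Calling mimmutable_compat(mseal_stub, addr, len) leaves last_types equal
to the OR of the four seal bits, which is the whole point: the caller
sees one "immutable" operation and the kernel sees four independent seals.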

> And I don't understand why you are trying to do anything beyond what
> mimmutable() offers.  It seems like this is inventing additional
> solutions without proof that any of them are necessary to solve the
> specific problem that is known.
>
> > I hesitate to introduce the concept of immutable into linux because I don't know
> > all the scenarios present in linux where a VMA's metadata can be
> > modified.
>
> Good grief.  It seems obvious if you want to lock the change-behaviour
> of an object (the object in this case being a VA sub-region, there is a
> data structure for that, in OpenBSD it is called an "entry"), then you
> put a flag in that object's data structure and you simply check the flag
> every time a change-operation is attempted.  It is a flag which gets set,
> and checked.  Nothing ever clears it (except address space teardown).
>
> This flag must be put on the data structure that manages VA sub-ranges.
>
> In our case when a prot/mapping operation reaches low-level code that
> will want to change an "entry", we notice it is not allowed and simply
> percolate EPERM up through the layers.
>
> > There could be quite a few things we still need to deal with, to
> > completely block the possibility,
> > e.g. malicious code attempting to write to a RO memory
>
> What?!  writes to RO memory are blocked by the permission bits.
>
> > or change RW memory to RWX.
>
> In our case that is blocked by W^X policy.
>
> But if the region is marked mimmutable, then that's another reason you cannot
> change RW to RWX.  It seems so off-topic, to talk about writes to RO memory.
> I get a feeling you are a bit lost.
>
> immutable() is not about permissions, but about locking permissions.
> - You can't change the permissions of the address space region.
> - You cannot map a replacement object at the location instead (especially
>   with different permission).
> - You cannot unmap at that location (which you would do if you wanted to
>   map a new object, with a different permission).
>
> All 3 of these scenarios are identical.  No regular code performs these 3
> operations on regions of the address space which we mark immutable.
>
> There is nothing more to mimmutable in the VM layer.  The hard work is
> writing code in execve() and ld.so which will decide which objects can
> be marked immutable automatically, so that programs don't do this to
> themselves.
>
> I'm aware of where this simple piece fits in.  It does not solve all
> problems, it is a very narrow change to impact a problem which only
> high-value targets will ever face (like chrome).
>
> But I think you don't understand the purpose of this mechanism.
>
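
As I read it, the set-once flag and the EPERM check described in the
quoted text could look roughly like the sketch below. This is not
OpenBSD's code: struct entry, its fields and both function names are
hypothetical, and the sketch ignores entry splitting (it treats every
overlapping entry as a whole).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical record for one VA sub-range. */
struct entry {
	unsigned long start;	/* inclusive */
	unsigned long end;	/* exclusive */
	int prot;		/* current protection bits */
	bool immutable;		/* set once, never cleared */
};

static bool overlaps(const struct entry *e, unsigned long start,
		     unsigned long end)
{
	return e->start < end && start < e->end;
}

/* Set the flag on every entry that overlaps [start, end). */
static void mark_immutable(struct entry *map, size_t n,
			   unsigned long start, unsigned long end)
{
	assert(start < end);
	for (size_t i = 0; i < n; i++)
		if (overlaps(&map[i], start, end))
			map[i].immutable = true;
}

/*
 * Change protection on [start, end). If any overlapping entry is
 * immutable, the whole operation fails with EPERM and nothing changes.
 */
static int change_prot(struct entry *map, size_t n,
		       unsigned long start, unsigned long end, int prot)
{
	assert(start < end);

	/* Pass 1: check the flag before touching anything. */
	for (size_t i = 0; i < n; i++)
		if (overlaps(&map[i], start, end) && map[i].immutable)
			return EPERM;

	/* Pass 2: apply the change. */
	for (size_t i = 0; i < n; i++)
		if (overlaps(&map[i], start, end))
			map[i].prot = prot;
	return 0;
}
```

The same check would sit in front of unmap and replacement-mapping
paths; only the "apply" pass differs between them.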

In Linux's case, I think mseal() will eventually have a bigger scope than
BSD's mimmutable().  A VMA's metadata (vm_area_struct) contains a lot
of control info, and depending on the application's needs, mseal() can be
expanded to seal individual pieces of that control info.

For example, in the madvise(2) case, as Jann points out in [1],
and I quote:
"you'd probably also want to block destructive madvise() operations
that can effectively alter region contents by discarding pages and
such, ..."

Another example: if an application wants to keep a memory region always
present in RAM, for whatever reason, it can seal the mlock() state.

To handle those two new cases, mseal() could add two more bits:
MM_SEAL_MADVICE and MM_SEAL_MLOCK.

It is practical to keep the syscall extensible when the business logic is the same.

I think the madvise() and mlock() examples explain the logic of using
bitmasks in the mseal() interface.
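
To make the bitmask argument concrete, here is a rough sketch of a
per-operation check. The enum, the SEAL_BIT() macro and check_op() are
hypothetical names chosen for this example, not part of the patch set;
the only assumption carried over is "one bit per sealable operation".

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical list of sealable operations, one bit each. */
enum vm_op {
	OP_MPROTECT,
	OP_MUNMAP,
	OP_MREMAP,
	OP_MMAP,
	OP_MADVISE,	/* the two possible additions discussed above */
	OP_MLOCK,
	OP_COUNT
};

#define SEAL_BIT(op) (1UL << (op))

/* The four seals that correspond to mimmutable(). */
#define SEALS_IMMUTABLE \
	(SEAL_BIT(OP_MPROTECT) | SEAL_BIT(OP_MUNMAP) | \
	 SEAL_BIT(OP_MREMAP) | SEAL_BIT(OP_MMAP))

/* Return EPERM if `op` is sealed in `seals`, 0 otherwise. */
static int check_op(unsigned long seals, enum vm_op op)
{
	assert(op < OP_COUNT);
	return (seals & SEAL_BIT(op)) ? EPERM : 0;
}
```

With only SEALS_IMMUTABLE set, check_op() rejects mprotect-style changes
but still lets madvise and mlock through; a caller that also wants
madvise blocked ORs in SEAL_BIT(OP_MADVISE), and the check itself does
not change.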

-Jeff

[1] https://lore.kernel.org/lkml/CAG48ez3ShUYey+ZAFsU2i1RpQn0a5eOs2hzQ426FkcgnfUGLvA@xxxxxxxxxxxxxx/

> > If, as part of immutable, I also block madvise() and mlock(), which also
> > update a VMA's metadata, then by definition I could.  What if the user
> > wants the features of madvise() and, at the same time, also wants their
> > .text protected?
>
> I have no idea what you are talking about.  None of those things relate
> to the access permission of the memory the user sees, and therefore none
> of them are in the attack surface profile which is being prevented.
>
> Meaning, we allow madvise() and mlock() and mphysicalquantummemory() because
> those relate to the physical storage and not the VA permission model.
>
> > Also, if linux introduces a new syscall that depends on a new metadata of VMA,
> > say msecret(), (for discussion purpose), should immutable
> > automatically support that ?
>
> How about the future makingexcuses() system call?
>
> I don't think you understand the problem space well enough to come up with
> your own solution for it.  I spent a year on this, and ship a complete system
> using it.  You are asking such simplistic questions above it shocks me.
>
> Maybe read the LWN article;
>
>     https://lwn.net/Articles/915640/
>
> > Without those questions answered, I couldn't choose the route of
> > immutable() yet.
>
> "... so I can clearly not choose the wine in front of you."
>
> If you don't understand what this thing is for, and cannot minimize the
> complexity of this thing, then Linux doesn't need it at all.
>
> I should warn everyone the hard work is not in the VM layer, but in
> ld.so -- deciding which parts of the image to make immutable, and when.
> It is also possible to make some segments immutable directly in execve()
> -- but in both cases you better have a really good grasp on RELRO
> executable layout or you will make too many pieces immutable...
>
> I am pretty sure Linux will never get as far as we got. Even our main
> stacks are marked immutable, but in Linux that would conflict with glibc
> ld.so mprotecting RWX the stack if you dlopen() a shared library with
> GNUSTACK, a very bad idea which needs a different fight...