Re: [PATCH v4 1/1] exec: seal system mappings

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Andrei

Thanks for your email.
I was hoping to get some feedback from CRIU devs, and happy to see you
reaching out..

On Mon, Dec 9, 2024 at 8:12 PM Andrei Vagin <avagin@xxxxxxxxx> wrote:
>
> On Mon, Nov 25, 2024 at 12:49 PM <jeffxu@xxxxxxxxxxxx> wrote:
> >
> > From: Jeff Xu <jeffxu@xxxxxxxxxxxx>
> >
> > Seal vdso, vvar, sigpage, uprobes and vsyscall.
> >
> > Those mappings are readonly or executable only, sealing can protect
> > them from ever changing or unmapped during the life time of the process.
> > For complete descriptions of memory sealing, please see mseal.rst [1].
> >
> > System mappings such as vdso, vvar, and sigpage (for arm) are
> > generated by the kernel during program initialization, and are
> > sealed after creation.
> >
> > Unlike the aforementioned mappings, the uprobe mapping is not
> > established during program startup. However, its lifetime is the same
> > as the process's lifetime [2]. It is sealed from creation.
> >
> > The vdso, vvar, sigpage, and uprobe mappings all invoke the
> > _install_special_mapping() function. As no other mappings utilize this
> > function, it is logical to incorporate sealing logic within
> > _install_special_mapping(). This approach avoids the necessity of
> > modifying code across various architecture-specific implementations.
> >
> > The vsyscall mapping, which has its own initialization function, is
> > sealed in the XONLY case, it seems to be the most common and secure
> > case of using vsyscall.
> >
> > It is important to note that the CHECKPOINT_RESTORE feature (CRIU) may
> > alter the mapping of vdso, vvar, and sigpage during restore
> > operations. Consequently, this feature cannot be universally enabled
> > across all systems.
> >
> ...
> >
> > +config SEAL_SYSTEM_MAPPINGS
> > +       bool "seal system mappings"
> > +       default n
> > +       depends on 64BIT
> > +       depends on ARCH_HAS_SEAL_SYSTEM_MAPPINGS
> > +       depends on !CHECKPOINT_RESTORE
>
> Hi Jeff,
>
> I like the idea of this patchset, but I don’t like the idea of
> forcing users to choose between this security feature and
> checkpoint/restore functionality. We need to explore ways to make this
> feature work with checkpoint/restore. Relying on CAP_CHECKPOINT_RESTORE
> is the obvious approach.
>
I agree that forcing users to choose isn't ideal. I'd prefer a
solution where both approaches can be used in some way, depending on
the situation and distributions. Hopefully, with input from CRIU
developers, this can be achieved.

However, it makes sense to unconditionally seal vdso/vvar for systems
like ChromeOS and Android that don't currently use CRIU, so we will
need a KCONFIG for that.

> CRIU just needs to move these mappings, and it doesn't need to change
> their properties or modify their contents. With that in mind, here are
That is an important detail to know, thanks for bringing it up.

> two options:
> * Allow moving sealed mappings for processes with CAP_CHECKPOINT_RESTORE.
We could try to propose this under a new KCONFIG, e.g.
CONFIG_SEAL_SYSTEM_MAPPING_WITH_CAP_CHECK

IIUC, You propose allowing userspace mremap vdso if the process has
CAP_CHECKPOINT_RESTORE. However, I believe this approach raises
security concerns. During the RFC  for mseal, initially, I suggested
sealing mmap, mremap, and munmap individually, but Linus rejected this
proposal, for a good reason, mremap could leave an empty hole in the
address space, thus allowing the attacker to fill it with attacker
controlled content.

Furthermore, CAP_SYSTEM_ADMIN allows setting any capacity,  so this
would become a by-pass for sealing.

> * Allow temporarily "unsealing" mappings for processes with
>   CAP_CHECKPOINT_RESTORE. CRIU could unseal mappings, move them, and
>   then seal them back.
>
We could also try to propose this under CONFIG_SEAL_SYSTEM_MAPPING_FOR_CRIU.

It's important to note that temporarily unsealing a mapping from
userspace is not permitted. If a mapping has the capability to be
unsealed, it fundamentally does not provide the sealing property.

Perhaps the intention was for these steps to be carried out within the
kernel? e.g. the userspace could instruct the kernel to relocate the
vdso mapping. Since the kernel can ensure the vdso contents are not
manipulated by an attacker, this approach could offer a viable
solution.

I have been thinking of other alternatives, but those would require
more understanding on CRIU use cases.
One of my questions is: Would CRIU target an individual process? or
entire systems?

If it is an individual process, we could use prctl to opt-in/opt-out
certain processes. There could be two alternatives.
1> Opt-in solution: process must set prctl.seal_criu_mapping, this
needs to be set before execve() because sealing is applied at execve()
call.
2> opt-out solution: The system will by default seal all of the system
mappings, but individual processes can opt-out by setting
prctl.not_seal_criu_mappings. This also needs to be set before
execve() call.

For both cases, we will want to identify what type of mapping CRIU
cares about, i.e. maybe CRIU doesn't care about uprobe and vsyscall ?
and only care about vdso/vvar/sigpage ?

> Another approach might be to make this feature configurable on a
> per-process basis (e.g., via prctl). Once enabled for a process, it
> would be inherited by all its children. It can't be disabled unless a
> process has CAP_CHECKPOINT_RESTORE.
>
> I've added Mike, Dima, and Alex to the thread. They might have
> other ideas.
>
Thanks. Please feel free to chime in, I will also add Mike,Dima and
Alex to the new version of this series as well.

Thanks!
-Jeff



> Thanks,
> Andrei





[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux