From: Petr Tesarik <petr.tesarik1@xxxxxxxxxxxxxxxxxxx>

Add a SandBox Mode document under Documentation/security. Describe the
concept, usage and known limitations.

Signed-off-by: Petr Tesarik <petr.tesarik1@xxxxxxxxxxxxxxxxxxx>
---
 Documentation/security/index.rst        |   1 +
 Documentation/security/sandbox-mode.rst | 180 ++++++++++++++++++++++++
 2 files changed, 181 insertions(+)
 create mode 100644 Documentation/security/sandbox-mode.rst

diff --git a/Documentation/security/index.rst b/Documentation/security/index.rst
index 59f8fc106cb0..680a0b8bf28b 100644
--- a/Documentation/security/index.rst
+++ b/Documentation/security/index.rst
@@ -14,6 +14,7 @@ Security Documentation
    sak
    SCTP
    self-protection
+   sandbox-mode
    siphash
    tpm/index
    digsig
diff --git a/Documentation/security/sandbox-mode.rst b/Documentation/security/sandbox-mode.rst
new file mode 100644
index 000000000000..4405b8858c4a
--- /dev/null
+++ b/Documentation/security/sandbox-mode.rst
@@ -0,0 +1,180 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+============
+SandBox Mode
+============
+
+Introduction
+============
+
+The primary goal of SandBox Mode (SBM) is to reduce the impact of potential
+memory safety bugs in kernel code by decomposing the kernel. The SBM API
+allows running each component inside an isolated execution environment. In
+particular, memory areas used as input and/or output are isolated from the
+rest of the kernel and surrounded by guard pages. Without arch hooks, this
+common base provides *weak isolation*.
+
+On architectures which implement the necessary arch hooks, SandBox Mode
+leverages hardware paging facilities and CPU privilege levels to enforce the
+use of only these predefined memory areas. With arch support, SBM can also
+recover from protection violations. This means that SBM forcibly terminates
+the sandbox and returns an error code (e.g. ``-EFAULT``) to the caller, so
+execution can continue. Such an implementation provides *strong isolation*.
+
+A target function in a sandbox communicates with the rest of the kernel
+through a caller-defined interface, comprising read-only buffers (input),
+read-write buffers (output) and the return value. The caller can explicitly
+share other data with the sandbox, but doing so may reduce isolation strength.
+
+Protection of sensitive kernel data is currently out of scope. SandBox Mode is
+meant to run kernel code which would otherwise have full access to all system
+resources. SBM allows imposing a scoped access control policy on which
+resources are available to the sandbox. That said, protection of sensitive
+data is foreseen as a future goal, and that's why the API is designed to
+control not only memory writes but also memory reads.
+
+The expected use case for SandBox Mode is parsing data from untrusted sources,
+especially if the parsing cannot be reasonably done by a user mode helper.
+Keep in mind that a sandbox doesn't guarantee that the output data is correct.
+The result may be corrupt (e.g. as a result of an exploited bug) and where
+applicable, it should be sanitized before further use.
+
+Using SandBox Mode
+==================
+
+SandBox Mode is an optional feature, enabled with ``CONFIG_SANDBOX_MODE``.
+However, the SBM API is always defined regardless of the kernel configuration.
+It will call a function with the best available isolation, which is:
+
+* *strong isolation* if both ``CONFIG_SANDBOX_MODE`` and
+  ``CONFIG_ARCH_HAVE_SBM`` are set,
+* *weak isolation* if ``CONFIG_SANDBOX_MODE`` is set, but
+  ``CONFIG_ARCH_HAVE_SBM`` is unset,
+* *no isolation* if ``CONFIG_SANDBOX_MODE`` is unset.
+
+Code which cannot safely run with no isolation should depend on the relevant
+config option(s).
+
+The API can be used like this:
+
+.. code-block:: c
+
+    #include <linux/sbm.h>
+
+    /* Function to be executed in a sandbox. */
+    static SBM_DEFINE_FUNC(my_func, const struct my_input *, in,
+                           struct my_output *, out)
+    {
+            /* Read from in, write to out. */
+            return 0;
+    }
+
+    int caller(...)
+    {
+            struct sbm sbm;
+            int err;
+
+            /* Initialize the SBM instance and run my_func() in it. */
+            sbm_init(&sbm);
+            err = sbm_call(&sbm, my_func,
+                           SBM_COPY_IN(&sbm, input, in_size),
+                           SBM_COPY_OUT(&sbm, output, out_size));
+
+            /* Clean up. */
+            sbm_destroy(&sbm);
+            return err;
+    }
+
+The return type of a sandbox mode function is always ``int``. The return value
+is zero on success and negative on error. That's because the SBM helpers
+return an error code (such as ``-ENOMEM``) if the call cannot be performed.
+
+If sbm_call() returns an error, you can use sbm_error() to decide whether the
+error was returned by the target function or because sandbox mode was aborted
+(or failed to run entirely).
+
+Public API
+----------
+
+.. kernel-doc:: include/linux/sbm.h
+   :identifiers: sbm sbm_init sbm_destroy sbm_exec sbm_error
+                 SBM_COPY_IN SBM_COPY_OUT SBM_COPY_INOUT
+                 SBM_DEFINE_CALL SBM_DEFINE_THUNK SBM_DEFINE_FUNC
+                 sbm_call
+
+Arch Hooks
+----------
+
+These hooks must be implemented to select HAVE_ARCH_SBM.
+
+.. kernel-doc:: include/linux/sbm.h
+   :identifiers: arch_sbm_init arch_sbm_destroy arch_sbm_exec
+                 arch_sbm_map_readonly arch_sbm_map_writable
+
+Current Limitations
+===================
+
+This section lists known limitations of the current SBM implementation, which
+are planned to be removed in the future.
+
+Stack
+-----
+
+There is no generic kernel API to run a function on an alternate stack, so SBM
+runs on the normal kernel stack by default. The kernel already offers
+self-protection against stack overflows and underflows as well as against
+overwriting on-stack data outside the current frame, but violations are
+usually fatal.
+
+This limitation can be solved for specific targets. Arch hooks can set up a
+separate stack and recover from stack frame overruns.
+
+Inherent Limitations
+====================
+
+This section lists limitations which are inherent to the concept.
+
+Explicit Code
+-------------
+
+The main idea behind SandBox Mode is decomposition of one big program (the
+Linux kernel) into multiple smaller programs that can be sandboxed. AFAIK
+there is no way to automate this task for an existing code base in C.
+
+Given the performance impact of running code in a sandbox, this limitation may
+be perceived as a benefit. It is expected that sandbox mode is introduced only
+knowingly and only where safety is more important than performance.
+
+Complex Data
+------------
+
+Although data structures are not serialized and deserialized between kernel
+mode and sandbox mode, all directly and indirectly referenced data structures
+must be explicitly mapped into the sandbox, which requires some manual effort.
+
+Copying of input/output buffers also incurs some runtime overhead. This
+overhead can be reduced by sharing data directly with the sandbox, but the
+resulting isolation is weaker, so it may or may not be acceptable, depending
+on the overall safety requirements.
+
+Page Granularity
+----------------
+
+Since paging is used to enforce memory safety, page size is the smallest unit.
+Objects mapped into the sandbox must be aligned to a page boundary, and buffer
+overflows may not be detected if they fit into the same page.
+
+On the other hand, even though such writes are not detected, they do not
+corrupt kernel data, because only the output buffer is copied back to kernel
+mode, and the (corrupted) rest of the page is ignored.
+
+Transitions
+-----------
+
+Transitions between kernel mode and sandbox mode are synchronous. That is,
+whenever entering or leaving sandbox mode, the currently running CPU executes
+the instructions necessary to save/restore its kernel-mode state. The API is
+generic enough to allow asynchronous transitions, e.g. to pass data to another
+CPU which is already running in sandbox mode. However, to see the benefits, a
+hypothetical implementation would require far-reaching changes in the kernel
+scheduler. This is (currently) out of scope.
-- 
2.34.1
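
The "Page Granularity" limitation in the document above can be made concrete with a little arithmetic. The following is a minimal user-space sketch, not part of the SBM API: the helper names (``pages_needed``, ``undetected_slack``, ``page_granularity_example``) and the 4096-byte page size are assumptions made for illustration only. It computes how many pages a page-aligned buffer occupies and how many bytes past the end of the buffer still lie inside the last mapped page, which is the range in which an overflow would not fault.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed page size, for illustration only; kernel code would use PAGE_SIZE. */
#define EXAMPLE_PAGE_SIZE 4096u

/*
 * Number of whole pages needed to hold a buffer of `size` bytes that
 * starts on a page boundary (hypothetical helper, not an SBM function).
 */
static size_t pages_needed(size_t size)
{
	return (size + EXAMPLE_PAGE_SIZE - 1) / EXAMPLE_PAGE_SIZE;
}

/*
 * Bytes between the end of the buffer and the end of its last page.
 * A write that overruns the buffer by at most this many bytes stays
 * inside mapped memory, so paging alone cannot detect it.
 */
static size_t undetected_slack(size_t size)
{
	return pages_needed(size) * EXAMPLE_PAGE_SIZE - size;
}

/* Worked numbers for a 100-byte buffer under the 4096-byte assumption. */
static void page_granularity_example(void)
{
	assert(pages_needed(100) == 1);
	assert(undetected_slack(100) == 3996);
}
```

Under these assumptions, a 100-byte output buffer occupies one page, and an overrun of up to 3996 bytes lands in the unused rest of that page. As the document notes, such a write goes undetected but does not reach kernel data, because only the output buffer itself is copied back.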