Re: Kernel complexity

Jann Horn <jannh@xxxxxxxxxx> · Sat, 12 Dec 2020 23:34:12 +0100

On Sat, Dec 12, 2020 at 9:14 PM <stefan.bavendiek@xxxxxxxxxxx> wrote:
> Personally I am interested in Linux Kernel Security and especially features supporting attack surface reduction. In the past I did some work on sandboxing features like seccomp support in user space applications. I have been rather hesitant to get involved here, since I am not a full time developer and certainly not an expert in C programming.

(By the way, one interesting area where upstream development is
currently happening that's related to userspace sandboxing is the
Landlock patchset by Mickaël Salaün, which adds an API that allows
unprivileged processes to restrict their filesystem access without
having to mess around with stuff like mount namespaces and broker
processes; the latest version is at
<https://lore.kernel.org/kernel-hardening/20201209192839.1396820-1-mic@xxxxxxxxxxx/>.
That might be relevant to your interests.)

> However, I am currently doing a research project that aims to identify risk areas in the kernel by measuring code complexity metrics. Since this might help this project, I would like to ask for feedback on whether this work can actually be useful to it.
>
> My approach is basically to take a look at the different system calls and measure the complexity of the code involved in their execution. Since code complexity has already been found to have a strong correlation with the probability of existing vulnerabilities, this might indicate kernel areas that need a closer look.

Keep in mind that while system calls are one of the main entry points
from userspace into the kernel, and the main way in which userspace
can trigger kernel bugs, syscalls do not necessarily closely
correspond to specific kernel subsystems.

For example, system calls like read() and write() can take a gigantic
number of execution paths because, especially when you take files in
/proc and /sys into consideration, they interact with things all
across the kernel. For instance, write() can modify page
tables of other processes, can trigger page allocation and reclaim,
can modify networking configuration, can interact with filesystems and
block devices and networking and user namespace configuration and
pipes, and so on. But the areas that are reachable through this
syscall depend on other ways in which the process is limited - in
particular, what kinds of files it can open.
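
(A rough userspace sketch of that point, using Python's os.write()
wrapper around the write() syscall. The three file types are just ones
that are easy to create without privileges; which kernel code actually
runs for each is stated in the comments as an expectation, not
something the script measures.)

```python
import os
import socket
import tempfile

payload = b"hello"
results = {}

# Regular file: the write() is handled by a filesystem and the page cache.
fd, path = tempfile.mkstemp()
try:
    results["regular file"] = os.write(fd, payload)
finally:
    os.close(fd)
    os.unlink(path)

# Pipe: the same syscall is handled by the pipe implementation instead.
r, w = os.pipe()
try:
    results["pipe"] = os.write(w, payload)
finally:
    os.close(r)
    os.close(w)

# UNIX socket pair: the same syscall is now handled by socket code.
a, b = socket.socketpair()
try:
    results["socket"] = os.write(a.fileno(), payload)
finally:
    a.close()
    b.close()

for kind, written in results.items():
    print(f"{kind}: write() returned {written}")
```

From the caller's side all three calls look identical; only the kind
of file behind the descriptor differs.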

Also keep in mind that even a simple syscall like getresuid() can,
through the page fault handling code, end up in subsystems related to
filesystems, block devices, networking, graphics and so on - so you'd
probably have to exclude any control flows that go through certain
pieces of core kernel infrastructure.
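
(A minimal sketch of how that can happen, assuming Linux with glibc
and a 32-bit uid_t: the three output pointers handed to getresuid()
point into a shared, file-backed mapping that has not been touched
yet, so the kernel's copy to userspace is expected to take a page
fault that is resolved by whatever filesystem backs the temporary
file. The ctypes plumbing is only there because Python's own
os.getresuid() does not let the caller choose the output buffer.)

```python
import ctypes
import mmap
import os
import tempfile

# Assumption: Linux with glibc, where getresuid() is exported by libc.
libc = ctypes.CDLL(None, use_errno=True)

with tempfile.TemporaryFile() as f:
    f.truncate(mmap.PAGESIZE)
    # Shared, writable, file-backed mapping; its page is not touched yet.
    mm = mmap.mmap(f.fileno(), mmap.PAGESIZE)
    # Three uid_t slots living inside the mapping (uid_t assumed 32-bit).
    ids = (ctypes.c_uint32 * 3).from_buffer(mm)
    # The kernel writes the real, effective and saved UIDs into the mapping.
    rc = libc.getresuid(
        ctypes.byref(ids, 0),
        ctypes.byref(ids, 4),
        ctypes.byref(ids, 8),
    )
    result = tuple(ids)
    del ids  # release the buffer export so the mapping can be closed
    mm.close()

print("getresuid() returned", rc, "with UIDs", result)
```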

> Additionally the functionality of the syscall will also be considered for a final risk score, although most of the work for this part has already been done in [1].

That's a paper from 2002 that talks about "UNIX system calls", and
categorizes syscalls like init_module as being of the highest "threat
level" even though that syscall does absolutely nothing unless you're
already root. It also has "denial of service attacks" as the
second-highest "threat level classification", which I don't think
makes any sense - I don't think that current OS kernels are designed
to prevent an attacker with the ability to execute arbitrary syscalls
from userspace from slowing the system down. Fundamentally it looks to
me as if it classifies syscalls by the risk caused if you let an
attacker run arbitrary code in userspace **with root privileges**,
which seems to me like an extremely silly threat model.

> The objective is to create a risk score matrix for Linux syscalls that consists of the functionality risk according to [1] times the measured complexity.

I don't understand why you would multiply functionality risk and
complexity. They're probably more additive than multiplicative, since
in a per-subsystem view, risk caused by functionality and complexity
of the implementation are often completely separate. For example, the
userfaultfd subsystem introduces functionality risk by allowing
attackers to arbitrarily pause the kernel at any copy_from_user()
call; that risk doesn't combine with the complexity of the userfaultfd
subsystem itself, but with the complexity of all copy_from_user()
callers everywhere across the kernel.
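
(To illustrate how much the choice of formula matters, here is a toy
comparison. The three subsystems and all of the numbers are made up
for illustration; nothing here is measured from a real kernel.)

```python
# Hypothetical (functionality_risk, complexity) pairs on a 1..9 scale.
subsystems = {
    "risky feature, simple code": (9, 1),
    "harmless feature, complex code": (1, 9),
    "medium on both axes": (5, 5),
}

# Score each subsystem under the multiplicative and the additive model.
multiplicative = {name: f * c for name, (f, c) in subsystems.items()}
additive = {name: f + c for name, (f, c) in subsystems.items()}

for name in subsystems:
    print(f"{name}: product={multiplicative[name]} sum={additive[name]}")
```

With these inputs the product ranks the "medium on both axes" entry
far above the other two (25 versus 9), while the sum treats all three
the same (10 each). If the two kinds of risk are mostly independent,
the product's ranking is an artifact of the formula.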

> This will (hopefully) be helpful to identify risk areas in the kernel and provide user space developers with a measurement that can help design secure software and sandboxing features.

I'm not sure whether this would really be all that helpful for
userspace sandboxing decisions - as far as I know, userspace normally
isn't in a position where it can really choose which syscalls it wants
to use, but instead the choice of syscalls to use is driven by the
requirements that userspace has. If you tell userspace that write()
can hit tons of kernel code, it's not like userspace can just stop
using write(); and if you then also tell userspace that pwrite() can
also hit a lot of kernel code, that may be misinterpreted as meaning
that pwrite() adds lots of risk while actually, write() and pwrite()
reach (almost) the same areas of code. Also, the areas of code that a
syscall like write() can hit depend hugely on file system access
policies.
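
(A small sketch of the write()/pwrite() problem. The "reachable area"
sets below are invented labels, not derived from a real call graph;
the point is only that per-syscall scores don't add up when the
syscalls share most of the code they reach.)

```python
# Hypothetical sets of kernel areas reachable from each syscall.
reachable = {
    "write": {"filesystems", "block devices", "procfs", "pipes", "networking"},
    "pwrite": {"filesystems", "block devices", "procfs"},
}

# Naive view: every syscall contributes its full reachable set.
naive_total = sum(len(areas) for areas in reachable.values())

# What a sandbox that already allows write() gains by also allowing pwrite().
extra_from_pwrite = reachable["pwrite"] - reachable["write"]

# Attack surface actually exposed when both are allowed.
combined = reachable["write"] | reachable["pwrite"]

print("naive total:", naive_total)
print("combined surface:", len(combined))
print("areas added by pwrite():", sorted(extra_from_pwrite))
```

Here the naive total is 8 while the combined surface is 5, and
allowing pwrite() on top of write() adds nothing new.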

I also don't think that doing something like this on a per-syscall
basis would be very beneficial for informing something like priorities
for auditing kernel code; only a small chunk of the kernel even has
its own syscalls, while most of it receives commands through
more-or-less generic syscalls that are then plumbed through.

> One major aspect I am still not sure about is the challenge of dynamically measuring code path execution. While it is possible to measure the cyclomatic complexity of the kernel code with existing tools, I am not sure how much value the results would have, given that this does not include the dynamic code path behind each syscall. I was thinking of using ftrace to follow and measure the execution path. Any feedback and advice on this would be appreciated.