On Sat, Dec 12, 2020 at 9:14 PM <stefan.bavendiek@xxxxxxxxxxx> wrote:
> Personally I am interested in Linux Kernel Security and especially features supporting attack surface reduction. In the past I did some work on sandboxing features like seccomp support in user space applications. I have been rather hesitant to get involved here, since I am not a full time developer and certainly not an expert in C programming.

(By the way, one interesting area where upstream development is currently happening that's related to userspace sandboxing is the Landlock patchset by Mickaël Salaün, which adds an API that allows unprivileged processes to restrict their filesystem access without having to mess around with stuff like mount namespaces and broker processes; the latest version is at <https://lore.kernel.org/kernel-hardening/20201209192839.1396820-1-mic@xxxxxxxxxxx/>. That might be relevant to your interests.)

> However I am currently doing a research project that aims to identify risk areas in the kernel by measuring code complexity metrics and assuming this might help this project, I would like to ask for some feedback in case this work can actually help with this project.
>
> My approach is basically to take a look at the different system calls and measure the complexity of the code involved in their execution. Since code complexity has already been found to have a strong correlation with the probability of existing vulnerabilities, this might indicate kernel areas that need a closer look.

Keep in mind that while system calls are one of the main entry points from userspace into the kernel, and the main way in which userspace can trigger kernel bugs, syscalls do not necessarily closely correspond to specific kernel subsystems. For example, system calls like read() and write() can take a gigantic number of execution paths because, especially when you take files in /proc and /sys into consideration, they interact with things all over the place across the kernel.
For example, write() can modify page tables of other processes, can trigger page allocation and reclaim, can modify networking configuration, can interact with filesystems and block devices and networking and user namespace configuration and pipes, and so on. But the areas that are reachable through this syscall depend on other ways in which the process is limited - in particular, what kinds of files it can open. Also keep in mind that even a simple syscall like getresuid() can, through the page fault handling code, end up in subsystems related to filesystems, block devices, networking, graphics and so on - so you'd probably have to exclude any control flows that go through certain pieces of core kernel infrastructure.

> Additionally the functionality of the syscall will also be considered for a final risk score, although most of the work for this part has already been done in [1].

That's a paper from 2002 that talks about "UNIX system calls", and categorizes syscalls like init_module as being of the highest "threat level" even though that syscall does absolutely nothing unless you're already root. It also has "denial of service attacks" as the second-highest "threat level classification", which I don't think makes any sense - I don't think that current OS kernels are designed to prevent an attacker with the ability to execute arbitrary syscalls from userspace from slowing the system down. Fundamentally it looks to me as if it classifies syscalls by the risk caused if you let an attacker run arbitrary code in userspace **with root privileges**, which seems to me like an extremely silly threat model.

> The objective is to create a risk score matrix for linux syscalls that consists of the functionality risk according to [1], times the measured complexity.

I don't understand why you would multiply functionality risk and complexity.
They're probably more additive than multiplicative, since in a per-subsystem view, risk caused by functionality and complexity of the implementation are often completely separate. For example, the userfaultfd subsystem introduces functionality risk by allowing attackers to arbitrarily pause the kernel at any copy_from_user() call, but that risk combines not with the complexity of the userfaultfd subsystem but with the complexity of all copy_from_user() callers everywhere across the kernel.

> This will (hopefully) be helpful to identify risk areas in the kernel and provide user space developers with a measurement that can help design secure software and sandboxing features.

I'm not sure whether this would really be all that helpful for userspace sandboxing decisions - as far as I know, userspace normally isn't in a position where it can really choose which syscalls it wants to use, but instead the choice of syscalls to use is driven by the requirements that userspace has. If you tell userspace that write() can hit tons of kernel code, it's not like userspace can just stop using write(); and if you then also tell userspace that pwrite() can also hit a lot of kernel code, that may be misinterpreted as meaning that pwrite() adds lots of risk while actually, write() and pwrite() reach (almost) the same areas of code. Also, the areas of code that a syscall like write() can hit depend hugely on file system access policies.

I also don't think that doing something like this on a per-syscall basis would be very beneficial for informing something like priorities for auditing kernel code; only a small chunk of the kernel even has its own syscalls, while most of it receives commands through more-or-less generic syscalls that are then plumbed through.

> One major aspect I am still not sure about is the challenges regarding the dynamic measure of code path execution. While it is possible to measure the cyclomatic complexity of the kernel code with existing tools, I am not sure how much value the results would have, given that this does not include the dynamic code path behind each syscall. I was thinking of using ftrace to follow and measure the execution path. Any feedback and advice on this would be appreciated.
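
To make the write()/pwrite() point above a bit more concrete, here is a minimal sketch. It assumes you already have, for each syscall, the set of kernel functions it was observed to reach (for instance from function traces); the dictionary contents below are hand-written placeholders rather than measured data, and marginal_reach is a hypothetical helper name. The idea is that what matters for a sandboxing decision is the code a syscall adds on top of what is already allowed, not its per-syscall total.

```python
# Hypothetical per-syscall sets of reached kernel functions. These entries are
# illustrative placeholders written by hand, NOT real trace output.
reached = {
    "write": {"vfs_write", "rw_verify_area", "pipe_write", "ext4_file_write_iter"},
    "pwrite64": {"vfs_write", "rw_verify_area", "ext4_file_write_iter"},
    "getresuid": {"sys_getresuid"},
}


def marginal_reach(allowed, candidate):
    """Return functions reachable via `candidate` that no syscall in `allowed` reaches."""
    # Union of everything the already-allowed syscalls can reach.
    already = set().union(*(reached[name] for name in allowed))
    return reached[candidate] - already


if __name__ == "__main__":
    # Per-syscall total versus what pwrite64 adds once write is already allowed.
    print(len(reached["pwrite64"]))
    print(len(marginal_reach(["write"], "pwrite64")))
```

With this toy data, allowing pwrite64 on top of write adds nothing new, even though pwrite64 on its own reaches three functions; a per-syscall score would count that shared code twice.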