On 12/20/2019 7:41 AM, KP Singh wrote: > From: KP Singh <kpsingh@xxxxxxxxxx> > > This patch series is a continuation of the KRSI RFC > (https://lore.kernel.org/bpf/20190910115527.5235-1-kpsingh@xxxxxxxxxxxx/) > > # Motivation > > Google does rich analysis of runtime security data collected from > internal Linux deployments (corporate devices and servers) to detect and > thwart threats in real-time. Currently, this is done in custom kernel > modules but we would like to replace this with something that's upstream > and useful to others. > > The current kernel infrastructure for providing telemetry (Audit, Perf > etc.) is disjoint from access enforcement (i.e. LSMs). Augmenting the > information provided by audit requires kernel changes to audit, its > policy language and user-space components. Furthermore, building a MAC > policy based on the newly added telemetry data requires changes to > various LSMs and their respective policy languages. > > This patchset proposes a new stackable and privileged LSM which allows > the LSM hooks to be implemented using eBPF. This facilitates a unified > and dynamic (not requiring re-compilation of the kernel) audit and MAC > policy. > > # Why an LSM? > > Linux Security Modules target security behaviours rather than the > kernel's API. For example, it's easy to miss out a newly added system > call for executing processes (eg. execve, execveat etc.) but the LSM > framework ensures that all process executions trigger the relevant hooks > irrespective of how the process was executed. > > Allowing users to implement LSM hooks at runtime also benefits the LSM > eco-system by enabling a quick feedback loop from the security community > about the kind of behaviours that the LSM Framework should be targeting. > > # How does it work? > > The LSM introduces a new eBPF (https://docs.cilium.io/en/v1.6/bpf/) > program type, BPF_PROG_TYPE_LSM, which can only be attached to a LSM > hook. All LSM hooks are exposed as files in securityfs. Attachment > requires CAP_SYS_ADMIN for loading eBPF programs and CAP_MAC_ADMIN for > modifying MAC policies. > > The eBPF programs are passed the same arguments as the LSM hooks and > executed in the body of the hook. This effectively exposes the LSM hooks as external APIs. It would mean that we can't change or delete them. That would be bad. > If any of the eBPF programs returns an > error (like ENOPERM), the behaviour represented by the hook is denied. > > Audit logs can be written using a format chosen by the eBPF program to > the perf events buffer and can be further processed in user-space. > > # Limitations of RFC v1 > > In the previous design > (https://lore.kernel.org/bpf/20190910115527.5235-1-kpsingh@xxxxxxxxxxxx/), > the BPF programs received a context which could be queried to retrieve > specific pieces of information using specific helpers. > > For example, a program that attaches to the file_mprotect LSM hook and > queries the VMA region could have had the following context: > > // Special context for the hook. > struct bpf_mprotect_ctx { > struct vm_area_struct *vma; > }; > > and accessed the fields using a hypothetical helper > "bpf_mprotect_vma_get_start: > > SEC("lsm/file_mprotect") > int mprotect_audit(bpf_mprotect_ctx *ctx) > { > unsigned long vm_start = bpf_mprotect_vma_get_start(ctx); > return 0; > } > > or directly read them from the context by updating the verifier to allow > accessing the fields: > > int mprotect_audit(bpf_mprotect_ctx *ctx) > { > unsigned long vm_start = ctx->vma->vm_start; > return 0; > } > > As we prototyped policies based on this design, we realized that this > approach is not general enough. Adding helpers or verifier code for all > usages would imply a high maintenance cost while severely restricting > the instrumentation capabilities which is the key value add of our > eBPF-based LSM. > > Feedback from the BPF maintainers at Linux Plumbers also pushed us > towards the following, more general, approach. > > # BTF Based Design > > The current design uses BTF > (https://facebookmicrosites.github.io/bpf/blog/2018/11/14/btf-enhancement.html, > https://lwn.net/Articles/803258/) which allows verifiable read-only > structure accesses by field names rather than fixed offsets. This allows > accessing the hook parameters using a dynamically created context which > provides a certain degree of ABI stability: > > /* Clang builtin to handle field accesses. */ > #define _(P) (__builtin_preserve_access_index(P)) > > // Only declare the structure and fields intended to be used > // in the program > struct vm_area_struct { > unsigned long vm_start; > }; > > // Declare the eBPF program mprotect_audit which attaches to > // to the file_mprotect LSM hook and accepts three arguments. > BPF_TRACE_3("lsm/file_mprotect", mprotect_audit, > struct vm_area_struct *, vma, > unsigned long, reqprot, unsigned long, prot > { > unsigned long vm_start = _(vma->vm_start); > return 0; > } > > By relocating field offsets, BTF makes a large portion of kernel data > structures readily accessible across kernel versions without requiring a > large corpus of BPF helper functions and requiring recompilation with > every kernel version. The limitations of BTF compatibility are described > in BPF Co-Re (http://vger.kernel.org/bpfconf2019_talks/bpf-core.pdf, > i.e. field renames, #defines and changes to the signature of LSM hooks). > > This design imposes that the MAC policy (eBPF programs) be updated when > the inspected kernel structures change outside of BTF compatibility > guarantees. In practice, this is only required when a structure field > used by a current policy is removed (or renamed) or when the used LSM > hooks change. We expect the maintenance cost of these changes to be > acceptable as compared to the previous design > (https://lore.kernel.org/bpf/20190910115527.5235-1-kpsingh@xxxxxxxxxxxx/). > > # Distinction from Landlock > > We believe there exist two distinct use-cases with distinct set of users: > > * Unprivileged processes voluntarily relinquishing privileges with the > primary users being software developers. > > * Flexible privileged (CAP_MAC_ADMIN, CAP_SYS_ADMIN) MAC and Audit with > the primary users being system policy admins. > > These use-cases imply different APIs and trade-offs: > > * The unprivileged use case requires defining more stable and custom APIs > (through opaque contexts and precise helpers). > > * Privileged Audit and MAC requires deeper introspection of the kernel > data structures to maximise the flexibility that can be achieved without > kernel modification. > > Landlock has demonstrated filesystem sandboxes and now Ptrace access > control in its patches which are excellent use cases for an unprivileged > process voluntarily relinquishing privileges. > > However, Landlock has expanded its original goal, "towards unprivileged > sandboxing", to being a "low-level framework to build > access-control/audit systems" (https://landlock.io). We feel that the > design and implementation are still driven by the constraints and > trade-offs of the former use-case, and do not provide a satisfactory > solution to the latter. > > We also believe that our approach, direct access to common kernel data > structures as with BTF, is inappropriate for unprivileged processes and > probably not a good option for Landlock. > > In conclusion, we feel that the design for a privileged LSM and > unprivileged LSM are mutually exclusive and that one cannot be built > "on-top-of" the other. Doing so would limit the capabilities of what can > be done for an LSM that provides flexible audit and MAC capabilities or > provide in-appropriate access to kernel internals to an unprivileged > process. > > Furthermore, the Landlock design supports its historical use-case only > when unprivileged eBPF is allowed. This is something that warrants > discussion before an unprivileged LSM that uses eBPF is upstreamed. > > # Why not tracepoints or kprobes? > > In order to do MAC with tracepoints or kprobes, we would need to > override the return value of the security hook. This is not possible > with tracepoints or call-site kprobes. > > Attaching to the return boundary (kretprobe) implies that BPF programs > would always get called after all the other LSM hooks are called and > clobber the pre-existing LSM semantics. > > Enforcing MAC policy with an actual LSM helps leverage the verified > semantics of the framework. > > # Usage Examples > > A simple example and some documentation is included in the patchset. > > In order to better illustrate the capabilities of the framework some > more advanced prototype code has also been published separately: > > * Logging execution events (including environment variables and arguments): > https://github.com/sinkap/linux-krsi/blob/patch/v1/examples/samples/bpf/lsm_audit_env.c > * Detecting deletion of running executables: > https://github.com/sinkap/linux-krsi/blob/patch/v1/examples/samples/bpf/lsm_detect_exec_unlink.c > * Detection of writes to /proc/<pid>/mem: > https://github.com/sinkap/linux-krsi/blob/patch/v1/examples/samples/bpf/lsm_audit_env.c > > We have updated Google's internal telemetry infrastructure and have > started deploying this LSM on our Linux Workstations. This gives us more > confidence in the real-world applications of such a system. > > KP Singh (13): > bpf: Refactor BPF_EVENT context macros to its own header. > bpf: lsm: Add a skeleton and config options > bpf: lsm: Introduce types for eBPF based LSM > bpf: lsm: Allow btf_id based attachment for LSM hooks > tools/libbpf: Add support in libbpf for BPF_PROG_TYPE_LSM > bpf: lsm: Init Hooks and create files in securityfs > bpf: lsm: Implement attach, detach and execution. > bpf: lsm: Show attached program names in hook read handler. > bpf: lsm: Add a helper function bpf_lsm_event_output > bpf: lsm: Handle attachment of the same program > tools/libbpf: Add bpf_program__attach_lsm > bpf: lsm: Add selftests for BPF_PROG_TYPE_LSM > bpf: lsm: Add Documentation > > Documentation/security/bpf.rst | 164 +++ > Documentation/security/index.rst | 1 + > MAINTAINERS | 11 + > include/linux/bpf_event.h | 78 ++ > include/linux/bpf_lsm.h | 25 + > include/linux/bpf_types.h | 4 + > include/trace/bpf_probe.h | 30 +- > include/uapi/linux/bpf.h | 12 +- > kernel/bpf/syscall.c | 10 + > kernel/bpf/verifier.c | 84 +- > kernel/trace/bpf_trace.c | 24 +- > security/Kconfig | 11 +- > security/Makefile | 2 + > security/bpf/Kconfig | 25 + > security/bpf/Makefile | 7 + > security/bpf/include/bpf_lsm.h | 63 + > security/bpf/include/fs.h | 23 + > security/bpf/include/hooks.h | 1015 +++++++++++++++++ > security/bpf/lsm.c | 160 +++ > security/bpf/lsm_fs.c | 176 +++ > security/bpf/ops.c | 224 ++++ > tools/include/uapi/linux/bpf.h | 12 +- > tools/lib/bpf/bpf.c | 2 +- > tools/lib/bpf/bpf.h | 6 + > tools/lib/bpf/libbpf.c | 163 ++- > tools/lib/bpf/libbpf.h | 4 + > tools/lib/bpf/libbpf.map | 7 + > tools/lib/bpf/libbpf_probes.c | 1 + > .../bpf/prog_tests/lsm_mprotect_audit.c | 129 +++ > .../selftests/bpf/progs/lsm_mprotect_audit.c | 58 + > 30 files changed, 2451 insertions(+), 80 deletions(-) > create mode 100644 Documentation/security/bpf.rst > create mode 100644 include/linux/bpf_event.h > create mode 100644 include/linux/bpf_lsm.h > create mode 100644 security/bpf/Kconfig > create mode 100644 security/bpf/Makefile > create mode 100644 security/bpf/include/bpf_lsm.h > create mode 100644 security/bpf/include/fs.h > create mode 100644 security/bpf/include/hooks.h > create mode 100644 security/bpf/lsm.c > create mode 100644 security/bpf/lsm_fs.c > create mode 100644 security/bpf/ops.c > create mode 100644 tools/testing/selftests/bpf/prog_tests/lsm_mprotect_audit.c > create mode 100644 tools/testing/selftests/bpf/progs/lsm_mprotect_audit.c >