Add a new system state document to the admin-guide. This document is
intended to be used as a guide on how to gather higher level information
about a system and its run-time activity.

Signed-off-by: Shuah Khan <skhan@xxxxxxxxxxxxxxxxxxx>
---
Changes since v1:
-- Addressed review comments

 Documentation/admin-guide/index.rst        |   1 +
 Documentation/admin-guide/system-state.rst | 350 +++++++++++++++++++++
 2 files changed, 351 insertions(+)
 create mode 100644 Documentation/admin-guide/system-state.rst

diff --git a/Documentation/admin-guide/index.rst b/Documentation/admin-guide/index.rst
index f475554382e2..541372672c55 100644
--- a/Documentation/admin-guide/index.rst
+++ b/Documentation/admin-guide/index.rst
@@ -66,6 +66,7 @@ subsystems expectations will be found here.
    :maxdepth: 1
 
    workload-tracing
+   system-state
 
 The rest of this manual consists of various unordered guides on how to
 configure specific aspects of kernel behavior to your liking.
diff --git a/Documentation/admin-guide/system-state.rst b/Documentation/admin-guide/system-state.rst
new file mode 100644
index 000000000000..2a6fdf85c35c
--- /dev/null
+++ b/Documentation/admin-guide/system-state.rst
@@ -0,0 +1,350 @@
+.. SPDX-License-Identifier: (GPL-2.0+ OR CC-BY-4.0)
+
+===========================================================
+Discovering system calls and features supported on a system
+===========================================================
+
+:Author: Shuah Khan <skhan@xxxxxxxxxxxxxxxxxxx>
+:maintained-by: Shuah Khan <skhan@xxxxxxxxxxxxxxxxxxx>
+
+Key Points
+==========
+
+ * System state includes system calls, features, static and dynamic
+   modules enabled in the kernel configuration.
+ * Supported system calls and kernel features are architecture dependent.
+ * auditd, checksyscalls.sh, and get_feat.pl tools can be used to discover
+   static system state.
+ * Understanding Linux kernel hardening configuration options and making
+   sure they are enabled will make a system more secure.
+ * Employing run-time tracing can shed light on the dynamic system state.
+ * Workloads could change the system state by loading and unloading dynamic
+   modules and tuning system parameters.
+
+System State Visualization
+==========================
+
+The kernel system state can be viewed as a combination of static and
+dynamic features and modules. Let's first define what static and dynamic
+system states are and then explore how we can visualize the static and
+dynamic system parts of the kernel.
+
+Static System View comprises system calls, features, static and dynamic
+modules enabled in the kernel configuration. Supported system calls
+and kernel features are architecture dependent. System call numbering is
+different on different architectures. We can get the supported system call
+information using auditd utilities.
+
+ausyscall --dump prints out the supported system calls on a system and allows
+mapping syscall names and numbers. You can install the auditd package on
+Debian based systems::
+
+  sudo apt-get install auditd
+
+scripts/checksyscalls.sh can be used to check if the current architecture is
+missing any system calls compared to i386.
+
+scripts/get_feat.pl can be used to list the kernel feature support matrix
+for an architecture.
+
+Dynamic System View comprises system calls, ioctls invoked, and subsystems
+used during the runtime. A workload could load and unload modules and also
+change the dynamic system configuration to suit its needs by tuning system
+parameters.
+
+What is the methodology?
+========================
+
+The first step is gathering the default system state such as the dynamic
+and static modules loaded on the system. The lsmod command prints out the
+dynamically loaded modules on a system. Statically configured modules can
+be found in the kernel configuration file.
+
+The next step is discovering system activity during run-time. You can do so
+by enabling event tracing and then running your favorite application.
+After a period of time, gather the event logs and kernel messages.
+
+Once you have the necessary information, you can extract the system call
+numbers from the event trace log and map them to the supported system calls.
+
+Finding supported system calls
+==============================
+
+As mentioned earlier, ausyscall prints out supported system calls
+on a system and allows mapping syscall names and numbers::
+
+  ausyscall --dump
+
+You can look for specific system calls as shown below::
+
+  ausyscall open
+  open               2
+  mq_open            240
+  openat             257
+  perf_event_open    298
+  open_by_handle_at  304
+  open_tree          428
+  fsopen             430
+  pidfd_open         434
+  openat2            437
+
+  ausyscall time
+
+  getitimer          36
+  setitimer          38
+  gettimeofday       96
+  times              100
+  rt_sigtimedwait    128
+  utime              132
+  adjtimex           159
+  settimeofday       164
+  time               201
+  semtimedop         220
+  timer_create       222
+  timer_settime      223
+  timer_gettime      224
+  timer_getoverrun   225
+  timer_delete       226
+  clock_settime      227
+  clock_gettime      228
+  utimes             235
+  mq_timedsend       242
+  mq_timedreceive    243
+  futimesat          261
+  utimensat          280
+  timerfd_create     283
+  timerfd_settime    286
+  timerfd_gettime    287
+  clock_adjtime      305
+
+Finding unsupported system calls
+================================
+
+As mentioned earlier, scripts/checksyscalls.sh checks for system calls
+missing on the current architecture compared to i386.
+Example run::
+
+  checksyscalls.sh gcc
+  warning: #warning syscall mmap2 not implemented [-Wcpp]
+  warning: #warning syscall truncate64 not implemented [-Wcpp]
+  warning: #warning syscall ftruncate64 not implemented [-Wcpp]
+  warning: #warning syscall fcntl64 not implemented [-Wcpp]
+  warning: #warning syscall sendfile64 not implemented [-Wcpp]
+  warning: #warning syscall statfs64 not implemented [-Wcpp]
+  warning: #warning syscall fstatfs64 not implemented [-Wcpp]
+  warning: #warning syscall fadvise64_64 not implemented [-Wcpp]
+
+Let's check this against ausyscall now::
+
+  ausyscall map
+  mmap               9
+  munmap             11
+  mremap             25
+  remap_file_pages   216
+
+  ausyscall trunc
+  truncate           76
+  ftruncate          77
+
+As you can see, ausyscall does not list mmap2, truncate64, or ftruncate64,
+so they aren't implemented on this system. This matches checksyscalls.sh.
+
+Finding supported features
+==========================
+
+scripts/get_feat.pl can be used to list the kernel feature support matrix
+for an architecture::
+
+  get_feat.pl list
+  get_feat.pl list --arch=arm64
+
+This script parses Documentation/features to find the support status
+information. It can be used to validate the contents of the files under
+Documentation/features or simply list them::
+
+  --arch Outputs features for a specific architecture, optionally filtering
+         for a single specific feature.
+  --feat or --feature Output features for a single specific feature.
+
+Here is how you can find if stackprotector and thread-info-in-task features
+are supported::
+
+  scripts/get_feat.pl --arch=arm64 --feat=stackprotector list
+  #
+  # Kernel feature support matrix of the 'arm64' architecture:
+  #
+  debug/ stackprotector : ok | HAVE_STACKPROTECTOR #
+  arch supports compiler driven stack overflow protection
+
+  scripts/get_feat.pl --feat=thread-info-in-task list
+  #
+  # Kernel feature support matrix of the 'x86' architecture:
+  #
+  core/ thread-info-in-task : ok | THREAD_INFO_IN_TASK #
+  arch makes use of the core kernel facility to embed thread_info in
+  task_struct
+
+Finding kernel module status
+============================
+
+The lsmod command shows the kernel modules that are currently loaded. This
+program displays the contents of /proc/modules. Let's pick the uvcvideo
+module, which is found on most laptops::
+
+  lsmod | grep uvc
+  uvcvideo              126976  0
+  videobuf2_vmalloc      20480  1 uvcvideo
+  uvc                    16384  1 uvcvideo
+  videobuf2_v4l2         36864  1 uvcvideo
+  videodev              315392  2 videobuf2_v4l2,uvcvideo
+  videobuf2_common       65536  4 videobuf2_vmalloc,videobuf2_v4l2,uvcvideo,videobuf2_memops
+  mc                     77824  4 videodev,videobuf2_v4l2,uvcvideo,videobuf2_common
+
+You can see that lsmod shows uvcvideo and the modules it depends on and how
+many modules are using them. videobuf2_common is in use by 4 other modules.
+In other words, this is the reference count for this module and rmmod will
+refuse to unload it as long as the reference count is > 0.
+
+You can get the same information from /proc/modules::
+
+  less /proc/modules | grep uvc
+  uvcvideo 126976 0 - Live 0x0000000000000000
+  videobuf2_vmalloc 20480 1 uvcvideo, Live 0x0000000000000000
+  uvc 16384 1 uvcvideo, Live 0x0000000000000000
+  videobuf2_v4l2 36864 1 uvcvideo, Live 0x0000000000000000
+  videodev 315392 2 uvcvideo,videobuf2_v4l2, Live 0x0000000000000000
+  videobuf2_common 65536 4 uvcvideo,videobuf2_vmalloc,videobuf2_memops,videobuf2_v4l2, Live 0x0000000000000000
+  mc 77824 4 uvcvideo,videobuf2_v4l2,videodev,videobuf2_common, Live 0x0000000000000000
+
+The information is similar, with a few extra fields. The address is the
+base address for the module in kernel virtual memory space. When run as a
+normal user, the address is all zeros. When run as root, the same command
+shows the real addresses::
+
+  sudo less /proc/modules | grep uvc
+  uvcvideo 126976 0 - Live 0xffffffffc1c8b000
+  videobuf2_vmalloc 20480 1 uvcvideo, Live 0xffffffffc167f000
+  uvc 16384 1 uvcvideo, Live 0xffffffffc0ab0000
+  videobuf2_v4l2 36864 1 uvcvideo, Live 0xffffffffc0a28000
+  videodev 315392 2 uvcvideo,videobuf2_v4l2, Live 0xffffffffc16e9000
+  videobuf2_common 65536 4 uvcvideo,videobuf2_vmalloc,videobuf2_memops,videobuf2_v4l2, Live 0xffffffffc094d000
+  mc 77824 4 uvcvideo,videobuf2_v4l2,videodev,videobuf2_common, Live 0xffffffffc15eb000
+
+Let's check what modinfo shows that is important for us::
+
+  /sbin/modinfo uvcvideo
+  filename:       /lib/modules/6.3.0-rc2/kernel/drivers/media/usb/uvc/uvcvideo.ko
+  license:        GPL
+  description:    USB Video Class driver
+  depends:        videobuf2-v4l2,videodev,mc,uvc,videobuf2-common,videobuf2-vmalloc
+  retpoline:      Y
+  intree:         Y
+  name:           uvcvideo
+  vermagic:       6.3.0-rc2 SMP preempt mod_unload modversions
+  sig_id:         PKCS#7
+  signer:         Build time autogenerated kernel key
+
+This tells us that this module is built in-tree and is signed with a build
+time autogenerated key.
+
+Let's do one last sanity check on the system to see if the following two
+command outputs match::
+
+  ps ax | wc -l
+  ls -d /proc/* | grep '[0-9]' | wc -l
+
+If they don't match, examine your system closely. Kernel rootkits install
+their own ps, find, etc. utilities to mask their activity. The outputs
+match on my system. Do they on yours?
+
+Is my system as secure as it could be?
+======================================
+
+The Linux kernel supports several hardening options to make a system secure.
+The kconfig-hardened-check tool sanity checks the kernel configuration for
+security. You can clone the latest kconfig-hardened-check repository::
+
+  git clone https://github.com/a13xp0p0v/kconfig-hardened-check.git
+  cd kconfig-hardened-check
+  bin/kconfig-hardened-check --config <config file> --cmdline /proc/cmdline
+
+This will generate a detailed report of the kernel security configuration
+and command line options that are enabled (OK) and the ones that aren't
+(FAIL), with a summary line at the end::
+
+  [+] Config check is finished: 'OK' - 100 / 'FAIL' - 100
+
+You will have to analyze the information to determine which options make
+sense to enable on your system.
+
+Understanding system run-time activity
+======================================
+
+Enabling event tracing gives insight into system run-time activity. This is
+a good way to identify which parts of the kernel are used at a higher level
+while the system is in use and/or while a specific workload/process runs.
+
+Event tracing requires CONFIG_EVENT_TRACING to be enabled. You can
+enable event tracing before starting a workload/process. Event tracing allows
+you to dynamically enable and disable tracing on supported/available events.
+You can find available events, tracers, and filter functions in the following
+files::
+
+  /sys/kernel/debug/tracing/available_events
+  /sys/kernel/debug/tracing/available_filter_functions
+  /sys/kernel/debug/tracing/available_tracers
+
+Now this is how you can enable tracing::
+
+  echo 1 | sudo tee /sys/kernel/debug/tracing/events/enable
+
+Once the workload/process stops or when you decide you have the status you
+need, you can disable event tracing::
+
+  echo 0 | sudo tee /sys/kernel/debug/tracing/events/enable
+
+You can find the tracing information in the file::
+
+  /sys/kernel/debug/tracing/trace
+
+Here is the information shown in this file::
+
+  cat trace
+  # tracer: nop
+  #
+  # entries-in-buffer/entries-written: 0/0   #P:16
+  #
+  #                                _-----=> irqs-off/BH-disabled
+  #                               / _----=> need-resched
+  #                              | / _---=> hardirq/softirq
+  #                              || / _--=> preempt-depth
+  #                              ||| / _-=> migrate-disable
+  #                              |||| /     delay
+  #           TASK-PID     CPU#  |||||  TIMESTAMP  FUNCTION
+  #              | |         |   |||||     |         |
+
+
+Analyzing traces
+================
+
+You will be able to map the functions to system calls and other kernel
+features to get insight into the overall system activity while a
+workload/process is running.
+
+Map the NR (syscall) numbers from the trace to syscalls from the syscalls dump.
+Categorize system calls and map them to Linux subsystems.
+
+Conclusion
+==========
+
+This document is intended to be used as a guide on how to gather higher level
+information about a system and its run-time activity. The approach described
+in this document gives us insight into the supported system calls and
+features, how secure a system is, and its run-time activity.
+
+References
+==========
+
+ * https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/scripts/checksyscalls.sh
+ * https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/scripts/get_feat.pl
+ * https://github.com/a13xp0p0v/kconfig-hardened-check
+ * https://docs.kernel.org/trace/index.html
-- 
2.34.1
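
The mapping step described under "Analyzing traces" could be sketched roughly as follows. This is only an illustration and not part of the patch. It assumes the trace contains raw_syscalls sys_enter events printed as "sys_enter: NR <number> (...)", and that the syscall table was saved from ausyscall --dump as "<number> <name>" rows. The function names parse_syscall_dump and count_syscalls and the sample data are hypothetical; the syscall numbers in the sample are the x86_64 ones shown in the document.

```python
import re
from collections import Counter

# Assumed line shape for raw_syscalls:sys_enter events in the trace file.
# sys_exit lines ("sys_exit: NR 257 = 3") are skipped on purpose so each
# system call is counted once.
SYS_ENTER = re.compile(r"sys_enter: NR (\d+)")


def parse_syscall_dump(dump_text):
    """Build {number: name} from text shaped like `ausyscall --dump` output."""
    table = {}
    for line in dump_text.splitlines():
        fields = line.split()
        # Keep "<number> <name>" rows; skip the table header line.
        if len(fields) == 2 and fields[0].isdigit():
            table[int(fields[0])] = fields[1]
    return table


def count_syscalls(trace_text, table):
    """Count sys_enter events per syscall name found in the trace text."""
    counts = Counter()
    for line in trace_text.splitlines():
        match = SYS_ENTER.search(line)
        if match:
            nr = int(match.group(1))
            # Numbers missing from the table are reported, not dropped.
            counts[table.get(nr, f"unknown({nr})")] += 1
    return counts


# Hypothetical sample data for illustration only.
SAMPLE_DUMP = """Using x86_64 syscall table:
2\topen
228\tclock_gettime
257\topenat
"""

SAMPLE_TRACE = """# tracer: nop
 cat-4242 [001] ..... 100.000001: sys_enter: NR 257 (ffffff9c, 7ffd0, 0, 0, 0, 0)
 cat-4242 [001] ..... 100.000002: sys_exit: NR 257 = 3
 cat-4242 [001] ..... 100.000003: sys_enter: NR 228 (1, 7ffd0, 0, 0, 0, 0)
 cat-4242 [001] ..... 100.000004: sys_enter: NR 257 (ffffff9c, 7ffd8, 0, 0, 0, 0)
 cat-4242 [001] ..... 100.000005: sys_enter: NR 999 (0, 0, 0, 0, 0, 0)
"""

if __name__ == "__main__":
    syscall_table = parse_syscall_dump(SAMPLE_DUMP)
    for name, hits in count_syscalls(SAMPLE_TRACE, syscall_table).most_common():
        print(f"{name:20} {hits}")
```

On a real system the two inputs would come from a saved ausyscall --dump and a copy of the trace file. Since syscall numbering differs between architectures, both should be collected on the same machine.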