Re: [PATCH v2] docs: add system-state document to admin-guide

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Shuah Khan <skhan@xxxxxxxxxxxxxxxxxxx> writes:

> Add a new system state document to the admin-guide. This document is
> intended to be used as a guide on how to gather higher level information
> about a system and its run-time activity.
>
> Signed-off-by: Shuah Khan <skhan@xxxxxxxxxxxxxxxxxxx>
> ---
> Changes since v1:
> -- Addressed review comments
>
>  Documentation/admin-guide/index.rst        |   1 +
>  Documentation/admin-guide/system-state.rst | 350 +++++++++++++++++++++
>  2 files changed, 351 insertions(+)
>  create mode 100644 Documentation/admin-guide/system-state.rst
>
> diff --git a/Documentation/admin-guide/index.rst b/Documentation/admin-guide/index.rst
> index f475554382e2..541372672c55 100644
> --- a/Documentation/admin-guide/index.rst
> +++ b/Documentation/admin-guide/index.rst
> @@ -66,6 +66,7 @@ subsystems expectations will be found here.
>     :maxdepth: 1
>  
>     workload-tracing
> +   system-state
>  
>  The rest of this manual consists of various unordered guides on how to
>  configure specific aspects of kernel behavior to your liking.
> diff --git a/Documentation/admin-guide/system-state.rst b/Documentation/admin-guide/system-state.rst
> new file mode 100644
> index 000000000000..2a6fdf85c35c
> --- /dev/null
> +++ b/Documentation/admin-guide/system-state.rst
> @@ -0,0 +1,350 @@
> +.. SPDX-License-Identifier: (GPL-2.0+ OR CC-BY-4.0)
> +
> +===========================================================
> +Discovering system calls and features supported on a system
> +===========================================================
> +
> +:Author: Shuah Khan <skhan@xxxxxxxxxxxxxxxxxxx>
> +:maintained-by: Shuah Khan <skhan@xxxxxxxxxxxxxxxxxxx>

Rather than adding lines like this, I think everybody would be better
served with a MAINTAINERS file entry.  get_maintainer.pl doesn't know
about these lines.

> +Key Points
> +==========
> +
> + * System state includes system calls, features, static and dynamic
> +   modules enabled in the kernel configuration.
> + * Supported system calls and Kernel features are architecture dependent.
> + * auditd, checksyscalls.sh, and get_feat.pl tools can be used to discover
> +   static system state.
> + * Understanding Linux kernel hardening configurations options and making
> +   sure they are enabled will make a system more secure.
> + * Employing run-time tracing can shed light on the dynamic system state.
> + * Workloads could change the system state by loading and unloading dynamic
> +   modules and tuning system parameters.

So what I'm missing, before this even, is a paragraph saying what this
document is actually for.  Who is the intended audience, and why might
they want to read this document?

> +System State Visualization
> +==========================
> +
> +The kernel system state can be viewed as a combination of static and
> +dynamic features and modules. Let’s first define what static and dynamic
> +system states are and then explore how we can visualize the static and
> +dynamic system parts of the kernel.
> +
> +Static System View comprises system calls, features, static and dynamic
> +modules enabled in the kernel configuration. Supported system calls

So the "static system view" includes *dynamic* modules?  Fine if that's
what you intended, but it reads a bit strangely.

> +and Kernel features are architecture dependent. System call numbering is
> +different on different architectures. We can get the supported system call
> +information using auditd utilities.
> +
> +ausyscall –dump prints out the supported system calls on a system and allows

Some clever software turned your "--" into an em-dash here.

> +mapping syscall names and numbers. You can install the auditd package on
> +Debian based systems::
> +
> +  sudo apt-get install auditd
> +
> +scripts/checksyscalls.sh can be used to check if current architecture is
> +missing any system calls compared to i386.
> +
> +scripts/get_feat.pl can be used to list the Kernel feature support matrix
> +for an architecture.
> +
> +Dynamic System View comprises system calls, ioctls invoked, and subsystems
> +used during the runtime. A workload could load and unload modules and also
> +change the dynamic system configuration to suit its needs by tuning system
> +parameters.
> +
> +What is the methodology?
> +========================
> +
> +The first step is gathering the default system state such as the dynamic
> +and static modules loaded on the system. lsmod command prints out the

*The* lsmod command

> +dynamically loaded modules on a system. Statically configured modules can
> +be found in the kernel configuration file.
> +
> +The next step is discovering system activity during run-time. You can do so
> +by enabling event tracing and then running your favorite application. After
> +a period of time, gather the event logs, and kernel messages.

Might your intended readers need a hint on enabling tracing?  A cross
reference to the appropriate docs if nothing else.

[Later I see you get to this; adding an "as described below" would help
here.]

> +Once you have the necessary information, you can extract the system call
> +numbers from the event trace log and map them to the supported system calls.
> +
> +Finding supported system calls
> +==============================
> +
> +As mentioned earlier, ausyscall prints out supported system calls
> +on a system and allows mapping syscalls names and numbers::
> +
> + ausyscall --dump
> +
> +You can look for specific system calls as shown in the below::
> +
> +  ausyscall open
> +    open               2
> +    mq_open            240
> +    openat             257
> +    perf_event_open    298
> +    open_by_handle_at  304
> +    open_tree          428
> +    fsopen             430
> +    pidfd_open         434
> +    openat2            437
> +
> +  ausyscall time
> +
> +    getitimer          36
> +    setitimer          38
> +    gettimeofday       96
> +    times              100
> +    rt_sigtimedwait    128
> +    utime              132
> +    adjtimex           159
> +    settimeofday       164
> +    time               201
> +    semtimedop         220
> +    timer_create       222
> +    timer_settime      223
> +    timer_gettime      224
> +    timer_getoverrun   225
> +    timer_delete       226
> +    clock_settime      227
> +    clock_gettime      228
> +    utimes             235
> +    mq_timedsend       242
> +    mq_timedreceive    243
> +    futimesat          261
> +    utimensat          280
> +    timerfd_create     283
> +    timerfd_settime    286
> +    timerfd_gettime    287
> +    clock_adjtime      305
> +
> +Finding unsupported system calls
> +================================
> +
> +As mentioned earlier, scripts/checksyscalls.sh checks missing system calls
> +on current architecture compared to i386. Example run::
> +
> +  checksyscalls.sh gcc
> +    warning: #warning syscall mmap2 not implemented [-Wcpp]
> +    warning: #warning syscall truncate64 not implemented [-Wcpp]
> +    warning: #warning syscall ftruncate64 not implemented [-Wcpp]
> +    warning: #warning syscall fcntl64 not implemented [-Wcpp]
> +    warning: #warning syscall sendfile64 not implemented [-Wcpp]
> +    warning: #warning syscall statfs64 not implemented [-Wcpp]
> +    warning: #warning syscall fstatfs64 not implemented [-Wcpp]
> +    warning: #warning syscall fadvise64_64 not implemented [-Wcpp]
> +
> +Let's check this against ausyscall now::
> +
> +  ausyscall map
> +    mmap               9
> +    munmap             11
> +    mremap             25
> +    remap_file_pages   216
> +
> +  ausyscall trunc
> +    truncate           76
> +    ftruncate          77
> +
> +As you can see, ausyscall shows mmap2, truncate64, and ftruncate64 aren't
> +implemented on this system. This matches what checksyscalls.sh shows.
> +
> +Finding supported features
> +==========================
> +
> +scripts/get_feat.pl can be used to list the Kernel feature support matrix
> +for an architecture::
> +
> + get_feat.pl list
> + get_feat.pl list –arch=arm64 lists

Lost the "--" again here

> +This scripts parses Documentation/features to find the support status

script (singular)

> +information. It can be used to validate the contents of the files under
> +Documentation/features or simply list them::
> +
> +  --arch Outputs features for an specific architecture, optionally filtering
> +         for a single specific feature.
> +  --feat or --feature Output features for a single specific feature.
> +
> +Here is how you can find if stackprotector and hread-info-in-task features

and *thread*-info-in-task

> +are supported::
> +
> +  scripts/get_feat.pl --arch=arm64 --feat=stackprotector list
> +    #
> +    # Kernel feature support matrix of the 'arm64' architecture:
> +    #
> +    debug/ stackprotector       :  ok  |            HAVE_STACKPROTECTOR #
> +    arch supports compiler driven stack overflow protection
> +
> +  scripts/get_feat.pl --feat=thread-info-in-task list
> +    #
> +    # Kernel feature support matrix of the 'x86' architecture:
> +    #
> +      core/ thread-info-in-task  :  ok  |           THREAD_INFO_IN_TASK #
> +      arch makes use of the core kernel facility to embed thread_info in
> +      task_struct
> +
> +Finding kernel module status
> +============================
> +
> +lsmod command shows the kernel modules that are currently loaded. This
> +program displays the contents of /proc/modules. Let's pick uvcvideo

*The* lsmod
*the* uvcvideo

> +module which is found on most laptops::
> +
> +  lsmod | grep uvc
> +  uvcvideo              126976  0
> +  videobuf2_vmalloc      20480  1 uvcvideo
> +  uvc                    16384  1 uvcvideo
> +  videobuf2_v4l2         36864  1 uvcvideo
> +  videodev              315392  2 videobuf2_v4l2,uvcvideo
> +  videobuf2_common       65536  4 videobuf2_vmalloc,videobuf2_v4l2,uvcvideo,videobuf2_memops
> +  mc                     77824  4 videodev,videobuf2_v4l2,uvcvideo,videobuf2_common
> +
> +You can see that lsmod shows uvcvideo and the modules it depends on and how
> +many modules are using them. videobuf2_common is in use by 4 other modules.
> +In other words, this is the reference count for this module and rmmod will
> +refuse to unload it as long as the reference count is > 0.
> +
> +You can get the same information from /proc.modules::
> +
> +  less /proc/modules | grep uvc

why not just "grep uvc /proc/modules" ?

> +  uvcvideo 126976 0 - Live 0x0000000000000000
> +  videobuf2_vmalloc 20480 1 uvcvideo, Live 0x0000000000000000
> +  uvc 16384 1 uvcvideo, Live 0x0000000000000000
> +  videobuf2_v4l2 36864 1 uvcvideo, Live 0x0000000000000000
> +  videodev 315392 2 uvcvideo,videobuf2_v4l2, Live 0x0000000000000000
> +  videobuf2_common 65536 4 uvcvideo,videobuf2_vmalloc,videobuf2_memops,videobuf2_v4l2, Live 0x0000000000000000
> +  mc 77824 4 uvcvideo,videobuf2_v4l2,videodev,videobuf2_common, Live 0x0000000000000000
> +
> +The information is similar with a few more extra fields. The address is the
> +base address for the module in kernel virtual memory space. When run as a
> +normal user, the address is all zeros. The same command when run as root will
> +be as follows::
> +
> +  sudo less /proc/modules | grep uvc
> +  uvcvideo 126976 0 - Live 0xffffffffc1c8b000
> +  videobuf2_vmalloc 20480 1 uvcvideo, Live 0xffffffffc167f000
> +  uvc 16384 1 uvcvideo, Live 0xffffffffc0ab0000
> +  videobuf2_v4l2 36864 1 uvcvideo, Live 0xffffffffc0a28000
> +  videodev 315392 2 uvcvideo,videobuf2_v4l2, Live 0xffffffffc16e9000
> +  videobuf2_common 65536 4 uvcvideo,videobuf2_vmalloc,videobuf2_memops,videobuf2_v4l2, Live 0xffffffffc094d000
> +  mc 77824 4 uvcvideo,videobuf2_v4l2,videodev,videobuf2_common, Live 0xffffffffc15eb000
> +
> +Let's check what modinfo shows that is important for us::
> +
> +  /sbin/modinfo uvcvideo
> +  filename:       /lib/modules/6.3.0-rc2/kernel/drivers/media/usb/uvc/uvcvideo.ko
> +  license:        GPL
> +  description:    USB Video Class driver
> +  depends:        videobuf2-v4l2,videodev,mc,uvc,videobuf2-common,videobuf2-vmalloc
> +  retpoline:      Y
> +  intree:         Y
> +  name:           uvcvideo
> +  vermagic:       6.3.0-rc2 SMP preempt mod_unload modversions
> +  sig_id:         PKCS#7
> +  signer:         Build time autogenerated kernel key
> +
> +This tells us that this module is built intree and the signed with a build
> +time autogenerated key.
> +
> +Let's do one last sanity check on the system to see if the following two
> +command outputs match::
> +
> +  ps ax | wc -l
> +  ls -d /proc/* | grep [0-9]|wc -l
> +
> +If they don't match, examine your system closely. kernel rootkits install
> +their own ps, find, etc. utilities to mask their activity. The outputs
> +match on my system. Do they on yours?

This would assume that there is no other activity on the system, of
course.  Worth saying to avoid unnecessary panic.

> +Is my system as secure as it could be?
> +======================================
> +
> +Linux kernel supports several hardening options to make system secure.

*The* Linux kernel ... to make *the* system secure

the whole document could use a pass for article use

> +kconfig-hardened-check tool sanity checks kernel configuration for
> +security. You can clone the latest kconfig-hardened-check repository::
> +
> +  git clone https://github.com/a13xp0p0v/kconfig-hardened-check.git
> +  cd kconfig-hardened-check
> +  bin/kconfig-hardened-check --config <config file> --cmdline /proc/cmdline

Should you say what <config file> is?

> +This will generate detailed report of kernel security configuration and
> +command line options that are enabled (OK) and the ones that aren't (FAIL)
> +and a summary line at the end::
> +
> +  [+] Config check is finished: 'OK' - 100 / 'FAIL' - 100
> +
> +You will have to analyze the information to determine which options make
> +sense to enable on your system.
> +
> +Understanding system run-time activity
> +======================================
> +
> +Enabling event tracing gives insight into system run-time activity. This is
> +a good way to identify which parts of the kernel are used at a higher level
> +while system is in and/or while a specific workload/process is running.
> +
> +Event tracing depends on the CONFIG_EVENT_TRACING option enabled. You can
> +enable event tracing before starting workload/process. Event tracing allows
> +you to dynamically enable and disable tracing on supported/available events.
> +You can find available events, tracers, and filter functions in the following
> +files::
> +
> +  /sys/kernel/debug/tracing/available_events
> +  /sys/kernel/debug/tracing/available_filter_functions
> +  /sys/kernel/debug/tracing/available_tracers
> +
> +Now this is how you can enable tracing::
> +
> +  sudo echo 1 > /sys/kernel/debug/tracing/events/enable
> +
> +Once the workload/process stops or when you decide you have the status you
> +need, you can disable event tracing::
> +
> +  sudo echo 0 > /sys/kernel/debug/tracing/events/enable
> +
> +You can find the tracing information in the file::
> +
> +  /sys/kernel/debug/tracing
> +
> +Here is the information shown in this file::
> +
> +  cat trace
> +  # tracer: nop
> +  #
> +  # entries-in-buffer/entries-written: 0/0   #P:16
> +  #
> +  #                                _-----=> irqs-off/BH-disabled
> +  #                               / _----=> need-resched
> +  #                              | / _---=> hardirq/softirq
> +  #                              || / _--=> preempt-depth
> +  #                              ||| / _-=> migrate-disable
> +  #                              |||| /     delay
> +  #           TASK-PID     CPU#  |||||  TIMESTAMP  FUNCTION
> +  #              | |         |   |||||     |         |
> +

That looks like the header, certainly not "the information" found in the
file.  Including some actual output would make the following discussion
more comprehensible.

> +Analyzing traces
> +================
> +
> +You will be able map the functions to system calls and other kernel features
> +to get insight into the overall system activity while a workload/process is
> +running.
> +
> +Map the NR (syscal) numbers from the trace to syscalls from the syscalls dump.

(syscall)

> +Categorize system calls and map them to Linux subsystems.

Not sure what that sentence is trying to tell readers.  Again, who is
the audience; will a readership that needs to be told how to install
auditd be able to make sense of this and act on it?

> +Conclusion
> +==========
> +
> +This document is intended to be used as a guide on how to gather higher level
> +information about a system and its run-time activity. The approach described
> +in this document helps us get insight into supported system calls, features,
> +assess how secure a system is, and its run-time activity.
> +
> +References
> +==========
> +
> + * https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/scripts/checksyscalls.sh
> + * https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/scripts/get_feat.pl
> + * https://github.com/a13xp0p0v/kconfig-hardened-check
> + * https://docs.kernel.org/trace/index.html

Thanks,

jon




[Index of Archives]     [Kernel Newbies]     [Security]     [Netfilter]     [Bugtraq]     [Linux FS]     [Yosemite Forum]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Samba]     [Video 4 Linux]     [Device Mapper]     [Linux Resources]

  Powered by Linux