Introduce a new kselftest to identify slowdowns in key boot events. This test uses ftrace to monitor the start and end times, as well as the durations of all initcalls, and compares these timings to reference values to identify significant slowdowns. The script functions in two modes: the 'generate' mode allows to create a JSON file containing initial reference timings for all initcalls from a known stable kernel. The 'test' mode can be used during subsequent boots to assess current timings against the reference values and determine if there are any significant differences. The test ships with a bootconfig file for setting up ftrace and a configuration fragment for the necessary kernel configs. Signed-off-by: Laura Nao <laura.nao@xxxxxxxxxxxxx> --- Hello, This v2 is a follow-up to RFCv1[1] and includes changes based on feedback from the LPC 2024 session [2], along with some other fixes. [1] https://lore.kernel.org/all/20240725110622.96301-1-laura.nao@xxxxxxxxxxxxx/ [2] https://www.youtube.com/watch?v=rWhW2-Vzi40 After reviewing other available tests and considering the feedback from discussions at Plumbers, I decided to stick with the bootconfig file approach but extend it to track all initcalls instead of a fixed set of functions or events. The bootconfig file can be expanded and adapted to track additional functions if needed for specific use cases. I also defined a synthetic event to calculate initcall durations, while still tracking their start and end times. Users are then allowed to choose whether to compare start times, end times, or durations. Support for specifying different rules for different initcalls has also been added. In RFCv1, there was some discussion about using existing tools like bootgraph.py. However, the output from these tools is mainly for manual inspection (e.g., HTML visual output), whereas this test is designed to run in automated CI environments too. The kselftest proposed here combines the process of generating reference data and running tests into a single script with two modes, making it easy to integrate into automated workflows. Many of the features in this v2 (e.g., generating a JSON reference file, comparing timings, and reporting results in KTAP format) could potentially be integrated into bootgraph.py with some effort. However, since this test is intended for automated execution rather than manual use, I've decided to keep it separate for now and explore the options suggested at LPC, such as using ftrace histograms for initcall latencies. I'm open to revisiting this decision and working toward integrating the changes into bootgraph.py if there's a strong preference for unifying the tools. Let me know your thoughts. A comprehensive changelog is reported below. Thanks, Laura --- Changes in v2: - Updated ftrace configuration to track all initcall start times, end times, and durations, and generate a histogram. - Modified test logic to compare initcall durations by default, with the option to compare start or end times if needed. - Added warnings if the initcalls in the reference file differ from those detected in the running system. - Combined the scripts into a single script with two modes: one for generating the reference file and one for running the test. - Added support for specifying different rules for individual initcalls. - Switched the reference format from YAML to JSON. - Added metadata to the reference file, including kernel version, kernel configuration, and cmdline. - Link to v1: https://lore.kernel.org/all/20240725110622.96301-1-laura.nao@xxxxxxxxxxxxx/ --- tools/testing/selftests/Makefile | 1 + tools/testing/selftests/boot-time/Makefile | 16 ++ tools/testing/selftests/boot-time/bootconfig | 15 + tools/testing/selftests/boot-time/config | 6 + .../selftests/boot-time/test_boot_time.py | 265 ++++++++++++++++++ 5 files changed, 303 insertions(+) create mode 100644 tools/testing/selftests/boot-time/Makefile create mode 100644 tools/testing/selftests/boot-time/bootconfig create mode 100644 tools/testing/selftests/boot-time/config create mode 100755 tools/testing/selftests/boot-time/test_boot_time.py diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile index b38199965f99..1bb20d1e3854 100644 --- a/tools/testing/selftests/Makefile +++ b/tools/testing/selftests/Makefile @@ -3,6 +3,7 @@ TARGETS += acct TARGETS += alsa TARGETS += amd-pstate TARGETS += arm64 +TARGETS += boot-time TARGETS += bpf TARGETS += breakpoints TARGETS += cachestat diff --git a/tools/testing/selftests/boot-time/Makefile b/tools/testing/selftests/boot-time/Makefile new file mode 100644 index 000000000000..cdcdc1bbe779 --- /dev/null +++ b/tools/testing/selftests/boot-time/Makefile @@ -0,0 +1,16 @@ +PY3 = $(shell which python3 2>/dev/null) + +ifneq ($(PY3),) + +TEST_PROGS := test_boot_time.py + +include ../lib.mk + +else + +all: no_py3_warning + +no_py3_warning: + @echo "Missing python3. This test will be skipped." + +endif \ No newline at end of file diff --git a/tools/testing/selftests/boot-time/bootconfig b/tools/testing/selftests/boot-time/bootconfig new file mode 100644 index 000000000000..e4b89a33b7a3 --- /dev/null +++ b/tools/testing/selftests/boot-time/bootconfig @@ -0,0 +1,15 @@ +ftrace.event { + synthetic.initcall_latency { + # Synthetic event to record initcall latency, start, and end times + fields = "unsigned long func", "u64 lat", "u64 start", "u64 end" + actions = "hist:keys=func.sym,start,end:vals=lat:sort=lat" + } + initcall.initcall_start { + # Capture the start time (ts0) when initcall starts + actions = "hist:keys=func:ts0=common_timestamp.usecs" + } + initcall.initcall_finish { + # Capture the end time, calculate latency, and trigger synthetic event + actions = "hist:keys=func:lat=common_timestamp.usecs-$ts0:start=$ts0:end=common_timestamp.usecs:onmatch(initcall.initcall_start).initcall_latency(func,$lat,$start,$end)" + } +} \ No newline at end of file diff --git a/tools/testing/selftests/boot-time/config b/tools/testing/selftests/boot-time/config new file mode 100644 index 000000000000..bcb646ec3cd8 --- /dev/null +++ b/tools/testing/selftests/boot-time/config @@ -0,0 +1,6 @@ +CONFIG_TRACING=y +CONFIG_BOOTTIME_TRACING=y +CONFIG_BOOT_CONFIG_EMBED=y +CONFIG_BOOT_CONFIG_EMBED_FILE="tools/testing/selftests/boot-time/bootconfig" +CONFIG_SYNTH_EVENTS=y +CONFIG_HIST_TRIGGERS=y \ No newline at end of file diff --git a/tools/testing/selftests/boot-time/test_boot_time.py b/tools/testing/selftests/boot-time/test_boot_time.py new file mode 100755 index 000000000000..556dacf04b6d --- /dev/null +++ b/tools/testing/selftests/boot-time/test_boot_time.py @@ -0,0 +1,265 @@ +#!/usr/bin/env python3 +# SPDX-License-Identifier: GPL-2.0 +# +# Copyright (c) 2024 Collabora Ltd +# +# This script reads the +# /sys/kernel/debug/tracing/events/synthetic/initcall_latency/hist file, +# extracts function names and timings, and compares them against reference +# timings provided in an input JSON file to identify significant boot +# slowdowns. +# The script operates in two modes: +# - Generate Mode: parses initcall timings from the current kernel's ftrace +# event histogram and generates a JSON reference file with function +# names, start times, end times, and latencies. +# - Test Mode: compares current initcall timings against the reference +# file, allowing users to define a maximum allowed difference between the +# values (delta). Users can also apply custom delta thresholds for +# specific initcalls using regex-based overrides. The comparison can be +# done on latency, start, or end times. +# + +import os +import sys +import argparse +import gzip +import json +import re +import subprocess + +this_dir = os.path.dirname(os.path.realpath(__file__)) +sys.path.append(os.path.join(this_dir, "../kselftest/")) + +import ksft + +def load_reference_from_json(file_path): + """ + Load reference data from a JSON file and returns the parsed data. + @file_path: path to the JSON file. + """ + + try: + with open(file_path, 'r', encoding="utf-8") as file: + return json.load(file) + except FileNotFoundError: + ksft.print_msg(f"Error: File {file_path} not found.") + ksft.exit_fail() + except json.JSONDecodeError: + ksft.print_msg(f"Error: Failed to decode JSON from {file_path}.") + ksft.exit_fail() + + +def mount_debugfs(path): + """ + Mount debugfs at the specified path if it is not already mounted. + @path: path where debugfs should be mounted + """ + # Check if debugfs is already mounted + with open('/proc/mounts', 'r', encoding="utf-8") as mounts: + for line in mounts: + if 'debugfs' in line and path in line: + print(f"debugfs is already mounted at {path}") + return True + + # Mount debugfs + try: + subprocess.run(['mount', '-t', 'debugfs', 'none', path], check=True) + return True + except subprocess.CalledProcessError as e: + print(f"Failed to mount debugfs: {e.stderr}") + return False + + +def ensure_unique_function_name(func, initcall_entries): + """ + Ensure the function name is unique by appending a suffix if necessary. + @func: the original function name. + @initcall_entries: a dictionary containing parsed initcall entries. + """ + i = 2 + base_func = func + while func in initcall_entries: + func = f'{base_func}[{i}]' + i += 1 + return func + + +def parse_initcall_latency_hist(): + """ + Parse the ftrace histogram for the initcall_latency event, extracting + function names, start times, end times, and latencies. Return a + dictionary where each entry is structured as follows: + { + <function symbolic name>: { + "start": <start time>, + "end": <end time>, + "latency": <latency> + } + } + """ + + pattern = re.compile(r'\{ func: \[\w+\] ([\w_]+)\s*, start: *(\d+), end: *(\d+) \} hitcount: *\d+ lat: *(\d+)') + initcall_entries = {} + + try: + with open('/sys/kernel/debug/tracing/events/synthetic/initcall_latency/hist', 'r', encoding="utf-8") as hist_file: + for line in hist_file: + match = pattern.search(line) + if match: + func = match.group(1).strip() + start = int(match.group(2)) + end = int(match.group(3)) + latency = int(match.group(4)) + + # filter out unresolved names + if not func.startswith("0x"): + func = ensure_unique_function_name(func, initcall_entries) + + initcall_entries[func] = { + "start": start, + "end": end, + "latency": latency + } + except FileNotFoundError: + print("Error: Histogram file not found.") + + return initcall_entries + + +def compare_initcall_list(ref_initcall_entries, cur_initcall_entries): + """ + Compare the current list of initcall functions against the reference + file. Print warnings if there are unique entries in either. + @ref_initcall_entries: reference initcall entries. + @cur_initcall_entries: current initcall entries. + """ + ref_entries = set(ref_initcall_entries.keys()) + cur_entries = set(cur_initcall_entries.keys()) + + unique_to_ref = ref_entries - cur_entries + unique_to_cur = cur_entries - ref_entries + + if (unique_to_ref): + ksft.print_msg( + f"Warning: {list(unique_to_ref)} not found in current data. Consider updating reference file.") + if unique_to_cur: + ksft.print_msg( + f"Warning: {list(unique_to_cur)} not found in reference data. Consider updating reference file.") + + +def run_test(ref_file_path, delta, overrides, mode): + """ + Run the test comparing the current timings with the reference values. + @ref_file_path: path to the JSON file containing reference values. + @delta: default allowed difference between reference and current + values. + @overrides: override rules in the form of regex:threshold. + @mode: the comparison mode (either 'start', 'end', or 'latency'). + """ + + ref_data = load_reference_from_json(ref_file_path) + + ref_initcall_entries = ref_data['data'] + cur_initcall_entries = parse_initcall_latency_hist() + + compare_initcall_list(ref_initcall_entries, cur_initcall_entries) + + ksft.set_plan(len(ref_initcall_entries)) + + for func_name in ref_initcall_entries: + effective_delta = delta + for regex, override_delta in overrides.items(): + if re.match(regex, func_name): + effective_delta = override_delta + break + if (func_name in cur_initcall_entries): + ref_metric = ref_initcall_entries[func_name].get(mode) + cur_metric = cur_initcall_entries[func_name].get(mode) + if (cur_metric > ref_metric and (cur_metric - ref_metric) >= effective_delta): + ksft.test_result_fail(func_name) + ksft.print_msg(f"'{func_name}' {mode} differs by " + f"{(cur_metric - ref_metric)} usecs.") + else: + ksft.test_result_pass(func_name) + else: + ksft.test_result_skip(func_name) + + +def generate_reference_file(file_path): + """ + Generate a reference file in JSON format, containing kernel metadata + and initcall timing data. + @file_path: output file path. + """ + metadata = {} + + config_file = "/proc/config.gz" + if os.path.isfile(config_file): + with gzip.open(config_file, "rt", encoding="utf-8") as f: + config = f.read() + metadata["config"] = config + + metadata["version"] = os.uname().release + + cmdline_file = "/proc/cmdline" + if os.path.isfile(cmdline_file): + with open(cmdline_file, "r", encoding="utf-8") as f: + cmdline = f.read().strip() + metadata["cmdline"] = cmdline + + ref_data = { + "metadata": metadata, + "data": parse_initcall_latency_hist(), + } + + with open(file_path, "w", encoding='utf-8') as f: + json.dump(ref_data, f, indent=4) + print(f"Generated {file_path}") + + +if __name__ == "__main__": + parser = argparse.ArgumentParser( + description="") + + subparsers = parser.add_subparsers(dest='mode', required=True, help='Choose between generate or test modes') + + generate_parser = subparsers.add_parser('generate', help="Generate a reference file") + generate_parser.add_argument('out_ref_file', nargs='?', default='reference_initcall_timings.json', + help='Path to output JSON reference file (default: reference_initcall_timings.json)') + + compare_parser = subparsers.add_parser('test', help='Test against a reference file') + compare_parser.add_argument('in_ref_file', help='Path to JSON reference file') + compare_parser.add_argument( + 'delta', type=int, help='Maximum allowed delta between the current and the reference timings (usecs)') + compare_parser.add_argument('--override', '-o', action='append', type=str, + help="Specify regex-based rules as regex:delta (e.g., '^acpi_.*:50')") + compare_parser.add_argument('--mode', '-m', default='latency', choices=[ + 'start', 'end', 'latency'], + help="Comparison mode: 'latency' (default) for latency, 'start' for start times, or 'end' for end times.") + + args = parser.parse_args() + + if args.mode == 'generate': + generate_reference_file(args.out_ref_file) + sys.exit(0) + + # Process overrides + overrides = {} + if args.override: + for override in args.override: + try: + pattern, delta = override.split(":") + overrides[pattern] = int(delta) + except ValueError: + print(f"Invalid override format: {override}. Expected format is 'regex:delta'.") + sys.exit(1) + + # Ensure debugfs is mounted + if not mount_debugfs("/sys/kernel/debug"): + ksft.exit_fail() + + ksft.print_header() + + run_test(args.in_ref_file, args.delta, overrides, args.mode) + + ksft.finished() -- 2.30.2