From: Joel Granados <j.granados@xxxxxxxxxxx>

What?
These commits remove the sentinel element (last empty element) from the
sysctl arrays of all the files under the "kernel/" directory that use a
sysctl array for registration. The merging of the preparation patches [1]
to mainline allows us to remove sentinel elements without changing
behavior. This is safe because the sysctl registration code
(register_sysctl() and friends) uses the array size in addition to
checking for a sentinel [2].

Why?
By removing the sysctl sentinel elements we avoid kernel bloat as
ctl_table arrays get moved out of kernel/sysctl.c into their own
respective subsystems. This move was started long ago to avoid merge
conflicts; the sentinel removal bit came after Matthew Wilcox suggested
it to avoid bloating the kernel by one element as arrays moved out. This
patchset will reduce the overall build time size of the kernel and run
time memory bloat by about 64 bytes per declared ctl_table array (more
info here [5]).

When are we done?
There are 4 patchsets (25 commits [3]) that are still outstanding to
completely remove the sentinels: files under "net/", files under
"kernel/" (this patchset), misc dirs (files under mm/, security/ and
others) and the final set that removes the unneeded check for
->procname == NULL.

Testing:
* Ran sysctl selftests (./tools/testing/selftests/sysctl/sysctl.sh)
* Ran this through 0-day with no errors or warnings

Savings in vmlinux:
  A total of 64 bytes per sentinel is saved after removal; I measured on
  x86_64 to give an idea of the aggregated savings. The actual savings
  will depend on individual kernel configuration.
    * bloat-o-meter
        - The "yesall" config saves 1984 bytes [6]
        - A reduced config [4] saves 1027 bytes [7]

Savings in allocated memory:
  None in this set, but they will occur when the superfluous allocations
  are removed from proc_sysctl.c. I include it here for context. The
  estimated savings during boot for config [4] are 6272 bytes. See [8]
  for how to measure it.

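To make the "What?" section concrete, here is a minimal userspace sketch of
why dropping the sentinel does not change behavior once the walker also
knows the array size. Everything in it is hypothetical and simplified for
illustration: struct mock_ctl_table, MOCK_ARRAY_SIZE() and
mock_count_entries() are stand-ins and are not the kernel's struct
ctl_table, ARRAY_SIZE() or registration code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the kernel's struct ctl_table. */
struct mock_ctl_table {
	const char *procname;
	int *data;
};

/* Hypothetical equivalent of an element-count macro. */
#define MOCK_ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

static int val_a, val_b;

/* Old style: the array ends with an empty sentinel element. */
static struct mock_ctl_table with_sentinel[] = {
	{ .procname = "a", .data = &val_a },
	{ .procname = "b", .data = &val_b },
	{ .procname = NULL },
};

/* New style: no sentinel; the caller passes the element count. */
static struct mock_ctl_table without_sentinel[] = {
	{ .procname = "a", .data = &val_a },
	{ .procname = "b", .data = &val_b },
};

/*
 * Count usable entries by stopping at `size` or at the first entry with
 * a NULL procname, whichever comes first. This mirrors the idea of
 * "use the array size in addition to checking for a sentinel"; it is
 * an assumption-level sketch, not the actual kernel logic.
 */
static size_t mock_count_entries(const struct mock_ctl_table *table,
				 size_t size)
{
	size_t n = 0;

	assert(table != NULL);
	while (n < size && table[n].procname)
		n++;
	return n;
}
```

Calling mock_count_entries() with MOCK_ARRAY_SIZE() of either array yields
the same entry count, while the array without the sentinel is smaller by
exactly one element.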
Comments/feedback greatly appreciated

Changes in v3:
- Rebased to v6.9-rc1
- Wrote a shorter cover letter
- Removed willy@xxxxxxxxxxxxx from cc
- Link to v2: https://lore.kernel.org/r/20240104-jag-sysctl_remove_empty_elem_kernel-v2-0-836cc04e00ec@xxxxxxxxxxx

Changes in v2:
- No functional changes; I re-sent it as I did not see it in the latest
  sysctl-next. It might be a bit too late to include it in the 6.7
  version, but this v2 can be used for 6.8 when it comes out.
- Rebased on top of v6.7-rc6
- Added trailers to the relevant commits.
- Link to v1: https://lore.kernel.org/r/20231107-jag-sysctl_remove_empty_elem_kernel-v1-0-e4ce1388dfa0@xxxxxxxxxxx

Best

Joel

[1] https://lore.kernel.org/all/ZO5Yx5JFogGi%2FcBo@xxxxxxxxxxxxxxxxxxxxxx/
[2] https://lore.kernel.org/all/ZO5Yx5JFogGi%2FcBo@xxxxxxxxxxxxxxxxxxxxxx/
[3] https://git.kernel.org/pub/scm/linux/kernel/git/joel.granados/linux.git/tag/?h=sysctl_remove_empty_elem_v5
[4] https://gist.github.com/Joelgranados/feaca7af5537156ca9b73aeaec093171

[5] Links Related to the ctl_table sentinel removal:
    * Good summaries from Luis:
      https://lore.kernel.org/all/ZO5Yx5JFogGi%2FcBo@xxxxxxxxxxxxxxxxxxxxxx/
      https://lore.kernel.org/all/ZMFizKFkVxUFtSqa@xxxxxxxxxxxxxxxxxxxxxx/
    * Patches adjusting sysctl register calls:
      https://lore.kernel.org/all/20230302204612.782387-1-mcgrof@xxxxxxxxxx/
      https://lore.kernel.org/all/20230302202826.776286-1-mcgrof@xxxxxxxxxx/
    * Discussions about expectations and approach
      https://lore.kernel.org/all/20230321130908.6972-1-frank.li@xxxxxxxx
      https://lore.kernel.org/all/20220220060626.15885-1-tangmeng@xxxxxxxxxxxxx

[6]
    add/remove: 0/0 grow/shrink: 0/31 up/down: 0/-1984 (-1984)
    Function                                     old     new   delta
    watchdog_sysctls                             576     512     -64
    watchdog_hardlockup_sysctl                   128      64     -64
    vm_table                                    1344    1280     -64
    uts_kern_table                               448     384     -64
    usermodehelper_table                         192     128     -64
    user_table                                   832     768     -64
    user_event_sysctls                           128      64     -64
    timer_sysctl                                 128      64     -64
    signal_debug_table                           128      64     -64
    seccomp_sysctl_table                         192     128     -64
    sched_rt_sysctls                             256     192     -64
    sched_fair_sysctls                           256     192     -64
    sched_energy_aware_sysctls                   128      64     -64
    sched_dl_sysctls                             192     128     -64
    sched_core_sysctls                           384     320     -64
    sched_autogroup_sysctls                      128      64     -64
    printk_sysctls                               512     448     -64
    pid_ns_ctl_table_vm                          128      64     -64
    pid_ns_ctl_table                             128      64     -64
    latencytop_sysctl                            128      64     -64
    kprobe_sysctls                               128      64     -64
    kexec_core_sysctls                           256     192     -64
    kern_table                                  2560    2496     -64
    kern_reboot_table                            192     128     -64
    kern_panic_table                             192     128     -64
    kern_exit_table                              128      64     -64
    kern_delayacct_table                         128      64     -64
    kern_acct_table                              128      64     -64
    hung_task_sysctls                            448     384     -64
    ftrace_sysctls                               128      64     -64
    bpf_syscall_table                            192     128     -64
    Total: Before=429912331, After=429910347, chg -0.00%

[7]
    add/remove: 0/1 grow/shrink: 0/16 up/down: 0/-1027 (-1027)
    Function                                     old     new   delta
    sched_core_sysctl_init                        39      36      -3
    vm_table                                    1024     960     -64
    uts_kern_table                               448     384     -64
    usermodehelper_table                         192     128     -64
    user_table                                   704     640     -64
    signal_debug_table                           128      64     -64
    seccomp_sysctl_table                         192     128     -64
    sched_rt_sysctls                             256     192     -64
    sched_fair_sysctls                           128      64     -64
    sched_dl_sysctls                             192     128     -64
    sched_core_sysctls                            64       -     -64
    printk_sysctls                               512     448     -64
    pid_ns_ctl_table_vm                          128      64     -64
    kern_table                                  1920    1856     -64
    kern_reboot_table                            192     128     -64
    kern_panic_table                             128      64     -64
    kern_exit_table                              128      64     -64
    Total: Before=8522228, After=8521201, chg -0.01%

[8]
    To measure the in-memory savings, apply this on top of this patchset.

"
diff --git i/fs/proc/proc_sysctl.c w/fs/proc/proc_sysctl.c
index 37cde0efee57..896c498600e8 100644
--- i/fs/proc/proc_sysctl.c
+++ w/fs/proc/proc_sysctl.c
@@ -966,6 +966,7 @@ static struct ctl_dir *new_dir(struct ctl_table_set *set,
 	table[0].procname = new_name;
 	table[0].mode = S_IFDIR|S_IRUGO|S_IXUGO;
 	init_header(&new->header, set->dir.header.root, set, node, table, 1);
+	printk("%ld sysctl saved mem kzalloc\n", sizeof(struct ctl_table));
 
 	return new;
 }
@@ -1189,6 +1190,7 @@ static struct ctl_table_header *new_links(struct ctl_dir *dir, s>
 		link_name += len;
 		link++;
 	}
+	printk("%ld sysctl saved mem kzalloc\n", sizeof(struct ctl_table));
 	init_header(links, dir->header.root, dir->header.set, node, link_table,
 		    head->ctl_table_size);
 	links->nreg = nr_entries;
"

and then run the following bash script on the booted kernel:

accum=0
for n in $(dmesg | grep kzalloc | awk '{print $3}') ; do
    accum=$(calc "$accum + $n")
done
echo $accum

---

Signed-off-by: Joel Granados <j.granados@xxxxxxxxxxx>

---
Joel Granados (10):
      kernel misc: Remove the now superfluous sentinel elements from ctl_table array
      umh: Remove the now superfluous sentinel elements from ctl_table array
      ftrace: Remove the now superfluous sentinel elements from ctl_table array
      timekeeping: Remove the now superfluous sentinel elements from ctl_table array
      seccomp: Remove the now superfluous sentinel elements from ctl_table array
      scheduler: Remove the now superfluous sentinel elements from ctl_table array
      printk: Remove the now superfluous sentinel elements from ctl_table array
      kprobes: Remove the now superfluous sentinel elements from ctl_table array
      delayacct: Remove the now superfluous sentinel elements from ctl_table array
      bpf: Remove the now superfluous sentinel elements from ctl_table array

 kernel/acct.c                    | 1 -
 kernel/bpf/syscall.c             | 1 -
 kernel/delayacct.c               | 1 -
 kernel/exit.c                    | 1 -
 kernel/hung_task.c               | 1 -
 kernel/kexec_core.c              | 1 -
 kernel/kprobes.c                 | 1 -
 kernel/latencytop.c              | 1 -
 kernel/panic.c                   | 1 -
 kernel/pid_namespace.c           | 1 -
 kernel/pid_sysctl.h              | 1 -
 kernel/printk/sysctl.c           | 1 -
 kernel/reboot.c                  | 1 -
 kernel/sched/autogroup.c         | 1 -
 kernel/sched/core.c              | 1 -
 kernel/sched/deadline.c          | 1 -
 kernel/sched/fair.c              | 1 -
 kernel/sched/rt.c                | 1 -
 kernel/sched/topology.c          | 1 -
 kernel/seccomp.c                 | 1 -
 kernel/signal.c                  | 1 -
 kernel/stackleak.c               | 1 -
 kernel/sysctl.c                  | 2 --
 kernel/time/timer.c              | 1 -
 kernel/trace/ftrace.c            | 1 -
 kernel/trace/trace_events_user.c | 1 -
 kernel/ucount.c                  | 3 +--
 kernel/umh.c                     | 1 -
 kernel/utsname_sysctl.c          | 1 -
 kernel/watchdog.c                | 2 --
 30 files changed, 1 insertion(+), 33 deletions(-)
---
base-commit: 4cece764965020c22cff7665b18a012006359095
change-id: 20231107-jag-sysctl_remove_empty_elem_kernel-7de90cfd0c0a

Best regards,
-- 
Joel Granados <j.granados@xxxxxxxxxxx>
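
As a cross-check of the bloat-o-meter totals in [6] and [7], the
arithmetic can be sketched as below. The helper names and the constant are
hypothetical and only restate the "64 bytes per sentinel" figure quoted in
this letter; the 98 in the usage note is an inference from 6272 / 64, not
a number stated in the letter.

```c
#include <assert.h>

/* Assumed size of one removed sentinel, per the cover letter (x86_64). */
enum { MOCK_SENTINEL_BYTES = 64 };

/*
 * Expected shrink when `tables` ctl_table arrays each lose one sentinel,
 * plus `other_bytes` of unrelated shrinkage (e.g. smaller init code).
 */
static long mock_expected_shrink(long tables, long other_bytes)
{
	assert(tables >= 0);
	return tables * MOCK_SENTINEL_BYTES + other_bytes;
}

/* Number of sentinel-sized elements that add up to `total_bytes`. */
static long mock_sentinel_count(long total_bytes)
{
	assert(total_bytes % MOCK_SENTINEL_BYTES == 0);
	return total_bytes / MOCK_SENTINEL_BYTES;
}
```

With these, mock_expected_shrink(31, 0) gives the 1984 bytes of [6],
mock_expected_shrink(16, 3) gives the 1027 bytes of [7] (16 table entries
plus the 3 bytes from sched_core_sysctl_init), and
mock_sentinel_count(6272) gives 98 for the estimated boot-time savings.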