Why? This is a preparation patch set that will make it easier for us to apply subsequent patches that will remove the sentinel element (last empty element) in the ctl_table arrays. In itself, it does not remove any sentinels but it is needed to bring all the advantages of the removal to fruition which is to help reduce the overall build time size of the kernel and run time memory bloat by about ~64 bytes per sentinel. Without this patch set we would have to put everything into one big commit making the review process that much longer and harder for everyone. Since it is so related to the removal of the sentinel element, its worth while to give a bit of context on this: * Good summary from Luis about why we want to remove the sentinels. https://lore.kernel.org/all/ZMFizKFkVxUFtSqa@xxxxxxxxxxxxxxxxxxxxxx/ * This is a patch set that replaces register_sysctl_table with register_sysctl https://lore.kernel.org/all/20230302204612.782387-1-mcgrof@xxxxxxxxxx/ * Patch set to deprecate register_sysctl_paths() https://lore.kernel.org/all/20230302202826.776286-1-mcgrof@xxxxxxxxxx/ * Here there is an explicit expectation for the removal of the sentinel element. https://lore.kernel.org/all/20230321130908.6972-1-frank.li@xxxxxxxx * The "ARRAY_SIZE" approach was mentioned (proposed?) in this thread https://lore.kernel.org/all/20220220060626.15885-1-tangmeng@xxxxxxxxxxxxx What? These commits set things up so we can start removing the sentinel elements. They modify sysctl and net_sysctl internals so that registering a ctl_table that contains a sentinel gives the same result as passing a table_size calculated from the ctl_table array without a sentinel. We accomplish this by introducing a table_size argument in the same place where procname is checked for NULL. The idea is for it to keep stopping when it hits ->procname == NULL, while the sentinel is still present. And when the sentinel is removed, it will stop on the table_size (thx to jani.nikula@xxxxxxxxxxxxxxx for the discussion that led to this). This allows us to remove sentinels from one (or several) files at a time. These commits are part of a bigger set containing the removal of ctl_table sentinel (https://github.com/Joelgranados/linux/tree/tag/sysctl_remove_empty_elem_V2). The idea is to make the review process easier by chunking the 65+ commits into manageable pieces. My idea is to send out one chunk at a time so it can be reviewed separately from the others without the noise from parallel related sets. After this first chunk will come 6 that remove the sentinel element from "arch/*, drivers/*, fs/*, kernel/*, net/* and miscellaneous. And then a final one that removes the ->procname == NULL check. You can see all commits here (https://github.com/Joelgranados/linux/tree/tag/sysctl_remove_empty_elem_V2). Commits in this chunk: * Preparation commits: start : sysctl: Prefer ctl_table_header in proc_sysct end : sysctl: Add size argument to init_header These are preparation commits that make sure that we have the ctl_table_header where we need the array size. * Add size to __register_sysctl_table, __register_sysctl_init and register_sysctl start : sysctl: Add a size arg to __register_sysctl_table end : sysctl: Add size arg to __register_sysctl_init Here we replace the existing register functions with macros that add the ARRAY_SIZE automatically. Unfortunately these macros cannot be used for the register calls that pass a pointer; in this situation we add register functions with an table_size argument (thx to greg@xxxxxxxxx for bringing this to my attention) * Add size to register_net_sysctl start : sysctl: Add size to register_net_sysctl function end : sysctl: SIZE_MAX->ARRAY_SIZE in register_net_sysctl register_net_sysctl is an indirection function to the sysctl registrations and needed a several commits to add table_size to all its callers. We temporarily use SIZE_MAX to avoid compiler warnings while we change to register_net_sysctl to register_net_sysctl_sz; we remove it with the penultimate patch of this set. Finally, we make sure to adjust the calculated size every time there is a check for unprivileged users. * Add size as additional stopping criteria commit : sysctl: Use ctl_table_size as stopping criteria for list macro We add table_size check in the main macro within proc_sysctl.c. This commit allows the removal of the sentinel element by chunks. Testing: * Ran sysctl selftests (./tools/testing/selftests/sysctl/sysctl.sh) * Successfully ran this through 0-day Size saving estimates: A consequence of eventually removing all the sentinels (64 bytes per sentinel) is the bytes we save. These are *not* numbers that we will get after this patch set; these are the numbers that we will get after removing all the sentinels. I included them here because they are relevant and to get an idea of just how much memory we are talking about. * bloat-o-meter: The "yesall" configuration results save 9158 bytes (you can see the output here https://lore.kernel.org/all/20230621091000.424843-1-j.granados@xxxxxxxxxxx/. The "tiny" configuration + CONFIG_SYSCTL save 1215 bytes (you can see the output here [2]) * memory usage: As we no longer need the sentinel element within proc_sysctl.c, we save some bytes in main memory as well. In my testing kernel I measured a difference of 6720 bytes. I include the way to measure this in [1] Comments/feedback greatly appreciated V2: * Dropped moving mpls_table up the af_mpls.c file. We don't need it any longer as it is not really used before its current location. * Added/Clarified the why in several commit messages that were missing it. * Clarified the why in the cover letter to be "to make it easier to apply subsequent patches that will remove the sentinels" * Added documentation for table_size * Added suggested by tags (Greg and Jani) to relevant commits Best Joel [1] To measure the in memory savings apply this patch on top of https://github.com/Joelgranados/linux/tree/tag/sysctl_remove_empty_elem_V1 " diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c index 5f413bfd6271..9aa8374c0ef1 100644 --- a/fs/proc/proc_sysctl.c +++ b/fs/proc/proc_sysctl.c @@ -975,6 +975,7 @@ static struct ctl_dir *new_dir(struct ctl_table_set *set, table[0].procname = new_name; table[0].mode = S_IFDIR|S_IRUGO|S_IXUGO; init_header(&new->header, set->dir.header.root, set, node, table, 1); + printk("%ld sysctl saved mem kzalloc \n", sizeof(struct ctl_table)); return new; } @@ -1202,6 +1203,7 @@ static struct ctl_table_header *new_links(struct ctl_dir *dir, struct ctl_table_ head->ctl_table_size); links->nreg = head->ctl_table_size; + printk("%ld sysctl saved mem kzalloc \n", sizeof(struct ctl_table)); return links; } " and then run the following bash script in the kernel: accum=0 for n in $(dmesg | grep kzalloc | awk '{print $3}') ; do echo $n accum=$(calc "$accum + $n") done echo $accum [2] bloat-o-meter with "tiny" config: add/remove: 0/2 grow/shrink: 33/24 up/down: 470/-1685 (-1215) Function old new delta insert_header 831 966 +135 __register_sysctl_table 971 1092 +121 get_links 177 226 +49 put_links 167 186 +19 erase_header 55 66 +11 sysctl_init_bases 59 69 +10 setup_sysctl_set 65 73 +8 utsname_sysctl_init 26 31 +5 sld_mitigate_sysctl_init 33 38 +5 setup_userns_sysctls 158 163 +5 sched_rt_sysctl_init 33 38 +5 sched_fair_sysctl_init 33 38 +5 sched_dl_sysctl_init 33 38 +5 random_sysctls_init 33 38 +5 page_writeback_init 122 127 +5 oom_init 73 78 +5 kernel_panic_sysctls_init 33 38 +5 kernel_exit_sysctls_init 33 38 +5 init_umh_sysctls 33 38 +5 init_signal_sysctls 33 38 +5 init_pipe_fs 94 99 +5 init_fs_sysctls 33 38 +5 init_fs_stat_sysctls 33 38 +5 init_fs_namespace_sysctls 33 38 +5 init_fs_namei_sysctls 33 38 +5 init_fs_inode_sysctls 33 38 +5 init_fs_exec_sysctls 33 38 +5 init_fs_dcache_sysctls 33 38 +5 register_sysctl 22 25 +3 __register_sysctl_init 9 12 +3 user_namespace_sysctl_init 149 151 +2 sched_core_sysctl_init 38 40 +2 register_sysctl_mount_point 13 15 +2 vm_table 1344 1280 -64 vm_page_writeback_sysctls 512 448 -64 vm_oom_kill_table 256 192 -64 uts_kern_table 448 384 -64 usermodehelper_table 192 128 -64 user_table 576 512 -64 sld_sysctls 128 64 -64 signal_debug_table 128 64 -64 sched_rt_sysctls 256 192 -64 sched_fair_sysctls 128 64 -64 sched_dl_sysctls 192 128 -64 sched_core_sysctls 64 - -64 root_table 128 64 -64 random_table 448 384 -64 namei_sysctls 320 256 -64 kern_table 1792 1728 -64 kern_panic_table 128 64 -64 kern_exit_table 128 64 -64 inodes_sysctls 192 128 -64 fs_stat_sysctls 256 192 -64 fs_shared_sysctls 192 128 -64 fs_pipe_sysctls 256 192 -64 fs_namespace_sysctls 128 64 -64 fs_exec_sysctls 128 64 -64 fs_dcache_sysctls 128 64 -64 init_header 85 - -85 Total: Before=1877669, After=1876454, chg -0.06% base: fdf0eaf11452 Joel Granados (14): sysctl: Prefer ctl_table_header in proc_sysctl sysctl: Use ctl_table_header in list_for_each_table_entry sysctl: Add ctl_table_size to ctl_table_header sysctl: Add size argument to init_header sysctl: Add a size arg to __register_sysctl_table sysctl: Add size to register_sysctl sysctl: Add size arg to __register_sysctl_init sysctl: Add size to register_net_sysctl function ax.25: Update to register_net_sysctl_sz netfilter: Update to register_net_sysctl_sz networking: Update to register_net_sysctl_sz vrf: Update to register_net_sysctl_sz sysctl: SIZE_MAX->ARRAY_SIZE in register_net_sysctl sysctl: Use ctl_table_size as stopping criteria for list macro arch/arm64/kernel/armv8_deprecated.c | 2 +- arch/s390/appldata/appldata_base.c | 2 +- drivers/net/vrf.c | 3 +- fs/proc/proc_sysctl.c | 90 +++++++++++++------------ include/linux/sysctl.h | 31 +++++++-- include/net/ipv6.h | 2 + include/net/net_namespace.h | 10 +-- ipc/ipc_sysctl.c | 4 +- ipc/mq_sysctl.c | 4 +- kernel/ucount.c | 5 +- net/ax25/sysctl_net_ax25.c | 3 +- net/bridge/br_netfilter_hooks.c | 3 +- net/core/neighbour.c | 8 ++- net/core/sysctl_net_core.c | 3 +- net/ieee802154/6lowpan/reassembly.c | 8 ++- net/ipv4/devinet.c | 3 +- net/ipv4/ip_fragment.c | 3 +- net/ipv4/route.c | 8 ++- net/ipv4/sysctl_net_ipv4.c | 3 +- net/ipv4/xfrm4_policy.c | 3 +- net/ipv6/addrconf.c | 3 +- net/ipv6/icmp.c | 5 ++ net/ipv6/netfilter/nf_conntrack_reasm.c | 3 +- net/ipv6/reassembly.c | 3 +- net/ipv6/route.c | 13 ++-- net/ipv6/sysctl_net_ipv6.c | 16 +++-- net/ipv6/xfrm6_policy.c | 3 +- net/mpls/af_mpls.c | 6 +- net/mptcp/ctrl.c | 3 +- net/netfilter/ipvs/ip_vs_ctl.c | 8 ++- net/netfilter/ipvs/ip_vs_lblc.c | 10 ++- net/netfilter/ipvs/ip_vs_lblcr.c | 10 ++- net/netfilter/nf_conntrack_standalone.c | 4 +- net/netfilter/nf_log.c | 7 +- net/rds/tcp.c | 3 +- net/sctp/sysctl.c | 4 +- net/smc/smc_sysctl.c | 3 +- net/sysctl_net.c | 26 ++++--- net/unix/sysctl_net_unix.c | 3 +- net/xfrm/xfrm_sysctl.c | 8 ++- 40 files changed, 222 insertions(+), 117 deletions(-) -- 2.30.2