On Fri, Jul 22, 2022 at 11:24 AM Mauricio Faria de Oliveira <mfo@xxxxxxxxxxxxx> wrote: > > This series allows modules to have "sysctl:<procname>" module aliases > for sysctl entries, registering sysctl tables with modpost/file2alias. > (Similarly to "pci:<IDs>" aliases for PCI ID tables in device drivers.) > > The issue behind it: if a sysctl value is in /etc/sysctl.{conf,d/*.conf} > but does not exist in /proc/sys/ when the userspace tool that applies it > runs, it does not get set. > > It would be nice if the tool could run 'modprobe sysctl:<something>' and > get that '/proc/sys/.../something' up (as an administrator configured it) > and then set it, as intended. (A bit like PCI ID-based module loading.) > > ... > > The series is relatively simple, except for patch 4 (IMHO) due to ELF. > > - Patches 1-2 simplify ELF code in modpost.c (code moves, not in-depth). > - Patches 3-4 implement the feature (patch 4 is more in-depth). > - Patches 5-6 consume and expose it. > > I have tested it on x86_64 with next-20220721, and it looks correct > ('modprobe sysctl:nf_conntrack_max' works; other aliases there; see below). I did not test this patch set at all, but I am afraid you took a good case as an example. I see two locations for the "fib_multipath_hash_fields" parameter for example. # find /proc/sys/ -name fib_multipath_hash_fields /proc/sys/net/ipv4/fib_multipath_hash_fields /proc/sys/net/ipv6/fib_multipath_hash_fields If I run modprobe sysctl:fib_multipath_hash_fields Which one will be loaded, net/ipv4/sysctl_net_ipv4.c or ipv6/sysctl_net_ipv6.c ? Of course, IPv4 is always built-in, so ipv6.ko will be loaded in this case. But, let's think. The basename is not enough to identify which code resulted in that sysctl property. The PCI vendor/device ID is meant to be unique. That's the difference. You may argue the full path is globally unique, so modprobe sysctl:net/ipv6/fib_multipath_hash_fields should work, but that may not be so feasible to implement because not all file paths are static. On my machine: # find /proc/sys -name forwarding /proc/sys/net/ipv4/conf/all/forwarding /proc/sys/net/ipv4/conf/br-22440b7735e7/forwarding /proc/sys/net/ipv4/conf/br-3e8284a56053/forwarding /proc/sys/net/ipv4/conf/br-9b27f0f9e130/forwarding /proc/sys/net/ipv4/conf/br-bc5fbfa838fc/forwarding /proc/sys/net/ipv4/conf/br-ca51e25e8af8/forwarding /proc/sys/net/ipv4/conf/default/forwarding /proc/sys/net/ipv4/conf/docker0/forwarding /proc/sys/net/ipv4/conf/lo/forwarding /proc/sys/net/ipv4/conf/lxcbr0/forwarding /proc/sys/net/ipv4/conf/veth6e3e4b8/forwarding /proc/sys/net/ipv4/conf/virbr0/forwarding /proc/sys/net/ipv4/conf/vpn0/forwarding /proc/sys/net/ipv4/conf/wlp0s20f3/forwarding /proc/sys/net/ipv6/conf/all/forwarding /proc/sys/net/ipv6/conf/br-22440b7735e7/forwarding /proc/sys/net/ipv6/conf/br-3e8284a56053/forwarding /proc/sys/net/ipv6/conf/br-9b27f0f9e130/forwarding /proc/sys/net/ipv6/conf/br-bc5fbfa838fc/forwarding /proc/sys/net/ipv6/conf/br-ca51e25e8af8/forwarding /proc/sys/net/ipv6/conf/default/forwarding /proc/sys/net/ipv6/conf/docker0/forwarding /proc/sys/net/ipv6/conf/lo/forwarding /proc/sys/net/ipv6/conf/lxcbr0/forwarding /proc/sys/net/ipv6/conf/veth6e3e4b8/forwarding /proc/sys/net/ipv6/conf/virbr0/forwarding /proc/sys/net/ipv6/conf/vpn0/forwarding /proc/sys/net/ipv6/conf/wlp0s20f3/forwarding I do not know how to do it correctly. > > I plan to test other archs by cross-building 'allmodconfig' and checking > the .mod.c files and modpost output (eg, warnings) for no changes at all, > and nf_conntrack.mod.c for expected sysctl aliases. [based on feedback.] > (i.e., changes didn't break modpost, and ELF code works on other archs.) > > Happy to receive suggestions to improve test coverage and functionality. > > I didn't look much at auto-registration with modpost using the register > functions for sysctl, but it seems it would need plumbing, if possible. > > Let's see review/feedback on the basics first. > > thanks, > Mauricio > > ... > > Some context. > > Even though that issue might be expected and obvious, its consequences > sometimes are not. > > An example is the nf_conntrack_max value, that in busy gateways/routers > /cloud deployments can affect performance and functionality more subtly, > or even fill the kernel log non-stop with 'table full, dropping packet', > if a value greater than the default value is not used. > > The current solution (workaround, arguably) for this is to include such > modules in /etc/modules (or in /etc/modules-load.d/*.conf with systemd), > which loads them before an userspace tool (procps's sysctl or systemd's > systemd-sysctl{,.service}) runs, so /proc/sys/... exists when it runs. > > ... > > That is simple, indeed, but comes w/ technical debt. (ugly stuff warning!) > > Now there are many _different_ pieces of code that use the _same_ module > doing that (eg, deployment tools/scripts for openstack nova and neutron, > firewalls, and maybe more). > > And sometimes when components are split or deployed to different nodes > it turns out that in the next reboot we figure (through an issue) that > some component did set /etc/sysctl.conf but not /etc/modules.conf, or > relied in the ex-colocated component doing that. > > This has generated several one-off fixes at this point in some projects. > (I have submitted one of those, actually, a while ago.) > > Also, some of those fixes (or original code) put 'nf_conntrack_ipv{4,6}' > in /etc/modules, getting 'nf_conntrack' loaded via module dependencies > (maybe it was the right module for them at the time, for some reason). > > So, that component (or a colocated component) got nf_conntrack.ko too. > > *BUT* after an upgrade from Ubuntu 18.04 (4.15-based kernel) to 20.04 > (5.4-based kernel), the nf_conntrack_ipv{4,6}.ko modules do not exist > anymore, and now nf_conntrack.ko is no longer loaded, and the sysctl > nf_conntrack_max is no longer applied. (Someone had to figure it out.) > > And now maybe we'd need release/kernel-version checks in scripts that > use the workaround of /etc/modules for /etc/sysctl.conf configuration. > > (Yes, it was ugly stuff.) > > ... > > Well, this last point seemed like "ok, that's enough; we can do better." > > I'm not sure this approach is "better" in all reasons, but hopefully it > might help starting something that is. 🙏 > > cheers, > Mauricio > > ... > > Tests: > > $ cat /proc/sys/kernel/modprobe_sysctl_alias > 1 > > $ cat /proc/sys/net/netfilter/nf_conntrack_max > cat: /proc/sys/net/netfilter/nf_conntrack_max: No such file or directory > > $ lsmod | grep nf_conntrack > $ > > $ sudo modprobe sysctl:nf_conntrack_max > > $ cat /proc/sys/net/netfilter/nf_conntrack_max > 262144 > > $ lsmod | grep nf_conntrack > nf_conntrack 110592 0 > nf_defrag_ipv6 20480 1 nf_conntrack > nf_defrag_ipv4 16384 1 nf_conntrack > > $ modinfo nf_conntrack | grep ^alias: > alias: nf_conntrack-10 > alias: nf_conntrack-2 > alias: ip_conntrack > alias: sysctl:nf_conntrack_icmpv6_timeout > alias: sysctl:nf_conntrack_icmp_timeout > alias: sysctl:nf_conntrack_udp_timeout_stream > alias: sysctl:nf_conntrack_udp_timeout > alias: sysctl:nf_conntrack_tcp_max_retrans > alias: sysctl:nf_conntrack_tcp_ignore_invalid_rst > alias: sysctl:nf_conntrack_tcp_be_liberal > alias: sysctl:nf_conntrack_tcp_loose > alias: sysctl:nf_conntrack_tcp_timeout_unacknowledged > alias: sysctl:nf_conntrack_tcp_timeout_max_retrans > alias: sysctl:nf_conntrack_tcp_timeout_close > alias: sysctl:nf_conntrack_tcp_timeout_time_wait > alias: sysctl:nf_conntrack_tcp_timeout_last_ack > alias: sysctl:nf_conntrack_tcp_timeout_close_wait > alias: sysctl:nf_conntrack_tcp_timeout_fin_wait > alias: sysctl:nf_conntrack_tcp_timeout_established > alias: sysctl:nf_conntrack_tcp_timeout_syn_recv > alias: sysctl:nf_conntrack_tcp_timeout_syn_sent > alias: sysctl:nf_conntrack_generic_timeout > alias: sysctl:nf_conntrack_helper > alias: sysctl:nf_conntrack_acct > alias: sysctl:nf_conntrack_expect_max > alias: sysctl:nf_conntrack_log_invalid > alias: sysctl:nf_conntrack_checksum > alias: sysctl:nf_conntrack_buckets > alias: sysctl:nf_conntrack_count > alias: sysctl:nf_conntrack_max > > $ modinfo r8169 | grep ^alias: > alias: pci:v000010ECd00003000sv*sd*bc*sc*i* > alias: pci:v000010ECd00008125sv*sd*bc*sc*i* > alias: pci:v00000001d00008168sv*sd00002410bc*sc*i* > alias: pci:v00001737d00001032sv*sd00000024bc*sc*i* > alias: pci:v000016ECd00000116sv*sd*bc*sc*i* > alias: pci:v00001259d0000C107sv*sd*bc*sc*i* > alias: pci:v00001186d00004302sv*sd*bc*sc*i* > alias: pci:v00001186d00004300sv*sd*bc*sc*i* > alias: pci:v00001186d00004300sv00001186sd00004B10bc*sc*i* > alias: pci:v000010ECd00008169sv*sd*bc*sc*i* > alias: pci:v000010FFd00008168sv*sd*bc*sc*i* > alias: pci:v000010ECd00008168sv*sd*bc*sc*i* > alias: pci:v000010ECd00008167sv*sd*bc*sc*i* > alias: pci:v000010ECd00008162sv*sd*bc*sc*i* > alias: pci:v000010ECd00008161sv*sd*bc*sc*i* > alias: pci:v000010ECd00008136sv*sd*bc*sc*i* > alias: pci:v000010ECd00008129sv*sd*bc*sc*i* > alias: pci:v000010ECd00002600sv*sd*bc*sc*i* > alias: pci:v000010ECd00002502sv*sd*bc*sc*i* > > Mauricio Faria de Oliveira (6): > modpost: factor out elf/arch-specific code from section_rel[a]() > modpost: deduplicate section_rel[a]() > sysctl, mod_devicetable: shadow struct ctl_table.procname for > file2alias > module, modpost: introduce support for MODULE_SYSCTL_TABLE > netfilter: conntrack: use MODULE_SYSCTL_TABLE > sysctl: introduce /proc/sys/kernel/modprobe_sysctl_alias > > fs/proc/proc_sysctl.c | 27 ++++ > include/linux/mod_devicetable.h | 25 ++++ > include/linux/module.h | 8 ++ > include/linux/sysctl.h | 11 +- > kernel/sysctl.c | 10 ++ > net/netfilter/nf_conntrack_standalone.c | 4 + > scripts/mod/devicetable-offsets.c | 3 + > scripts/mod/file2alias.c | 111 +++++++++++++++ > scripts/mod/modpost.c | 178 +++++++++++++----------- > scripts/mod/modpost.h | 3 + > 10 files changed, 296 insertions(+), 84 deletions(-) > > -- > 2.25.1 > -- Best Regards Masahiro Yamada