Re: [RFC PATCH 0/6] Introduce "sysctl:" module aliases

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Jul 22, 2022 at 11:24 AM Mauricio Faria de Oliveira
<mfo@xxxxxxxxxxxxx> wrote:
>
> This series allows modules to have "sysctl:<procname>" module aliases
> for sysctl entries, registering sysctl tables with modpost/file2alias.
> (Similarly to "pci:<IDs>" aliases for PCI ID tables in device drivers.)
>
> The issue behind it: if a sysctl value is in /etc/sysctl.{conf,d/*.conf}
> but does not exist in /proc/sys/ when the userspace tool that applies it
> runs, it does not get set.
>
> It would be nice if the tool could run 'modprobe sysctl:<something>' and
> get that '/proc/sys/.../something' up (as an administrator configured it)
> and then set it, as intended. (A bit like PCI ID-based module loading.)
>
> ...
>
> The series is relatively simple, except for patch 4 (IMHO) due to ELF.
>
> - Patches 1-2 simplify ELF code in modpost.c (code moves, not in-depth).
> - Patches 3-4 implement the feature (patch 4 is more in-depth).
> - Patches 5-6 consume and expose it.
>
> I have tested it on x86_64 with next-20220721, and it looks correct
> ('modprobe sysctl:nf_conntrack_max' works; other aliases there; see below).


I did not test this patch set at all, but I am afraid
you took a good case as an example.



I see two locations for the "fib_multipath_hash_fields" parameter
for example.

#  find /proc/sys/ -name fib_multipath_hash_fields
/proc/sys/net/ipv4/fib_multipath_hash_fields
/proc/sys/net/ipv6/fib_multipath_hash_fields


If I run

   modprobe sysctl:fib_multipath_hash_fields

Which one will be loaded, net/ipv4/sysctl_net_ipv4.c
or ipv6/sysctl_net_ipv6.c ?

Of course, IPv4 is always built-in, so ipv6.ko will be loaded in this case.
But, let's think. The basename is not enough to identify
which code resulted in that sysctl property.
The PCI vendor/device ID is meant to be unique. That's the difference.


You may argue the full path is globally unique, so

  modprobe  sysctl:net/ipv6/fib_multipath_hash_fields

should work, but that may not be so feasible to implement
because not all file paths are static.


On my machine:

# find  /proc/sys  -name  forwarding
/proc/sys/net/ipv4/conf/all/forwarding
/proc/sys/net/ipv4/conf/br-22440b7735e7/forwarding
/proc/sys/net/ipv4/conf/br-3e8284a56053/forwarding
/proc/sys/net/ipv4/conf/br-9b27f0f9e130/forwarding
/proc/sys/net/ipv4/conf/br-bc5fbfa838fc/forwarding
/proc/sys/net/ipv4/conf/br-ca51e25e8af8/forwarding
/proc/sys/net/ipv4/conf/default/forwarding
/proc/sys/net/ipv4/conf/docker0/forwarding
/proc/sys/net/ipv4/conf/lo/forwarding
/proc/sys/net/ipv4/conf/lxcbr0/forwarding
/proc/sys/net/ipv4/conf/veth6e3e4b8/forwarding
/proc/sys/net/ipv4/conf/virbr0/forwarding
/proc/sys/net/ipv4/conf/vpn0/forwarding
/proc/sys/net/ipv4/conf/wlp0s20f3/forwarding
/proc/sys/net/ipv6/conf/all/forwarding
/proc/sys/net/ipv6/conf/br-22440b7735e7/forwarding
/proc/sys/net/ipv6/conf/br-3e8284a56053/forwarding
/proc/sys/net/ipv6/conf/br-9b27f0f9e130/forwarding
/proc/sys/net/ipv6/conf/br-bc5fbfa838fc/forwarding
/proc/sys/net/ipv6/conf/br-ca51e25e8af8/forwarding
/proc/sys/net/ipv6/conf/default/forwarding
/proc/sys/net/ipv6/conf/docker0/forwarding
/proc/sys/net/ipv6/conf/lo/forwarding
/proc/sys/net/ipv6/conf/lxcbr0/forwarding
/proc/sys/net/ipv6/conf/veth6e3e4b8/forwarding
/proc/sys/net/ipv6/conf/virbr0/forwarding
/proc/sys/net/ipv6/conf/vpn0/forwarding
/proc/sys/net/ipv6/conf/wlp0s20f3/forwarding


I do not know how to do it correctly.




>
> I plan to test other archs by cross-building 'allmodconfig' and checking
> the .mod.c files and modpost output (eg, warnings) for no changes at all,
> and nf_conntrack.mod.c for expected sysctl aliases. [based on feedback.]
> (i.e., changes didn't break modpost, and ELF code works on other archs.)
>
> Happy to receive suggestions to improve test coverage and functionality.
>
> I didn't look much at auto-registration with modpost using the register
> functions for sysctl, but it seems it would need plumbing, if possible.
>
> Let's see review/feedback on the basics first.
>
> thanks,
> Mauricio
>
> ...
>
> Some context.
>
> Even though that issue might be expected and obvious, its consequences
> sometimes are not.
>
> An example is the nf_conntrack_max value, that in busy gateways/routers
> /cloud deployments can affect performance and functionality more subtly,
> or even fill the kernel log non-stop with 'table full, dropping packet',
> if a value greater than the default value is not used.
>
> The current solution (workaround, arguably) for this is to include such
> modules in /etc/modules (or in /etc/modules-load.d/*.conf with systemd),
> which loads them before an userspace tool (procps's sysctl or systemd's
> systemd-sysctl{,.service}) runs, so /proc/sys/... exists when it runs.
>
> ...
>
> That is simple, indeed, but comes w/ technical debt. (ugly stuff warning!)
>
> Now there are many _different_ pieces of code that use the _same_ module
> doing that (eg, deployment tools/scripts for openstack nova and neutron,
> firewalls, and maybe more).
>
> And sometimes when components are split or deployed to different nodes
> it turns out that in the next reboot we figure (through an issue) that
> some component did set /etc/sysctl.conf but not /etc/modules.conf, or
> relied in the ex-colocated component doing that.
>
> This has generated several one-off fixes at this point in some projects.
> (I have submitted one of those, actually, a while ago.)
>
> Also, some of those fixes (or original code) put 'nf_conntrack_ipv{4,6}'
> in /etc/modules, getting 'nf_conntrack' loaded via module dependencies
> (maybe it was the right module for them at the time, for some reason).
>
> So, that component (or a colocated component) got nf_conntrack.ko too.
>
> *BUT* after an upgrade from Ubuntu 18.04 (4.15-based kernel) to 20.04
> (5.4-based kernel), the nf_conntrack_ipv{4,6}.ko modules do not exist
> anymore, and now nf_conntrack.ko is no longer loaded, and the sysctl
> nf_conntrack_max is no longer applied. (Someone had to figure it out.)
>
> And now maybe we'd need release/kernel-version checks in scripts that
> use the workaround of /etc/modules for /etc/sysctl.conf configuration.
>
> (Yes, it was ugly stuff.)
>
> ...
>
> Well, this last point seemed like "ok, that's enough; we can do better."
>
> I'm not sure this approach is "better" in all reasons, but hopefully it
> might help starting something that is. 🙏
>
> cheers,
> Mauricio
>
> ...
>
> Tests:
>
>     $ cat /proc/sys/kernel/modprobe_sysctl_alias
>     1
>
>     $ cat /proc/sys/net/netfilter/nf_conntrack_max
>     cat: /proc/sys/net/netfilter/nf_conntrack_max: No such file or directory
>
>     $ lsmod | grep nf_conntrack
>     $
>
>     $ sudo modprobe sysctl:nf_conntrack_max
>
>     $ cat /proc/sys/net/netfilter/nf_conntrack_max
>     262144
>
>     $ lsmod | grep nf_conntrack
>     nf_conntrack          110592  0
>     nf_defrag_ipv6         20480  1 nf_conntrack
>     nf_defrag_ipv4         16384  1 nf_conntrack
>
>     $ modinfo nf_conntrack | grep ^alias:
>     alias:          nf_conntrack-10
>     alias:          nf_conntrack-2
>     alias:          ip_conntrack
>     alias:          sysctl:nf_conntrack_icmpv6_timeout
>     alias:          sysctl:nf_conntrack_icmp_timeout
>     alias:          sysctl:nf_conntrack_udp_timeout_stream
>     alias:          sysctl:nf_conntrack_udp_timeout
>     alias:          sysctl:nf_conntrack_tcp_max_retrans
>     alias:          sysctl:nf_conntrack_tcp_ignore_invalid_rst
>     alias:          sysctl:nf_conntrack_tcp_be_liberal
>     alias:          sysctl:nf_conntrack_tcp_loose
>     alias:          sysctl:nf_conntrack_tcp_timeout_unacknowledged
>     alias:          sysctl:nf_conntrack_tcp_timeout_max_retrans
>     alias:          sysctl:nf_conntrack_tcp_timeout_close
>     alias:          sysctl:nf_conntrack_tcp_timeout_time_wait
>     alias:          sysctl:nf_conntrack_tcp_timeout_last_ack
>     alias:          sysctl:nf_conntrack_tcp_timeout_close_wait
>     alias:          sysctl:nf_conntrack_tcp_timeout_fin_wait
>     alias:          sysctl:nf_conntrack_tcp_timeout_established
>     alias:          sysctl:nf_conntrack_tcp_timeout_syn_recv
>     alias:          sysctl:nf_conntrack_tcp_timeout_syn_sent
>     alias:          sysctl:nf_conntrack_generic_timeout
>     alias:          sysctl:nf_conntrack_helper
>     alias:          sysctl:nf_conntrack_acct
>     alias:          sysctl:nf_conntrack_expect_max
>     alias:          sysctl:nf_conntrack_log_invalid
>     alias:          sysctl:nf_conntrack_checksum
>     alias:          sysctl:nf_conntrack_buckets
>     alias:          sysctl:nf_conntrack_count
>     alias:          sysctl:nf_conntrack_max
>
>     $ modinfo r8169 | grep ^alias:
>     alias:          pci:v000010ECd00003000sv*sd*bc*sc*i*
>     alias:          pci:v000010ECd00008125sv*sd*bc*sc*i*
>     alias:          pci:v00000001d00008168sv*sd00002410bc*sc*i*
>     alias:          pci:v00001737d00001032sv*sd00000024bc*sc*i*
>     alias:          pci:v000016ECd00000116sv*sd*bc*sc*i*
>     alias:          pci:v00001259d0000C107sv*sd*bc*sc*i*
>     alias:          pci:v00001186d00004302sv*sd*bc*sc*i*
>     alias:          pci:v00001186d00004300sv*sd*bc*sc*i*
>     alias:          pci:v00001186d00004300sv00001186sd00004B10bc*sc*i*
>     alias:          pci:v000010ECd00008169sv*sd*bc*sc*i*
>     alias:          pci:v000010FFd00008168sv*sd*bc*sc*i*
>     alias:          pci:v000010ECd00008168sv*sd*bc*sc*i*
>     alias:          pci:v000010ECd00008167sv*sd*bc*sc*i*
>     alias:          pci:v000010ECd00008162sv*sd*bc*sc*i*
>     alias:          pci:v000010ECd00008161sv*sd*bc*sc*i*
>     alias:          pci:v000010ECd00008136sv*sd*bc*sc*i*
>     alias:          pci:v000010ECd00008129sv*sd*bc*sc*i*
>     alias:          pci:v000010ECd00002600sv*sd*bc*sc*i*
>     alias:          pci:v000010ECd00002502sv*sd*bc*sc*i*
>
> Mauricio Faria de Oliveira (6):
>   modpost: factor out elf/arch-specific code from section_rel[a]()
>   modpost: deduplicate section_rel[a]()
>   sysctl, mod_devicetable: shadow struct ctl_table.procname for
>     file2alias
>   module, modpost: introduce support for MODULE_SYSCTL_TABLE
>   netfilter: conntrack: use MODULE_SYSCTL_TABLE
>   sysctl: introduce /proc/sys/kernel/modprobe_sysctl_alias
>
>  fs/proc/proc_sysctl.c                   |  27 ++++
>  include/linux/mod_devicetable.h         |  25 ++++
>  include/linux/module.h                  |   8 ++
>  include/linux/sysctl.h                  |  11 +-
>  kernel/sysctl.c                         |  10 ++
>  net/netfilter/nf_conntrack_standalone.c |   4 +
>  scripts/mod/devicetable-offsets.c       |   3 +
>  scripts/mod/file2alias.c                | 111 +++++++++++++++
>  scripts/mod/modpost.c                   | 178 +++++++++++++-----------
>  scripts/mod/modpost.h                   |   3 +
>  10 files changed, 296 insertions(+), 84 deletions(-)
>
> --
> 2.25.1
>


--
Best Regards
Masahiro Yamada




[Index of Archives]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Big List of Linux Books]

  Powered by Linux