Null pointer dereference in nsfs_evict with CONFIG_NET_NS=n triggered via systemd-networkd's debugging

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi,
I'm getting the following oops on 5.9.3 (and 5.9.1, and 5.6.7, all with some unrelated patches, see [1]). In this crash, nsfs_evict() gets called with ns->ops being NULL.

[    6.947411] 8<--- cut here ---
[ 6.950502] Unable to handle kernel NULL pointer dereference at virtual address 00000010
[    6.958685] pgd = da1de5c3
[    6.961417] [00000010] *pgd=3fcd2831
[    6.965047] Internal error: Oops: 17 [#1] SMP ARM
[ 6.969781] CPU: 0 PID: 199 Comm: systemd-network Not tainted 5.9.1-cla-cfb #1
[    6.977033] Hardware name: Marvell Armada 380/385 (Device Tree)
[    6.982991] PC is at nsfs_evict+0x18/0x20
[    6.987029] LR is at evict+0xac/0x188
[    6.990716] pc : [<c029aa84>]    lr : [<c027d40c>]    psr: 60010013
[    6.997009] sp : ecdefed0  ip : 00000001  fp : 00000000
[    7.002258] r10: c0c03e8c  r9 : 5ac3c35a  r8 : ef036910
[    7.007508] r7 : ed2d4880  r6 : c090a5c0  r5 : ed23c910  r4 : ed23c858
[    7.014064] r3 : 00000000  r2 : ed23c918  r1 : 00000000  r0 : c0c60190
[ 7.020621] Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none
[    7.027787] Control: 10c5387d  Table: 2cc4404a  DAC: 00000051
[    7.033566] Process systemd-network (pid: 199, stack limit = 0x7d1d3b46)
[    7.040299] Stack: (0xecdefed0 to 0xecdf0000)
[ 7.044684] fec0: ed2d4880 00000000 ed2d48d0 c0278804 [ 7.052901] fee0: ed7bf0c0 0008801d ed23c858 c02615ec 00000000 ed23c858 00000000 c0265d1c [ 7.061117] ff00: 000007ff 00000000 ed6930ec ed692cc0 c0c73ee4 00000454 5ac3c35a c0140350 [ 7.069339] ff20: ecdee000 ecdeffb0 c0100264 fffffe30 c0100264 c010a7b0 ed26c200 be8b0860 [ 7.077572] ff40: 00004000 00000128 c0100264 c069b5d4 00000000 00000000 00000000 000000fe [ 7.085798] ff60: 00000000 00000000 00000000 c01401f8 00000000 c0c03e88 ed7bf0c0 ed7bf0c0 [ 7.094012] ff80: 00000000 c0c03e88 ed7bf0c0 0000000b b6f794d0 01776af0 00000006 c0100264 [ 7.102233] ffa0: ecdee000 00000006 00000000 c01000cc 00000000 be8b0860 00000000 00000000 [ 7.110457] ffc0: 0000000b b6f794d0 01776af0 00000006 0000000b 0177445c b6f80000 00000000 [ 7.118673] ffe0: b6f4b10c be8b1a40 b6e1e490 b6d1c320 60010010 0000000b 00000000 00000000
[    7.126898] [<c029aa84>] (nsfs_evict) from [<00000000>] (0x0)
[ 7.132676] Code: ebff8a17 e1a00004 e5943004 e8bd4010 (e5933010) [ 7.138841] ---[ end trace 2b44d591054a9910 ]---
[    7.143482] Kernel panic - not syncing: Fatal exception
[    7.148733] CPU1: stopping
[ 7.151455] CPU: 1 PID: 331 Comm: bash Tainted: G D 5.9.1-cla-cfb #1
[    7.159133] Hardware name: Marvell Armada 380/385 (Device Tree)
[ 7.165080] [<c010f10c>] (unwind_backtrace) from [<c010add8>] (show_stack+0x10/0x14) [ 7.172849] [<c010add8>] (show_stack) from [<c07eccac>] (dump_stack+0x94/0xa8) [ 7.180095] [<c07eccac>] (dump_stack) from [<c010dda8>] (handle_IPI+0x340/0x378) [ 7.187516] [<c010dda8>] (handle_IPI) from [<c0430e34>] (gic_handle_irq+0x8c/0x90) [ 7.195110] [<c0430e34>] (gic_handle_irq) from [<c0100b0c>] (__irq_svc+0x6c/0x90)
[    7.202612] Exception stack(0xecedbe28 to 0xecedbe70)
[ 7.207678] be20: edb83d90 edb72bc8 c022a3d8 0000015f edb72bc8 00000000 [ 7.215879] be40: 0013f000 edb72ba0 edb72ba0 ed7774bc c0cc0e20 ed777480 00021000 ecedbe78
[    7.224079] be60: ed61d4e0 c022ac48 a00d0013 ffffffff
[ 7.229148] [<c0100b0c>] (__irq_svc) from [<c022ac48>] (anon_vma_interval_tree_remove+0x1dc/0x2d4) [ 7.238135] [<c022ac48>] (anon_vma_interval_tree_remove) from [<c023efe4>] (unlink_anon_vmas+0xbc/0x1fc) [ 7.247644] [<c023efe4>] (unlink_anon_vmas) from [<c022f6f4>] (free_pgtables+0x48/0xb4) [ 7.255674] [<c022f6f4>] (free_pgtables) from [<c0238dd8>] (exit_mmap+0xe8/0x1b4)
[    7.263183] [<c0238dd8>] (exit_mmap) from [<c011c43c>] (mmput+0x48/0xec)
[    7.269905] [<c011c43c>] (mmput) from [<c0124430>] (do_exit+0x2d4/0x930)
[ 7.276625] [<c0124430>] (do_exit) from [<c0124af4>] (do_group_exit+0x3c/0xb8) [ 7.283868] [<c0124af4>] (do_group_exit) from [<c0124b80>] (__wake_up_parent+0x0/0x18)
[    7.291815] Rebooting in 10 seconds..

Vlastimil Babka helped me debug this (thanks a lot!), and the ns->ops is supposed to be set via net_ns_net_init(). That code, however, only initializes this ops structure when CONFIG_NET_NS=y, and I have CONFIG_NET_NS=n.

On how to reproduce, this is where the fun starts. I'm getting this on an ARM board (mvebu, SolidRun Clearfog Base). It started happening after updating userland from systemd-243.4 to systemd-246.6 (and a ton of unrelated bits including the toolchain -- you know, embedded updates). However, it *only* happens when that new enough systemd-networkd is launched with SYSTEMD_LOG_LEVEL=debug, and indeed, here's what a relevant part of the diff of the updated systemd looks like (in particular systemd commit f6dbcebdc28cabf36e6665b67d52d43192fb88df):

@@ -164,12 +158,54 @@ int device_monitor_new_full(sd_device_monitor **ret, MonitorNetlinkGroup group,

        if (fd >= 0) {
                r = monitor_set_nl_address(m);
-                if (r < 0)
- return log_debug_errno(r, "sd-device-monitor: Failed to set netlink address: %m");
+                if (r < 0) {
+ log_debug_errno(r, "sd-device-monitor: Failed to set netlink address: %m");
+                        goto fail;
+                }
+        }
+
+        if (DEBUG_LOGGING) {
+                _cleanup_close_ int netns = -1;
+
+ /* So here's the thing: only AF_NETLINK sockets from the main network namespace will get + * hardware events. Let's check if ours is from there, and if not generate a debug message, + * since we cannot possibly work correctly otherwise. This is just a safety check to make
+                 * things easier to debug. */
+
+                netns = ioctl(m->sock, SIOCGSKNS);
+                if (netns < 0)
+ log_debug_errno(errno, "sd-device-monitor: Unable to get network namespace of udev netlink socket, unable to determine if we are in host netns: %m");
+                else {
+                        struct stat a, b;
+
+                        if (fstat(netns, &a) < 0) {
+ r = log_debug_errno(errno, "sd-device-monitor: Failed to stat netns of udev netlink socket: %m");
+                                goto fail;
+                        }
+
+                        if (stat("/proc/1/ns/net", &b) < 0) {
+                                if (ERRNO_IS_PRIVILEGE(errno))
+ /* If we can't access PID1's netns info due to permissions, it's fine, this is a
+                                         * safety check only after all. */
+ log_debug_errno(errno, "sd-device-monitor: No permission to stat PID1's netns, unable to determine if we are in host netns: %m");
+                                else
+ log_debug_errno(errno, "sd-device-monitor: Failed to stat PID1's netns: %m");
+
+ } else if (a.st_dev != b.st_dev || a.st_ino != b.st_ino) + log_debug("sd-device-monitor: Netlink socket we listen on is not from host netns, we won't see device events.");
+                }
        }

Apparently, when debugging is enabled, something stats /proc/1/ns/net, quite likely from a sandboxed/namespaced/whatever process context, and that something was not happening on the previous version of systemd.

Anyway, I'm so happy I can finally reproduce this "mysterious crash" on a box with a remote console, so please feel free to ask for extra details if needed. I'll also be happy to try patches, etc. Perhaps Lennart has a reproducer that's small enough? Something simple as `ls -al` from a SSH session is not enough.

With kind regards,
Jan

[1] https://gerrit.cesnet.cz/plugins/gitiles/github/torvalds/linux/+log/refs/heads/cesnet/2020-11-03---5.9.3




[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux