Nice shot! Thank you very much.

-- Stéphane

-----Original Message-----
From: Oliver Hartkopp <socketcan@xxxxxxxxxxxx>
Sent: Friday, 17 July 2020 13:32
To: Stéphane Grosjean <s.grosjean@xxxxxxxxxxxxxxx>; Philipp Lehmann <leph1016@xxxxxxxxxxxxxxx>; wg@xxxxxxxxxxxxxx; mkl@xxxxxxxxxxxxxx
Cc: linux-can@xxxxxxxxxxxxxxx; christian.sauer.w@xxxxxxxxxxxxxxxx
Subject: Re: [Bug] Kernel Panic on Deletion of the network-namespace containing the SocketCAN interface

I found it!

diff --git a/net/core/dev.c b/net/core/dev.c
index 90b59fc50dc9..add15461a9e2 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -10517,7 +10517,7 @@ static void __net_exit default_device_exit(struct net *net)
 			continue;
 
 		/* Leave virtual devices for the generic cleanup */
-		if (dev->rtnl_link_ops)
+		if ((dev->rtnl_link_ops) && (dev->type != ARPHRD_CAN))
 			continue;
 
 		/* Push remaining network devices to init_net */

The problem is that 'real' CAN interfaces use rtnl_link_ops to
configure bitrates and other CAN-controller-specific settings. But the
mere presence of rtnl_link_ops makes this code assume the device is only
a virtual interface, and therefore the transition of the interface back
to the root namespace is skipped.

The patch above fixes the issue, but I'm not sure if we need a more
general solution here.

Best,
Oliver

On 17.07.20 13:02, Oliver Hartkopp wrote:
>
>
> On 17.07.20 09:55, Stéphane Grosjean wrote:
>> There's a first WARNING kernel message just when the namespace is
>> deleted. The WARNING turns into a BUG (kernel NULL pointer
>> dereference) when removing the interface itself (for example, when
>> the driver module is removed from memory). Note that the issue occurs
>> with all our internal as well as USB CAN interfaces.
>>
>> Obviously, the problem doesn't appear when you put the interface back
>> in the root namespace before the destruction, or when the interface
>> is a true Ethernet network interface.
>
> Yes.
> I checked that with a USB Ethernet interface I have at hand here
> - and when deleting the test namespace it just emerges in the root
> namespace again.
>
> So I wonder what's missing in our configuration for CAN interfaces
> that this transition is not performed :-/
>
> Best regards,
> Oliver
>
>
>> Context:
>>
>> $ uname -a
>> Linux linux-dev 5.4.0-39-generic #43-Ubuntu SMP Fri Jun 19 10:28:31 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
>>
>> $ dmesg | grep peak_pci
>> [ 19.028048] peak_pci 0000:0a:00.0: enabling device (0100 -> 0102)
>> [ 19.034283] peak_pci 0000:0a:00.0: can6 at reg_base=0x00000000136b8b0b cfg_base=0x00000000b826597c irq=27
>> [ 19.034378] peak_pci 0000:0a:00.0: can7 at reg_base=0x000000006b3de9a0 cfg_base=0x00000000b826597c irq=27
>>
>> # ip netns add test
>> # ip link set dev can6 netns test
>> # ip netns delete test
>>
>> [ 1755.805241] ------------[ cut here ]------------
>> [ 1755.805251] WARNING: CPU: 8 PID: 2635 at net/core/dev.c:10039 netdev_exit+0x44/0x50
>> [ 1755.805252] Modules linked in: vboxnetadp(OE) vboxnetflt(OE)
>> vboxdrv(OE) md4 nls_utf8 cifs libarc4 fscache libdes cfg80211
>> nls_iso8859_1 intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal
>> intel_powerclamp coretemp kvm_intel peak_usb 8812au(OE) joydev
>> plin(OE) kvm input_leds pcan(OE) eeepc_wmi snd_hda_codec_hdmi
>> snd_hda_codec_realtek nouveau pcmcia snd_hda_codec_generic asus_wmi
>> pcmcia_core crct10dif_pclmul ghash_clmulni_intel peak_pci sja1000
>> peak_pciefd ledtrig_audio ttm aesni_intel drm_kms_helper
>> snd_hda_intel snd_intel_dspcfg fb_sys_fops syscopyarea sysfillrect
>> sysimgblt can_dev snd_hda_codec snd_hda_core crypto_simd snd_hwdep
>> cryptd snd_pcm glue_helper snd_seq_midi snd_seq_midi_event
>> snd_rawmidi snd_seq snd_seq_device snd_timer sparse_keymap
>> intel_cstate intel_rapl_perf snd video mei_me mei soundcore wmi_bmof
>> intel_wmi_thunderbolt mxm_wmi mac_hid sch_fq_codel parport_pc ppdev
>> lp parport drm ip_tables x_tables autofs4 hid_generic usbhid uas
>> usb_storage hid crc32_pclmul igb e1000e
>> [ 1755.805306] ahci i2c_i801 i2c_algo_bit lpc_ich libahci dca wmi
>> [ 1755.805315] CPU: 8 PID: 2635 Comm: kworker/u24:0 Tainted: G OE 5.4.0-39-generic #43-Ubuntu
>> [ 1755.805316] Hardware name: ASUS All Series/X99-E WS, BIOS 4001 05/27/2019
>> [ 1755.805319] Workqueue: netns cleanup_net
>> [ 1755.805324] RIP: 0010:netdev_exit+0x44/0x50
>> [ 1755.805327] Code: 8b bb 30 01 00 00 e8 8b 9d 97 ff 48 81 fb 00 21 be b5 74 13 48 8b 83 90 00 00 00 48 81 c3 90 00 00 00 48 39 c3 75 03 5b 5d c3 <0f> 0b eb f9 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 41
>> [ 1755.805329] RSP: 0018:ffffb12d0169fdc8 EFLAGS: 00010287
>> [ 1755.805331] RAX: ffff9624c8a70050 RBX: ffff96247642a710 RCX: 000000008010000b
>> [ 1755.805333] RDX: 000000008010000c RSI: 0000000000000001 RDI: ffff9624cdc06a00
>> [ 1755.805334] RBP: ffffb12d0169fdd0 R08: 0000000000000000 R09: ffffffffb4d2e300
>> [ 1755.805335] R10: ffff9624a0d84000 R11: 0000000000000001 R12: ffffb12d0169fe20
>> [ 1755.805337] R13: ffffffffb5be3f20 R14: ffffffffb5be3f28 R15: ffff962498ea39d8
>> [ 1755.805339] FS: 0000000000000000(0000) GS:ffff9624cfa00000(0000) knlGS:0000000000000000
>> [ 1755.805341] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [ 1755.805342] CR2: 000055fbb59cd8e8 CR3: 000000004aa0a006 CR4: 00000000001606e0
>> [ 1755.805344] Call Trace:
>> [ 1755.805350] ops_exit_list.isra.0+0x3b/0x70
>> [ 1755.805353] cleanup_net+0x1f0/0x300
>> [ 1755.805359] process_one_work+0x1eb/0x3b0
>> [ 1755.805363] worker_thread+0x4d/0x400
>> [ 1755.805367] kthread+0x104/0x140
>> [ 1755.805370] ? process_one_work+0x3b0/0x3b0
>> [ 1755.805373] ? kthread_park+0x90/0x90
>> [ 1755.805378] ret_from_fork+0x35/0x40
>> [ 1755.805382] ---[ end trace 832a75ad96f8105e ]---
>> [ 1755.805410] ------------[ cut here ]------------
>> [ 1755.805416] WARNING: CPU: 8 PID: 2635 at fs/proc/proc_sysctl.c:1714 retire_sysctl_set+0x14/0x18
>> [ 1755.805417] Modules linked in: vboxnetadp(OE) vboxnetflt(OE)
>> vboxdrv(OE) md4 nls_utf8 cifs libarc4 fscache libdes cfg80211
>> nls_iso8859_1 intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal
>> intel_powerclamp coretemp kvm_intel peak_usb 8812au(OE) joydev
>> plin(OE) kvm input_leds pcan(OE) eeepc_wmi snd_hda_codec_hdmi
>> snd_hda_codec_realtek nouveau pcmcia snd_hda_codec_generic asus_wmi
>> pcmcia_core crct10dif_pclmul ghash_clmulni_intel peak_pci sja1000
>> peak_pciefd ledtrig_audio ttm aesni_intel drm_kms_helper
>> snd_hda_intel snd_intel_dspcfg fb_sys_fops syscopyarea sysfillrect
>> sysimgblt can_dev snd_hda_codec snd_hda_core crypto_simd snd_hwdep
>> cryptd snd_pcm glue_helper snd_seq_midi snd_seq_midi_event
>> snd_rawmidi snd_seq snd_seq_device snd_timer sparse_keymap
>> intel_cstate intel_rapl_perf snd video mei_me mei soundcore wmi_bmof
>> intel_wmi_thunderbolt mxm_wmi mac_hid sch_fq_codel parport_pc ppdev
>> lp parport drm ip_tables x_tables autofs4 hid_generic usbhid uas
>> usb_storage hid crc32_pclmul igb e1000e
>> [ 1755.805454] ahci i2c_i801 i2c_algo_bit lpc_ich libahci dca wmi
>> [ 1755.805460] CPU: 8 PID: 2635 Comm: kworker/u24:0 Tainted: G W OE 5.4.0-39-generic #43-Ubuntu
>> [ 1755.805461] Hardware name: ASUS All Series/X99-E WS, BIOS 4001 05/27/2019
>> [ 1755.805463] Workqueue: netns cleanup_net
>> [ 1755.805467] RIP: 0010:retire_sysctl_set+0x14/0x18
>> [ 1755.805469] Code: 00 00 00 00 49 c7 40 48 00 00 00 00 49 c7 40 50 00 00 00 00 c3 90 0f 1f 44 00 00 55 48 8b 47 58 48 89 e5 48 85 c0 75 02 5d c3 <0f> 0b 5d c3 0f 1f 44 00 00 55 48 89 e5 48 83 ec 60 48 89 4c 24 48
>> [ 1755.805471] RSP: 0018:ffffb12d0169fdc0 EFLAGS: 00010282
>> [ 1755.805473] RAX: ffff9624a697ad58 RBX: ffff96247642a680 RCX: 0000000080150009
>> [ 1755.805475] RDX: ffff96247642a6b0 RSI: ffffffffb5bfeb48 RDI: ffff96247642a730
>> [ 1755.805476] RBP: ffffb12d0169fdc0 R08: 0000000000000000 R09: ffffffffb4776d00
>> [ 1755.805477] R10: ffff9624cad0a9c0 R11: 0000000000000001 R12: ffffb12d0169fe20
>> [ 1755.805479] R13: ffffffffb5bfeb40 R14: ffffffffb5bfeb48 R15: ffff962498ea39d8
>> [ 1755.805481] FS: 0000000000000000(0000) GS:ffff9624cfa00000(0000) knlGS:0000000000000000
>> [ 1755.805482] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [ 1755.805483] CR2: 000055fbb59cd8e8 CR3: 000000004aa0a006 CR4: 00000000001606e0
>> [ 1755.805485] Call Trace:
>> [ 1755.805490] sysctl_net_exit+0x15/0x20
>> [ 1755.805493] ops_exit_list.isra.0+0x3b/0x70
>> [ 1755.805496] cleanup_net+0x1f0/0x300
>> [ 1755.805500] process_one_work+0x1eb/0x3b0
>> [ 1755.805503] worker_thread+0x4d/0x400
>> [ 1755.805507] kthread+0x104/0x140
>> [ 1755.805510] ? process_one_work+0x3b0/0x3b0
>> [ 1755.805512] ? kthread_park+0x90/0x90
>> [ 1755.805517] ret_from_fork+0x35/0x40
>> [ 1755.805520] ---[ end trace 832a75ad96f8105f ]---
>>
>>
>>
>> -- Stéphane
>>
>> -----Original Message-----
>> From: linux-can-owner@xxxxxxxxxxxxxxx <linux-can-owner@xxxxxxxxxxxxxxx> On Behalf Of Oliver Hartkopp
>> Sent: Thursday, 16 July 2020 20:38
>> To: Philipp Lehmann <leph1016@xxxxxxxxxxxxxxx>; wg@xxxxxxxxxxxxxx; mkl@xxxxxxxxxxxxxx
>> Cc: linux-can@xxxxxxxxxxxxxxx; christian.sauer.w@xxxxxxxxxxxxxxxx
>> Subject: Re: [Bug] Kernel Panic on Deletion of the network-namespace containing the SocketCAN interface
>>
>> Hi Philipp,
>>
>> thanks for the report and its reproducer!
>>
>> I assumed the interfaces - at least in the case of 'real' hardware
>> CAN interfaces - to be moved back to the root namespace ... well.
>>
>> I'll take a look at it.
>>
>> Best regards,
>> Oliver
>>
>> On 16.07.20 18:46, Philipp Lehmann wrote:
>>> If a SocketCAN interface (tested with a PCAN-USB adapter) is moved
>>> into a network namespace and the namespace is then deleted without
>>> first moving the device out of it, the device can no longer be
>>> found in any network namespace; only a reboot of the system fixes
>>> this. If the device is instead removed from the USB bus without a
>>> restart, a kernel panic is the result.
>>>
>>>
>>> Output of uname -a [Linux cpc4x 5.4.0-40-generic #44-Ubuntu SMP Tue
>>> Jun 23 00:01:04 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux]
>>>
>>>
>>> The bug can be reproduced with the following steps:
>>>
>>>
>>> 1. Connect the (USB) SocketCAN device to the host
>>>
>>> 2. Add a new network namespace [sudo ip netns add test]
>>> 3. Move the CAN interface to the network namespace [sudo ip link set dev can0 netns test]
>>> 4. Delete the namespace [sudo ip netns delete test]
>>> 5. Remove the adapter from the USB bus. In most cases this should
>>> result in a kernel panic
>>>
>>
>> --
>> PEAK-System Technik GmbH
>> Registered office: Darmstadt - HRB 9183
>> Managing directors: Alexander Gach / Uwe Wilhelm
>> Our privacy policy, with important information on the handling of
>> personal data, is available at
>> www.peak-system.com/Datenschutz.483.0.html
>> --