On 25.10.2013 11:02, Linus Torvalds wrote:
Adding more people, so quoting the whole email for them.
We definitely have some module unload issues. Guys, try the following
a few times to unload modules:
lsmod | grep ' 0 '| cut -d' ' -f1 | xargs sudo rmmod
(a few times because unloading one module will then potentially make
other modules unloadable).

I use a fairly monolithic kernel with only a few modules, and one of the
machines is pretty stripped down. On it, I was unable to trigger any
unusual kernel reaction within 10000 rmmod / modprobe cycles.

lsmod
=====
Module                    Size  Used by
ip6t_REJECT              12489  3
nf_conntrack_ipv6        13453  3
nf_defrag_ipv6           49936  1 nf_conntrack_ipv6
ip6table_raw             12565  1
ipt_REJECT               12485  3
xt_tcpudp                12531  6
xt_pkttype               12456  3
xt_LOG                   17205  12
xt_limit                 12570  12
iptable_raw              12561  1
xt_CT                    12820  4
iptable_filter           12666  1
ip6table_mangle          12579  0
nf_conntrack_netbios_ns  12585  0
nf_conntrack_broadcast   12541  1 nf_conntrack_netbios_ns
nf_conntrack_ipv4        13655  3
nf_defrag_ipv4           12649  1 nf_conntrack_ipv4
ip_tables                17713  2 iptable_raw,iptable_filter
xt_conntrack             12664  6
nf_conntrack             67920  6 nf_conntrack_ipv6,xt_CT,nf_conntrack_netbios_ns,nf_conntrack_broadcast,nf_conntrack_ipv4,xt_conntrack
ip6table_filter          12670  1
ip6_tables               17740  3 ip6table_raw,ip6table_mangle,ip6table_filter
x_tables                 21937  15 ip6t_REJECT,ip6table_raw,ipt_REJECT,xt_tcpudp,xt_pkttype,xt_LOG,xt_limit,iptable_raw,xt_CT,iptable_filter,ip6table_mangle,ip_tables,xt_conntrack,ip6table_filter,ip6_tables
snd_rme96                24387  0
snd_hda_intel            34073  0
snd_hda_codec_realtek    41826  1
snd_hda_codec           129150  2 snd_hda_intel,snd_hda_codec_realtek
snd_pcm                  73096  3 snd_rme96,snd_hda_intel,snd_hda_codec
snd_timer                24441  1 snd_pcm
snd_page_alloc           14230  2 snd_hda_intel,snd_pcm
snd                      58328  6 snd_rme96,snd_hda_intel,snd_hda_codec_realtek,snd_hda_codec,snd_pcm,snd_timer
soundcore                14599  1 snd
binfmt_misc              13111  1
ipv6                    272895  24 ip6t_REJECT,nf_conntrack_ipv6,nf_defrag_ipv6,ip6table_mangle
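
The unload one-liner quoted above picks unused modules by matching ' 0 ' anywhere in each lsmod line, so it depends on the spacing around the use count. As a rough sketch only (the function name `unused_modules` is made up for illustration and is not part of any tool), the same selection can be done by testing the third column directly:

```shell
#!/bin/sh
# Hypothetical helper: read lsmod-style text on stdin and print the names
# of modules whose "Used by" count (third column) is 0.
unused_modules() {
    # NR > 1 skips the "Module Size Used by" header line.
    awk 'NR > 1 && $3 == 0 { print $1 }'
}

# Example on a fixed excerpt of the listing above (no module is touched).
unused_modules <<'EOF'
Module Size Used by
xt_CT 12820 4
ip6table_mangle 12579 0
nf_conntrack_netbios_ns 12585 0
nf_conntrack_broadcast 12541 1 nf_conntrack_netbios_ns
snd_rme96 24387 0
EOF
```

On a live system this could be fed as `lsmod | unused_modules | xargs sudo rmmod` and repeated a few times, since removing nf_conntrack_netbios_ns, for example, is what would drop the use count of nf_conntrack_broadcast to 0.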
On my machine, I can trigger this, for example:
------------[ cut here ]------------
WARNING: CPU: 0 PID: 3217 at fs/sysfs/file.c:498 sysfs_attr_ns+0x91/0xa0()
sysfs: kobject (null) without dirent
Modules linked in: fuse nf_conntrack_broadcast ipt_MASQUERADE ip6t_REJECT xt_$
CPU: 0 PID: 3217 Comm: rmmod Not tainted 3.12.0-rc6-00284-ge6036c0b8896 #19
Hardware name: Sony Corporation SVP11213CXB/VAIO, BIOS R0270V7 05/17/2013
0000000000000009 ffff8800aca35df8 ffffffff8160aab5 ffff8800aca35e40
ffff8800aca35e30 ffffffff810514b8 ffffffffa013f080 ffff8801194a6040
0000000000000800 0000000000000000 0000000000c5b3e0 ffff8800aca35e90
Call Trace:
[<ffffffff8160aab5>] dump_stack+0x45/0x56
[<ffffffff810514b8>] warn_slowpath_common+0x78/0xa0
[<ffffffff81051527>] warn_slowpath_fmt+0x47/0x50
[<ffffffff810b5960>] ? module_refcount+0xb0/0xb0
[<ffffffff811e5c61>] sysfs_attr_ns+0x91/0xa0
[<ffffffff811e5d2a>] sysfs_remove_file+0x1a/0x50
[<ffffffff814c88a3>] cpufreq_sysfs_remove_file+0x13/0x30
[<ffffffffa013d350>] acpi_cpufreq_exit+0x2e/0xcde [acpi_cpufreq]
[<ffffffff810b7d1d>] SyS_delete_module+0x15d/0x2c0
[<ffffffff81002929>] ? do_notify_resume+0x59/0x90
[<ffffffff81618f62>] system_call_fastpath+0x16/0x1b
---[ end trace f887112caaa5c4ab ]---
so at least we have a cpufreq/sysfs interaction bug. There may be others.
This particular cpufreq issue may be triggered by the fact that
acpi-cpufreq isn't actually in use (pstate is). Or it might be some
generic cpufreq/sysfs bug. Rafael, Greg, ideas?
I don't see that this particular one would be the one that causes the
timer issues, but it's an example of the fact that module unload tends
to be special and not necessarily well tested.
Linus
On Fri, Oct 25, 2013 at 9:38 AM, Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
Hmm.. I just got a run_timer_softirq oops on my own laptop, slightly
different. That was not during shutdown, although there was a "yum
upgrade" finishing when that happened, so it's quite likely that there
was a service shutdown (and then restart).
I think it's related. But my oops has almost no information: the IP
that was jumped to was bogus, and the callchain is just CPU idle
followed by the softirq -> run_timer_softirq handling, so there's no
real way to see *what* triggered it.
The bad rip was ffffffffa051e250, which is not a valid code address.
It *might* be a module address, though. So this might be triggered by
rmmod on some module that doesn't remove all its timers...
Ideas?
Linus
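
The guess that ffffffffa051e250 "might be a module address" can be sanity-checked against the x86_64 virtual memory map. The sketch below assumes the module mapping space is ffffffffa0000000 - ffffffffff5fffff, as documented for x86_64 kernels of roughly this era; that range is an assumption here and should be verified for the kernel version and config in question. The helper name `in_module_range` is hypothetical.

```shell
#!/bin/sh
# ASSUMPTION: x86_64 module mapping space is
#   ffffffffa0000000 - ffffffffff5fffff
# Check this against the memory map documentation of the kernel you run.
#
# Hypothetical helper: succeed (0) if the 16-digit hex address (no 0x
# prefix) lies inside the assumed range, fail (1) if it lies outside,
# and return 2 if the argument is not a 16-digit hex string.
in_module_range() {
    addr=$(printf '%s' "$1" | tr 'A-F' 'a-f')
    case $addr in
        *[!0-9a-f]*) return 2 ;;
    esac
    [ "${#addr}" -eq 16 ] || return 2
    # Compare 32-bit halves so shell arithmetic never overflows.
    high=${addr%????????}
    low=${addr#????????}
    [ "$high" = ffffffff ] || return 1
    [ $((0x$low)) -ge $((0xa0000000)) ] && [ $((0x$low)) -le $((0xff5fffff)) ]
}

# Addresses taken from the messages above.
for a in ffffffffa051e250 ffffffffa013d350 ffffffff811e5c61; do
    if in_module_range "$a"; then
        echo "$a: inside the assumed module range"
    else
        echo "$a: outside the assumed module range"
    fi
done
```

Under that assumption the bad rip sits in the same region as acpi_cpufreq_exit (ffffffffa013d350) from the rmmod trace, while sysfs_attr_ns (ffffffff811e5c61) does not. Being inside the range only says a module could have been mapped there; it does not identify which module, or whether one was still loaded.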