On Wed, 2011-09-21 at 20:50 +0200, Peter Zijlstra wrote:
> On Wed, 2011-09-21 at 19:01 +0200, Peter Zijlstra wrote:
> > On Wed, 2011-09-21 at 12:17 +0200, Mike Galbraith wrote:
> > > [ 144.212272] ------------[ cut here ]------------
> > > [ 144.212280] WARNING: at kernel/sched.c:6152 migrate_disable+0x1b6/0x200()
> > > [ 144.212282] Hardware name: MS-7502
> > > [ 144.212283] Modules linked in: snd_pcm_oss snd_mixer_oss snd_seq snd_seq_device edd nfsd lockd parport_pc parport nfs_acl auth_rpcgss sunrpc bridge ipv6 stp cpufreq_conservative microcode cpufreq_ondemand cpufreq_userspace cpufreq_powersave acpi_cpufreq mperf nls_iso8859_1 nls_cp437 vfat fat fuse ext3 jbd dm_mod usbmouse usb_storage usbhid snd_hda_codec_realtek usb_libusual uas sr_mod cdrom hid snd_hda_intel e1000e snd_hda_codec kvm_intel snd_hwdep sg snd_pcm kvm i2c_i801 snd_timer snd firewire_ohci firewire_core soundcore snd_page_alloc crc_itu_t button ext4 mbcache jbd2 crc16 uhci_hcd sd_mod ehci_hcd usbcore rtc_cmos ahci libahci libata scsi_mod fan processor thermal
> > > [ 144.212317] Pid: 6215, comm: strace Not tainted 3.0.4-rt14 #2052
> > > [ 144.212319] Call Trace:
> > > [ 144.212323] [<ffffffff8104662f>] warn_slowpath_common+0x7f/0xc0
> > > [ 144.212326] [<ffffffff8104668a>] warn_slowpath_null+0x1a/0x20
> > > [ 144.212328] [<ffffffff8103f606>] migrate_disable+0x1b6/0x200
> > > [ 144.212331] [<ffffffff8105a2a8>] ptrace_stop+0x128/0x240
> > > [ 144.212334] [<ffffffff81057b9b>] ? recalc_sigpending+0x1b/0x50
> > > [ 144.212337] [<ffffffff8105b6f1>] get_signal_to_deliver+0x211/0x530
> > > [ 144.212340] [<ffffffff81001835>] do_signal+0x75/0x7a0
> > > [ 144.212342] [<ffffffff8105ae68>] ? kill_pid_info+0x58/0x80
> > > [ 144.212344] [<ffffffff8105c34c>] ? sys_kill+0xac/0x1e0
> > > [ 144.212347] [<ffffffff81001fe5>] do_notify_resume+0x65/0x80
> > > [ 144.212350] [<ffffffff8135978b>] int_signal+0x12/0x17
> > > [ 144.212352] ---[ end trace 0000000000000002 ]---
> >
> >
> > Right, that's because of
> > 53da1d9456fe7f87a920a78fdbdcf1225d197cb7, I think we simply want a full
> > revert of that for -rt.
>
> This also made me stare at the trainwreck called wait_task_inactive(),
> how about something like the below, it survives a boot and simple
> strace.

There's a missing hunklet, but...

@@ -8325,9 +8290,7 @@ void __init sched_init(void)

 	set_load_weight(&init_task);

-#ifdef CONFIG_PREEMPT_NOTIFIERS
 	INIT_HLIST_HEAD(&init_task.preempt_notifiers);
-#endif

 #ifdef CONFIG_SMP
 	open_softirq(SCHED_SOFTIRQ, run_rebalance_domains);

..perturbation (100% userspace hog) measurement proggy and jitter
measurement proggy pinned to the same cpu makes 100% repeatable boom.

Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 3
Pid: 6226, comm: pert Not tainted 3.0.4-rt14 #2053
Call Trace:
 <NMI>  [<ffffffff81355f00>] panic+0xa0/0x1a8
 [<ffffffff8108fe47>] watchdog_overflow_callback+0xe7/0xf0
 [<ffffffff810c1c7c>] __perf_event_overflow+0x9c/0x250
 [<ffffffff810c2734>] perf_event_overflow+0x14/0x20
 [<ffffffff81014c7c>] intel_pmu_handle_irq+0x21c/0x440
 [<ffffffff81010fb9>] perf_event_nmi_handler+0x39/0xc0
 [<ffffffff8106f42c>] notifier_call_chain+0x4c/0x70
 [<ffffffff8106fa6a>] __atomic_notifier_call_chain+0x4a/0x70
 [<ffffffff8106faa6>] atomic_notifier_call_chain+0x16/0x20
 [<ffffffff8106fc2e>] notify_die+0x2e/0x30
 [<ffffffff81002c8a>] do_nmi+0xaa/0x240
 [<ffffffff813592ea>] nmi+0x1a/0x20
 <<EOE>>
<0>Rebooting in 60 seconds..[ 0.000000]

--
To unsubscribe from this list: send the line "unsubscribe linux-rt-users" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html