Hi Steven,

A patch follows the comment, could you take a look?

On 2013/4/23 13:51, Li Zefan wrote:
> On 2013/4/23 0:00, Steven Rostedt wrote:
>> On Mon, 2013-04-22 at 17:39 +0800, Li Zefan wrote:
>>> On 2013/4/19 15:30, Qiang Huang wrote:
>>>> Hi,
>>>>
>>>> I ran cgroup_fj tests on RT kernel with PREEMPT_RT_FULL disabled, it will
>>>> stick the system when ran cpuset stress tests, it happens everytime.
>>>>
>>>> Here stick the system means there are almost no response from the system and
>>>> we can hardly do anything on the terminal, but kernel isn't crash nor deadlocked
>>>> (according to the lockdep message), and it may do some response sometimes.
>>>>
>>>> The problem exists on all RT versions from 3.4.18-rt29 to 3.4.37-rt51 AFAIK, but
>>>> without RT patches or with PREEMPT_RT_FULL enabled, the problem isn't exists.
>>>>
>>>> When the system is stuck, we will get the following message:
>>>> # dmesg
>>>> ...
>>>
>>> I've found the culprit after some investigation:
>>>
>>> From: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
>>> Date: Fri, 04 Nov 2011 19:48:36 +0000
>>> Subject: sched-clear-pf-thread-bound-on-fallback-rq.patch
>>>
>>> At system boot when some cpus haven't been up, the scheduler calls select_fallback_rq()
>>> and schedules tasks in other cpus, which ends up clearing some kernel threads'
>>> PF_THREAD_BOUND flag...
>>
>> I'm curious to why this doesn't break when PREEMPT_RT_FULL is enabled. I
>> would think it would also cause issues there too.
>>
>
> I was wrong in saying that PF_THREAD_BOUND is cleared because some cpus are not
> online yet. It's because select_task_rq_fair() just returns prev_cpu, which is
> task_cpu(p), which is 0 during system boot or some other cpu after boot, which
> is not in tsk_cpus_allowed, so select_fallback_rq() is called and it clears
> PF_THREAD_BOUND.
>
> I don't know why it didn't cause trouble when RT_FULL is enabled for Huang Qiang,

I retested it, and we do see similar trouble with RT_FULL enabled. I might
have missed some config option, which kept me from seeing these warnings.
As for the patch below, I have added your Signed-off-by, assuming it looks
good to you.

> but I did encoutner problems when testing in my box.
>
> I can trigger the bug with cgroup_fj.sh, or with taskset:
>
> # for pid in `ps -e -o pid`; do taskset -p -c 0-15 $pid; done
>
> But system hung or tasks hung may not happen right in the test, but will happen
> after some random operations (try compile kernel).
>
> And while running test I saw lots of warnings like this:
>
> [  146.702056] BUG: using smp_processor_id() in preemptible [00000000 00000000] code: kworker/4:0/23
> [  146.702069] caller is vmstat_update+0x22/0x60
> [  146.702075] Pid: 23, comm: kworker/4:0 Not tainted 3.4.24.05+ #49
> [  146.702077] Call Trace:
> [  146.702087]  [<ffffffff8125f685>] debug_smp_processor_id+0x145/0x150
> [  146.702091]  [<ffffffff8113c872>] vmstat_update+0x22/0x60
> [  146.702097]  [<ffffffff81061033>] process_one_work+0x203/0x610
> [  146.702101]  [<ffffffff81060f70>] ? process_one_work+0x140/0x610
> [  146.702105]  [<ffffffff81061fdd>] ? worker_thread+0x6d/0x450
> [  146.702109]  [<ffffffff8113c850>] ? refresh_cpu_vm_stats+0x1d0/0x1d0
> [  146.702114]  [<ffffffff81062116>] worker_thread+0x1a6/0x450
> [  146.702118]  [<ffffffff81061f70>] ? manage_workers+0x250/0x250
> [  146.702122]  [<ffffffff810680f6>] kthread+0xb6/0xc0
> [  146.702130]  [<ffffffff81474ab4>] kernel_thread_helper+0x4/0x10
> [  146.702137]  [<ffffffff81076930>] ? finish_task_switch+0x90/0x100
> [  146.702142]  [<ffffffff8146bb34>] ? retint_restore_args+0x13/0x13
> [  146.702145]  [<ffffffff81068040>] ? kthreadd+0x310/0x310
> [  146.702149]  [<ffffffff81474ab0>] ? gs_change+0x13/0x13
>
> and after a while those warnings stopped, instead warnings like this popped up,
> even after I stopped the test:
>
> [  252.896103] ------------[ cut here ]------------
> [  252.896107] WARNING: at kernel/cpu.c:157 unpin_current_cpu+0x7d/0x90()
> [  252.896110] Hardware name: Tecal RH2285
> [  252.896112] Modules linked in: iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi bridge
> ipv6 stp llc cpufreq_conservative cpufreq_userspace cpufreq_powersave acpi_cpufreq mperf
> binfmt_misc fuse loop dm_mod tpm_tis tpm coretemp crc32c_intel ghash_clmulni_intel aesni_intel
> sg serio_raw cryptd aes_x86_64 tpm_bios microcode i2c_i801 iTCO_wdt i2c_core bnx2
> iTCO_vendor_support mptctl button usbhid hid uhci_hcd ehci_hcd usbcore usb_common sd_mod
> crc_t10dif edd ext3 mbcache jbd fan processor ide_pci_generic ide_core ata_generic ata_piix
> libata mptsas mptscsih mptbase scsi_transport_sas scsi_mod thermal thermal_sys hwmon
> [  252.896201] Pid: 9893, comm: dmesg Tainted: G        W    3.4.24.05+ #49
> [  252.896203] Call Trace:
> [  252.896208]  [<ffffffff810404ed>] ? unpin_current_cpu+0x7d/0x90
> [  252.896212]  [<ffffffff810404ed>] ? unpin_current_cpu+0x7d/0x90
> [  252.896217]  [<ffffffff8103d83f>] warn_slowpath_common+0x7f/0xc0
> [  252.896221]  [<ffffffff8103d89a>] warn_slowpath_null+0x1a/0x20
> [  252.896226]  [<ffffffff810404ed>] unpin_current_cpu+0x7d/0x90
> [  252.896231]  [<ffffffff81078ddb>] migrate_enable+0xeb/0x1e0
> [  252.896235]  [<ffffffff81146b7b>] handle_pte_fault+0x34b/0x980
> [  252.896240]  [<ffffffff81076431>] ? get_parent_ip+0x11/0x50
> [  252.896244]  [<ffffffff81076431>] ? get_parent_ip+0x11/0x50
> [  252.896250]  [<ffffffff811472fc>] handle_mm_fault+0x14c/0x1e0
> [  252.896254]  [<ffffffff8146ef47>] do_page_fault+0x257/0x550
> [  252.896260]  [<ffffffff8114c995>] ? do_mmap_pgoff+0x375/0x3a0
> [  252.896264]  [<ffffffff8146bfb6>] ? error_sti+0x5/0x6
> [  252.896269]  [<ffffffff81259175>] ? trace_hardirqs_off_thunk+0x3a/0x3c
> [  252.896274]  [<ffffffff8146bd75>] page_fault+0x25/0x30
> [  252.896277] ---[ end trace 000000000000ae6e ]---
>
> I didn't see those warnings if !RT_FULL.
>
>

Here is a patch that seems to solve the problem. Everything looks good on my
box; my only concern is how this will affect our RT code.

From 8e4fa4e9a7b510bdaf90b8140ce1e847375abccf Mon Sep 17 00:00:00 2001
From: Qiang Huang <h.huangqiang@xxxxxxxxxx>
Date: Thu, 25 Apr 2013 10:22:01 +0800
Subject: [PATCH] sched: don't clear PF_THREAD_BOUND in select_fallback_rq

This reverts "sched-clear-pf-thread-bound-on-fallback-rq.patch"
(commit 0d939066acdcb in v3.4-rt).

select_fallback_rq() is easily reached during system boot, because
select_task_rq_fair() just returns task_cpu(p) for bound kernel threads.
That value is 0 during system boot and is not in tsk_cpus_allowed, so
select_fallback_rq() is called and PF_THREAD_BOUND is cleared. On my box,
about 1/3 of the bound kernel threads have lost the flag after boot.

This causes problems. For example:

	# for pid in `ps -e -o pid`; do taskset -p -c 0-15 $pid; done

This command makes the system hang.

What's more, I don't see why we still need to clear this flag, because
"cpu/rt: Rework cpu down for PREEMPT_RT" already removed the optimization
for PF_THREAD_BOUND in migrate_disable/enable.

Signed-off-by: Qiang Huang <h.huangqiang@xxxxxxxxxx>
Signed-off-by: Li Zefan <lizefan@xxxxxxxxxx>
---
 kernel/sched/core.c |    6 ------
 1 files changed, 0 insertions(+), 6 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 751ec60..8db6e3b 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1327,12 +1327,6 @@ out:
 		}
 	}
-	/*
-	 * Clear PF_THREAD_BOUND, otherwise we wreckage
-	 * migrate_disable/enable. See optimization for
-	 * PF_THREAD_BOUND tasks there.
-	 */
-	p->flags &= ~PF_THREAD_BOUND;
 	return dest_cpu;
 }
-- 
1.7.1