On Fri, May 03, 2013 at 10:46:10PM +0200, Sebastian Andrzej Siewior wrote:
| * Qiang Huang | 2013-04-25 17:01:18 [+0800]:
|
| >This is a revert of "sched-clear-pf-thread-bound-on-fallback-rq.patch"
| >(commit 0d939066acdcb in v3.4-rt).
| >
| >select_fallback_rq() can easily be called during system boot, because
| >select_task_rq_fair() just returns task_cpu(p) for bound kernel threads,
| >which is 0 during system boot and not in tsk_cpus_allowed, so
| >select_fallback_rq() is called and PF_THREAD_BOUND is cleared. In my
| >box, 1/3 of the bound kernel threads will clear that flag after boot.
| >
| >And it will cause problems, for example:
| ># for pid in `ps -e -o pid`; do taskset -p -c 0-15 $pid; done
| >this command will cause the system to hang.
| >
| >What's more, I don't see why we need to clear this flag any more,
| >because "cpu/rt: Rework cpu down for PREEMPT_RT" already removed the
| >optimization for PF_THREAD_BOUND on migrate_disable/enable.
| >
| >Signed-off-by: Qiang Huang <h.huangqiang@xxxxxxxxxx>
|
| I can execute the command you mention above on v3.4 and v3.8 with no
| hangs. Can you give me the number of your cpus and maybe the config or
| another detail?

I was able to reproduce the original issue on 3.6-rt (PREEMPT_RT_FULL
enabled) running the ltp-cgroups testcase. In fact, as originally
reported, the issue appeared when running the cgroup_fj tests. It usually
took 8 to 11 minutes to trigger the issue.

After applying the patch I was no longer able to reproduce the issue,
even on 16h-long test runs.

Luis

| I played a little with it on v3.8. That code you asked to remove
| triggers only on cpu down for kernel threads which do not use the
| park/unpark infrastructure, that is "posixcputmr" and "migration", which
| get removed later. The only reason why "migration" pops up is so it can
| leave.
| I managed to trigger it as well for worker threads.
| The threads which were bound to the CPU that went down are marked
| DISASSOCIATED in gcwq_unbind_fn() and we lose that PF_THREAD_BOUND flag
| once that thread is used. After the CPU gets back, it is assigned to the
| "old" cpu via worker_maybe_bind_and_lock() and the PF_THREAD_BOUND flag
| is missing.
| So that is not looking that good. Will look at this later.
|
| Sebastian
| --
| To unsubscribe from this list: send the line "unsubscribe linux-rt-users" in
| the body of a message to majordomo@xxxxxxxxxxxxxxx
| More majordomo info at http://vger.kernel.org/majordomo-info.html
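
For readers following the thread, the boot-time path described in the
patch changelog can be sketched as a toy user-space model. This is not
kernel code. The Task class, the wake_up() helper, the clear_bound_flag
switch, the thread names and the flag value are all assumptions made for
illustration, and the affinity check in set_cpus_allowed() is an assumed
simplification of how the kernel treats bound threads.

```python
# Toy model only: names and values below are illustrative assumptions,
# not taken from the kernel sources.
PF_THREAD_BOUND = 0x04000000  # assumed flag bit for this model


class Task:
    def __init__(self, name, cpus_allowed, cpu=0, flags=PF_THREAD_BOUND):
        self.name = name
        self.cpus_allowed = set(cpus_allowed)
        self.cpu = cpu      # task_cpu(p); 0 during early boot, as in the changelog
        self.flags = flags


def select_task_rq(task):
    # Per the changelog: for a bound kernel thread this just returns task_cpu(p).
    return task.cpu


def select_fallback_rq(task, clear_bound_flag=True):
    # With the patch under discussion applied, the bound flag is cleared here.
    if clear_bound_flag:
        task.flags &= ~PF_THREAD_BOUND
    return min(task.cpus_allowed)   # simplification: pick the lowest allowed CPU


def wake_up(task, clear_bound_flag=True):
    cpu = select_task_rq(task)
    if cpu not in task.cpus_allowed:
        # CPU 0 is not in the allowed mask, so the fallback path runs.
        cpu = select_fallback_rq(task, clear_bound_flag)
    task.cpu = cpu
    return cpu


def set_cpus_allowed(task, new_mask):
    # Assumed behaviour: affinity changes are refused while the flag is set.
    if task.flags & PF_THREAD_BOUND:
        return False
    task.cpus_allowed = set(new_mask)
    return True


if __name__ == "__main__":
    for clear in (True, False):
        t = Task("kworker/3:0", cpus_allowed={3})   # hypothetical bound thread
        wake_up(t, clear_bound_flag=clear)
        still_bound = bool(t.flags & PF_THREAD_BOUND)
        moved = set_cpus_allowed(t, range(16))      # mimics "taskset -c 0-15"
        print(f"clear={clear} bound={still_bound} affinity_changed={moved}")
```

In this model, a thread that has gone through the fallback path with the
flag cleared accepts the later affinity change, while a thread that keeps
the flag refuses it. That is the difference the revert is meant to
restore; the hang itself is not modelled.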