Re: [PATCH] sched: don't clear PF_THREAD_BOUND in select_fallback_rq

On Fri, May 03, 2013 at 10:46:10PM +0200, Sebastian Andrzej Siewior wrote:
| * Qiang Huang | 2013-04-25 17:01:18 [+0800]:
| 
| >This is a revert of "sched-clear-pf-thread-bound-on-fallback-rq.patch"
| >(commit 0d939066acdcb in v3.4-rt).
| >
| >select_fallback_rq() can easily be called during system boot, because
| >select_task_rq_fair() just returns task_cpu(p) for bound kernel threads,
| >which is 0 during system boot and not in tsk_cpus_allowed, so
| >select_fallback_rq() is called and PF_THREAD_BOUND is cleared. On my
| >box, 1/3 of the bound kernel threads lose that flag after boot.
| >
| >This causes problems, for example:
| ># for pid in `ps -e -o pid`; do taskset -p -c 0-15 $pid; done
| >This command hangs the system.
| >
| >What's more, I don't see why we need to clear this flag any more,
| >because "cpu/rt: Rework cpu down for PREEMPT_RT" already removed the
| >optimization for PF_THREAD_BOUND on migrate_disable/enable.
| >
| >Signed-off-by: Qiang Huang <h.huangqiang@xxxxxxxxxx>
| 
| I can execute the command you mention above on v3.4 and v3.8 with no
| hangs. Can you tell me how many CPUs you have, and maybe share the
| config or other details?

I was able to reproduce the original issue on 3.6-rt (PREEMPT_RT_FULL
enabled) running the ltp-cgroups testcase. In fact, as originally reported,
the issue appeared when running the cgroup_fj tests. It usually took
8 to 11 minutes to trigger.

After applying the patch I was no longer able to reproduce the issue, even
on 16h-long test runs.
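
For anyone following the thread, here is a small user-space model of the
mechanism Qiang describes. It is only a sketch and not the kernel code: the
toy_* names, the struct layout, the 32-bit CPU mask and the clear_bound
switch are all made up for illustration, and the flag value is just a
placeholder bit. It shows a bound thread whose last CPU (0) is not in its
allowed mask going through the fallback path, losing PF_THREAD_BOUND, and
then accepting an affinity change that would otherwise be rejected.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bit for this model; treat the exact value as an assumption. */
#define PF_THREAD_BOUND 0x04000000u

/* Hypothetical stand-in for the few task_struct fields that matter here. */
struct toy_task {
    unsigned int flags;     /* PF_* style flags */
    uint32_t cpus_allowed;  /* bit n set => CPU n is allowed */
    int cpu;                /* last CPU, i.e. what task_cpu(p) would return */
};

static bool toy_cpu_allowed(const struct toy_task *p, int cpu)
{
    return (p->cpus_allowed >> cpu) & 1u;
}

/*
 * Models the fallback path. With clear_bound set it behaves like the
 * -rt patch under discussion and drops PF_THREAD_BOUND; with it unset
 * it behaves like the proposed revert.
 */
static int toy_select_fallback_rq(struct toy_task *p, bool clear_bound)
{
    if (clear_bound)
        p->flags &= ~PF_THREAD_BOUND;

    for (int cpu = 0; cpu < 32; cpu++)
        if (toy_cpu_allowed(p, cpu))
            return cpu;
    return -1; /* no allowed CPU at all */
}

/*
 * Models the selection step: the "fair" choice is simply the last CPU,
 * and the fallback is taken only if that CPU is not allowed.
 */
static int toy_select_task_rq(struct toy_task *p, bool clear_bound)
{
    int cpu = p->cpu;

    if (!toy_cpu_allowed(p, cpu))
        cpu = toy_select_fallback_rq(p, clear_bound);
    p->cpu = cpu;
    return cpu;
}

/* Models the affinity check: a bound thread refuses a new mask. */
static int toy_set_cpus_allowed(struct toy_task *p, uint32_t new_mask)
{
    if (p->flags & PF_THREAD_BOUND)
        return -1; /* the kernel would return an error here */
    p->cpus_allowed = new_mask;
    return 0;
}
```

With a task set up as { PF_THREAD_BOUND, 1u << 3, 0 }, calling
toy_select_task_rq() with clear_bound = true moves it to CPU 3 and strips
the flag, so a later toy_set_cpus_allowed(&t, 0xffff) succeeds, which is the
taskset loop case from the report. With clear_bound = false the flag
survives and the mask change is refused.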

Luis

| I played a little with it on v3.8. The code you asked to remove
| triggers only on cpu down, for kernel threads which do not use the
| park/unpark infrastructure, that is, "posixcputmr" and "migration", which
| get removed later. The only reason why "migration" pops up is so it can
| leave.
| I managed to trigger it as well for worker threads. The threads which
| were bound to the CPU that went down are marked DISASSOCIATED in
| gcwq_unbind_fn(), and we lose the PF_THREAD_BOUND flag once such a thread
| is used. After the CPU comes back, the worker is assigned to the "old" cpu
| via worker_maybe_bind_and_lock() and the PF_THREAD_BOUND flag is missing.
| So that is not looking that good. Will look at this later.
| 
| Sebastian
| --
| To unsubscribe from this list: send the line "unsubscribe linux-rt-users" in
| the body of a message to majordomo@xxxxxxxxxxxxxxx
| More majordomo info at  http://vger.kernel.org/majordomo-info.html




