On Wed, 2012-10-03 at 23:34 -0500, Sankara Muthukrishnan wrote: > On Wed, Oct 3, 2012 at 6:08 PM, <vomlehn@xxxxxxxxx> wrote: > > > > If I am interpreting the ftrace output correctly, I seem to be having a > > problem with migrating ksoftirqd threads. I've attached a trace below > > but the following line illustrates the issue: > > > > # TASK-PID CPU# TIMESTAMP FUNCTION > > # | | | | | > > ... > > ksoftirqd/1-15 [000] 413.997630: dmvl_debug0 <-run_ksoftirqd > > > > I would expect ksoftirqd/1 to remain on CPU 1. If migration is actually > > happening, I think explains the strange msleep_interruptible() behavior > > I'm seeing--ksoftirqd only runs the timer queue for the processor on which > > it's running, and no ksoftirqd is running on the processor on which the > > timer_list entry was added. > > > > [Note: dmvl_debug0() is a dummy function I inserted at the top of the loop > > in run_ksoftirqd so I could verify with ftrace that it actually got there] > > > > --------------------------------- cut here ------------------------------------ > > # tracer: function > > # > > # TASK-PID CPU# TIMESTAMP FUNCTION > > # | | | | | > > ksoftirqd/1-15 [001] 413.985590: ksoftirqd_do_softirq <-run_ksoftirqd > > ksoftirqd/0-3 [000] 413.985596: ksoftirqd_do_softirq <-run_ksoftirqd > > ksoftirqd/0-3 [000] 413.985603: dmvl_debug0 <-run_ksoftirqd > > ksoftirqd/1-15 [001] 413.985613: dmvl_debug0 <-run_ksoftirqd > > ksoftirqd/1-15 [001] 413.986585: ksoftirqd_do_softirq <-run_ksoftirqd > > ksoftirqd/1-15 [001] 413.986595: dmvl_debug0 <-run_ksoftirqd > > ksoftirqd/0-3 [000] 413.986600: ksoftirqd_do_softirq <-run_ksoftirqd > > ksoftirqd/0-3 [000] 413.986613: dmvl_debug0 <-run_ksoftirqd > > ksoftirqd/1-15 [001] 413.987581: ksoftirqd_do_softirq <-run_ksoftirqd > > ksoftirqd/0-3 [000] 413.987595: ksoftirqd_do_softirq <-run_ksoftirqd > > ksoftirqd/1-15 [001] 413.987596: dmvl_debug0 <-run_ksoftirqd > > ksoftirqd/0-3 [000] 413.987607: dmvl_debug0 <-run_ksoftirqd > > ksoftirqd/1-15 [001] 413.988581: ksoftirqd_do_softirq <-run_ksoftirqd > > ksoftirqd/1-15 [001] 413.988594: dmvl_debug0 <-run_ksoftirqd > > ksoftirqd/0-3 [000] 413.988596: ksoftirqd_do_softirq <-run_ksoftirqd > > ksoftirqd/0-3 [000] 413.988605: dmvl_debug0 <-run_ksoftirqd > > ksoftirqd/1-15 [001] 413.989581: ksoftirqd_do_softirq <-run_ksoftirqd > > ksoftirqd/0-3 [000] 413.989597: ksoftirqd_do_softirq <-run_ksoftirqd > > ksoftirqd/1-15 [001] 413.989601: dmvl_debug0 <-run_ksoftirqd > > ksoftirqd/0-3 [000] 413.989603: dmvl_debug0 <-run_ksoftirqd > > ksoftirqd/1-15 [001] 413.990581: ksoftirqd_do_softirq <-run_ksoftirqd > > ksoftirqd/1-15 [001] 413.990589: dmvl_debug0 <-run_ksoftirqd > > ksoftirqd/0-3 [000] 413.990594: ksoftirqd_do_softirq <-run_ksoftirqd > > ksoftirqd/0-3 [000] 413.990607: dmvl_debug0 <-run_ksoftirqd > > rcuc/0-8 [000] 413.990630: __do_softirq <-local_bh_enable > > rcuc/0-8 [000] 413.990642: __do_softirq <-local_bh_enable > > ksoftirqd/0-3 [000] 413.990650: dmvl_debug0 <-run_ksoftirqd > > ksoftirqd/1-15 [001] 413.991580: ksoftirqd_do_softirq <-run_ksoftirqd > > ksoftirqd/1-15 [001] 413.991586: dmvl_debug0 <-run_ksoftirqd > > ksoftirqd/0-3 [000] 413.991592: ksoftirqd_do_softirq <-run_ksoftirqd > > ksoftirqd/0-3 [000] 413.991598: dmvl_debug0 <-run_ksoftirqd > > ksoftirqd/1-15 [001] 413.992579: ksoftirqd_do_softirq <-run_ksoftirqd > > ksoftirqd/0-3 [000] 413.992593: ksoftirqd_do_softirq <-run_ksoftirqd > > ksoftirqd/1-15 [001] 413.992597: dmvl_debug0 <-run_ksoftirqd > > ksoftirqd/0-3 [000] 413.992599: dmvl_debug0 <-run_ksoftirqd > > ksoftirqd/1-15 [001] 413.993579: ksoftirqd_do_softirq <-run_ksoftirqd > > ksoftirqd/1-15 [001] 413.993586: dmvl_debug0 <-run_ksoftirqd > > ksoftirqd/0-3 [000] 413.993591: ksoftirqd_do_softirq <-run_ksoftirqd > > ksoftirqd/0-3 [000] 413.993604: dmvl_debug0 <-run_ksoftirqd > > rcuc/0-8 [000] 413.993623: __do_softirq <-local_bh_enable > > rcuc/0-8 [000] 413.993635: __do_softirq <-local_bh_enable > > rcuc/0-8 [000] 413.993644: __do_softirq <-local_bh_enable > > ksoftirqd/0-3 [000] 413.993652: dmvl_debug0 <-run_ksoftirqd > > ksoftirqd/0-3 [000] 413.994584: ksoftirqd_do_softirq <-run_ksoftirqd > > ksoftirqd/0-3 [000] 413.994590: dmvl_debug0 <-run_ksoftirqd > > ksoftirqd/1-15 [001] 413.994593: ksoftirqd_do_softirq <-run_ksoftirqd > > ksoftirqd/1-15 [001] 413.994603: dmvl_debug0 <-run_ksoftirqd > > ksoftirqd/0-3 [000] 413.995583: ksoftirqd_do_softirq <-run_ksoftirqd > > ksoftirqd/0-3 [000] 413.995589: dmvl_debug0 <-run_ksoftirqd > > ksoftirqd/1-15 [001] 413.995590: ksoftirqd_do_softirq <-run_ksoftirqd > > ksoftirqd/1-15 [001] 413.995600: dmvl_debug0 <-run_ksoftirqd > > ksoftirqd/0-3 [000] 413.996582: ksoftirqd_do_softirq <-run_ksoftirqd > > ksoftirqd/1-15 [001] 413.996590: ksoftirqd_do_softirq <-run_ksoftirqd > > ksoftirqd/0-3 [000] 413.996592: dmvl_debug0 <-run_ksoftirqd > > ksoftirqd/1-15 [001] 413.996598: dmvl_debug0 <-run_ksoftirqd > > rcuc/0-8 [000] 413.996616: __do_softirq <-local_bh_enable > > rcuc/0-8 [000] 413.996629: __do_softirq <-local_bh_enable > > rcuc/0-8 [000] 413.996639: __do_softirq <-local_bh_enable > > rcuc/0-8 [000] 413.996648: __do_softirq <-local_bh_enable > > rcuc/0-8 [000] 413.996658: __do_softirq <-local_bh_enable > > rcuc/0-8 [000] 413.996667: __do_softirq <-local_bh_enable > > ksoftirqd/0-3 [000] 413.996675: dmvl_debug0 <-run_ksoftirqd > > ksoftirqd/0-3 [000] 413.997584: ksoftirqd_do_softirq <-run_ksoftirqd > > ksoftirqd/1-15 [001] 413.997592: ksoftirqd_do_softirq <-run_ksoftirqd > > ksoftirqd/0-3 [000] 413.997601: dmvl_debug0 <-run_ksoftirqd > > ksoftirqd/1-15 [001] 413.997603: dmvl_debug0 <-run_ksoftirqd > > rcuc/1-14 [001] 413.997627: __do_softirq <-local_bh_enable > > ksoftirqd/1-15 [000] 413.997630: dmvl_debug0 <-run_ksoftirqd > > rcuc/1-14 [001] 413.997643: __do_softirq <-local_bh_enable > > ksoftirqd/1-15 [000] 413.997644: dmvl_debug0 <-run_ksoftirqd > > rcuc/1-14 [001] 413.997655: __do_softirq <-local_bh_enable > > ksoftirqd/1-15 [000] 413.997657: dmvl_debug0 <-run_ksoftirqd > > rcuc/1-14 [001] 413.997668: __do_softirq <-local_bh_enable > > ksoftirqd/1-15 [000] 413.997670: dmvl_debug0 <-run_ksoftirqd > > rcuc/1-14 [001] 413.997681: __do_softirq <-local_bh_enable > > ksoftirqd/1-15 [000] 413.997682: dmvl_debug0 <-run_ksoftirqd > > ksoftirqd/0-3 [000] 413.998596: ksoftirqd_do_softirq <-run_ksoftirqd > > ksoftirqd/0-3 [000] 413.998607: dmvl_debug0 <-run_ksoftirqd > > ksoftirqd/1-15 [001] 413.998608: ksoftirqd_do_softirq <-run_ksoftirqd > > ksoftirqd/1-15 [001] 413.998616: dmvl_debug0 <-run_ksoftirqd > > ksoftirqd/0-3 [000] 413.999583: ksoftirqd_do_softirq <-run_ksoftirqd > > ksoftirqd/1-15 [001] 413.999590: ksoftirqd_do_softirq <-run_ksoftirqd > > > > > > > > > > -- > > To unsubscribe from this list: send the line "unsubscribe linux-rt-users" in > > the body of a message to majordomo@xxxxxxxxxxxxxxx > > More majordomo info at http://vger.kernel.org/majordomo-info.html > > Would you mind posting the kernel config file (that might help others > to look at appropriate source code paths)? I see that the per-cpu > "rcuc" thread is running on core-1 between the two time-stamps > 413.997603 and 413.998608 (which are the time-stamps before and after > when ksoftirqd/1 ran on core-1 correctly and ran on core-0 between > which). "rcuc" is related to the RCU boost feature. I am wondering > whether it might not be a bad idea to disable CONFIG_RCU_BOOST and see > whether the problem still reproduces (not sure whether it will prove > anything definitely but might give a useful data point?). > > Do you use cgroup cpusets? I don't think it is related to that though. perf record -a -e sched:sched_migrate_task do stuff perf script|grep comm=ksoft The above should then agree that it's really really migrating. If taskset -p and /proc/pid/sched::rt.nr_cpus_allowed agree it's pinned, it's supposed to be impossible to push/pull/wake whatever migrate the thing, and allegedly all paths check those before trying. (never heard of softirq threads wandering about, ever) -Mike -- To unsubscribe from this list: send the line "unsubscribe linux-rt-users" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html