On Mon, Feb 16, 2009 at 09:02:18PM +0100, Frederic Weisbecker wrote: > On Mon, Feb 16, 2009 at 10:56:16AM -0800, Paul E. McKenney wrote: > > On Mon, Feb 16, 2009 at 08:06:13AM -0800, Paul E. McKenney wrote: > > > On Mon, Feb 16, 2009 at 02:21:51PM +0100, Ingo Molnar wrote: > > > > > > > > * Damien Wyart <damien.wyart@xxxxxxx> wrote: > > > > > > > > > * Ingo Molnar <mingo@xxxxxxx> [090216 13:26]: > > > > > > We do get 0x100 which is 1 << RCU_SOFTIRQ, i.e. the RCU softirq. Paul, > > > > > > this indeed seems to be a CONFIG_TREE_RCU=y bug. > > > > > > > > > > > What is weird is that RCU_SOFTIRQ gets set again and again - but there's > > > > > > no raise_softirq() calls. Could you please do a two-CPU trace too via: > > > > > > > > > > > echo 3 > /debug/tracing/tracing_cpumask > > > > > > > > > > > So that we can see what's happening on the other CPU? > > > > > > > > > > > Also, could you please apply the debug patch below (or update to the > > > > > > very latest -tip tree), so that we get trace entries of softirq triggers > > > > > > too? > > > > > > > > > > Ok, the new trace with these additional modifications is here: > > > > > http://damien.wyart.free.fr/ksoftirqd_pb/trace_tip_2009.02.16_1300_ksoftirqd_pb_abstime_proc_mask3.txt.gz > > > > > > > > thanks. > > > > > > > > This confirms that SOFTIRQ_RCU gets raised here in the timer IRQ: > > > > > > > > 136.255963 | 0) sleep-2345 | | update_process_times() { > > > > 136.255964 | 0) sleep-2345 | | account_process_tick() { > > > > 136.255965 | 0) sleep-2345 | 0.779 us | account_system_time(); > > > > 136.255966 | 0) sleep-2345 | 2.262 us | } > > > > 136.255967 | 0) sleep-2345 | | run_local_timers() { > > > > 136.255968 | 0) sleep-2345 | 0.802 us | hrtimer_run_queues(); > > > > 136.255969 | 0) sleep-2345 | | raise_softirq() { > > > > 136.255970 | 0) sleep-2345 | | raise_softirq_irqoff() { > > > > 136.255971 | 0) sleep-2345 | | __raise_softirq_irqoff() { > > > > 136.255972 | 0) sleep-2345 | | /* nr: 1 */ > > > > 136.255973 | 0) sleep-2345 | 2.194 us | } > > > > 136.255974 | 0) sleep-2345 | 3.832 us | } > > > > 136.255975 | 0) sleep-2345 | 5.491 us | } > > > > 136.255976 | 0) sleep-2345 | 8.667 us | } > > > > 136.255976 | 0) sleep-2345 | 0.792 us | rcu_pending(); > > > > 136.255978 | 0) sleep-2345 | | rcu_check_callbacks() { > > > > 136.255979 | 0) sleep-2345 | 0.781 us | idle_cpu(); > > > > 136.255981 | 0) sleep-2345 | | raise_softirq() { > > > > 136.255981 | 0) sleep-2345 | | raise_softirq_irqoff() { > > > > 136.255982 | 0) sleep-2345 | | __raise_softirq_irqoff() { > > > > 136.255983 | 0) sleep-2345 | | /* nr: 8 */ > > > > 136.255984 | 0) sleep-2345 | 1.555 us | } > > > > 136.255984 | 0) sleep-2345 | 3.059 us | } > > > > 136.255985 | 0) sleep-2345 | 4.594 us | } > > > > 136.255986 | 0) sleep-2345 | 7.800 us | } > > > > 136.255987 | 0) sleep-2345 | 0.737 us | printk_tick(); > > > > > > > > again and again. > > > > > > Interesting... > > > > > > I will take a look! > > > > The above sequence is more or less normal behavior -- the RCU softirq > > handler rcu_process_callbacks() is being invoked once per tick, which > > appears to be HZ=1000 or thereabouts. The system appears to be mostly > > idle during this time period. > > > > One oddity is that the _bh call to __rcu_process_callbacks() is invoking > > force_quiescent_state() each time, and force_quiescent_state() isn't > > doing anything. This is a possible mismatch between the conditions in > > rcu_pending() and force_quiescent_state(), and I will look into this. > > > > However, this sequence is consuming less than 10 microseconds per > > millisecond, so cannot be the main cause of the softirq issues you > > are seeing, though if there really is a mismatch, it needs to be fixed, > > and I will attend to this. > > > > The interesting portion of the trace is later on: > > > > 137.896992 | 1) ksoftir-2302 | | do_softirq() { > > 137.896993 | 1) ksoftir-2302 | | __do_softirq() { > > 137.896993 | 1) ksoftir-2302 | | /* #1 softirq pending: 00000100 */ > > 137.896994 | 1) ksoftir-2302 | | /* #2 softirq pending: 00000000 */ > > 137.896995 | 1) ksoftir-2302 | | rcu_process_callbacks() { > > 137.896995 | 1) ksoftir-2302 | | __rcu_process_callbacks() { > > 137.896996 | 1) ksoftir-2302 | 0.498 us | force_quiescent_state(); > > 137.896997 | 1) ksoftir-2302 | 1.588 us | } > > 137.896997 | 1) ksoftir-2302 | | __rcu_process_callbacks() { > > 137.896998 | 1) ksoftir-2302 | 0.475 us | force_quiescent_state(); > > 137.896999 | 1) ksoftir-2302 | | cpu_quiet() { > > 137.896999 | 1) ksoftir-2302 | 0.526 us | _spin_lock_irqsave(); > > 137.897000 | 1) ksoftir-2302 | 0.511 us | _spin_unlock_irqrestore(); > > 137.897001 | 1) ksoftir-2302 | 2.528 us | } > > 137.897002 | 1) ksoftir-2302 | 4.607 us | } > > 137.897002 | 1) ksoftir-2302 | 7.825 us | } > > 137.897003 | 1) ksoftir-2302 | 0.498 us | _local_bh_enable(); > > 137.897004 | 1) ksoftir-2302 | + 11.430 us | } > > 137.897005 | 1) ksoftir-2302 | + 12.572 us | } > > 137.897005 | 1) ksoftir-2302 | 0.549 us | _cond_resched(); > > 137.897006 | 1) ksoftir-2302 | 0.541 us | kthread_should_stop(); > > 137.897007 | 1) ksoftir-2302 | | schedule() { > > 137.897008 | 1) ksoftir-2302 | | __schedule() { > > 137.897008 | 1) ksoftir-2302 | 0.514 us | _spin_lock_irq(); > > 137.897009 | 1) ksoftir-2302 | 0.594 us | update_rq_clock(); > > 137.897011 | 1) ksoftir-2302 | | deactivate_task() { > > 137.897011 | 1) ksoftir-2302 | | dequeue_task() { > > 137.897011 | 1) ksoftir-2302 | | dequeue_task_fair() { > > 137.897012 | 1) ksoftir-2302 | | update_curr() { > > 137.897012 | 1) ksoftir-2302 | | calc_delta_fair() { > > 137.897013 | 1) ksoftir-2302 | 0.506 us | calc_delta_mine(); > > 137.897014 | 1) ksoftir-2302 | 1.528 us | } > > 137.897015 | 1) ksoftir-2302 | 2.563 us | } > > 137.897015 | 1) ksoftir-2302 | 0.513 us | hrtick_start_fair(); > > 137.897019 | 1) ksoftir-2302 | 4.662 us | } > > 137.897019 | 1) ksoftir-2302 | 8.213 us | } > > 137.897020 | 1) ksoftir-2302 | 9.195 us | } > > 137.897020 | 1) ksoftir-2302 | 0.960 us | find_busiest_group(); > > 137.897022 | 1) ksoftir-2302 | 0.493 us | msecs_to_jiffies(); > > 137.897023 | 1) ksoftir-2302 | 0.511 us | put_prev_task_fair(); > > 137.897024 | 1) ksoftir-2302 | | pick_next_task() { > > 137.897024 | 1) ksoftir-2302 | 0.481 us | pick_next_task_fair(); > > 137.897025 | 1) ksoftir-2302 | 0.491 us | pick_next_task_rt(); > > 137.897026 | 1) ksoftir-2302 | 0.474 us | pick_next_task_fair(); > > 137.897027 | 1) ksoftir-2302 | 0.480 us | pick_next_task_idle(); > > 137.897028 | 1) ksoftir-2302 | 4.516 us | } > > 137.897029 | 1) ksoftir-2302 | | perf_counter_task_sched_out() { > > 137.897029 | 1) ksoftir-2302 | | __perf_counter_sched_out() { > > 137.897030 | 1) ksoftir-2302 | 0.516 us | _spin_lock(); > > 137.897031 | 1) ksoftir-2302 | 1.486 us | } > > 137.897031 | 1) ksoftir-2302 | 2.462 us | } > > 137.897032 | 1) ksoftir-2302 | 0.516 us | __lock_text_start(); > > 137.897045 | ------------------------------------------ > > 1) ksoftir-2302 => <idle>-0 > > ------------------------------------------ > > > > 1) <idle>-0 | | /* nr: 8 */ > > ------------------------------------------ > > 1) <idle>-0 => ksoftir-2302 > > ------------------------------------------ > > > > 137.897064 | 1) ksoftir-2302 | | finish_task_switch() { > > 137.897064 | 1) ksoftir-2302 | | perf_counter_task_sched_in() { > > 137.897065 | 1) ksoftir-2302 | 0.508 us | _spin_lock(); > > 137.897066 | 1) ksoftir-2302 | 1.525 us | } > > 137.897066 | 1) ksoftir-2302 | 2.617 us | } > > 137.897067 | 1) ksoftir-2302 | + 58.928 us | } > > 137.897067 | 1) ksoftir-2302 | + 59.926 us | } > > 137.897068 | 1) ksoftir-2302 | | do_softirq() { > > 137.897068 | 1) ksoftir-2302 | | __do_softirq() { > > 137.897069 | 1) ksoftir-2302 | | /* #1 softirq pending: 00000100 */ > > 137.897070 | 1) ksoftir-2302 | | /* #2 softirq pending: 00000000 */ > > 137.897070 | 1) ksoftir-2302 | | rcu_process_callbacks() { > > 137.897071 | 1) ksoftir-2302 | | __rcu_process_callbacks() { > > 137.897071 | 1) ksoftir-2302 | | force_quiescent_state() { > > 137.897073 | 1) ksoftir-2302 | 1.575 us | } > > 137.897073 | 1) ksoftir-2302 | | __rcu_process_callbacks() { > > 137.897074 | 1) ksoftir-2302 | 0.474 us | force_quiescent_state(); > > 137.897075 | 1) ksoftir-2302 | | cpu_quiet() { > > 137.897075 | 1) ksoftir-2302 | 0.526 us | _spin_lock_irqsave(); > > 137.897076 | 1) ksoftir-2302 | 0.511 us | _spin_unlock_irqrestore(); > > 137.897077 | 1) ksoftir-2302 | 2.532 us | } > > 137.897078 | 1) ksoftir-2302 | 4.632 us | } > > 137.897078 | 1) ksoftir-2302 | 7.815 us | } > > 137.897079 | 1) ksoftir-2302 | 0.501 us | _local_bh_enable(); > > 137.897080 | 1) ksoftir-2302 | + 11.405 us | } > > 137.897080 | 1) ksoftir-2302 | + 12.542 us | } > > > > Here the calls to rcu_process_callbacks() are only 75 microseconds apart, > > so that this function is consuming more than 10% of a CPU. The strange > > thing is that I don't see a raise_softirq() in between, though perhaps > > it gets inlined or something that makes it invisible to ftrace. > > Certainly rcu_process_callbacks() can re-invoke itself, for example, > > when a large number of RCU callbacks has piled up. However, there are > > only 29 calls to __call_rcu() over the entire time period, so that does > > not appear to be the cause. Strangely enough, there appear to be no > > calls to rcu_do_batch() over the full trace, but this is invoked > > unconditionally from __rcu_process_callbacks(). So perhaps the trace > > wasn't covering that function? > > I just checked an assembly dump of my vmlinux, and rcu_do_batch() > has been inlined. I don't understand why, this is a wide function. Yow!!! ;-) > > Whatever, this pattern continues for more than 300 milliseconds(!). > > > > Would you be willing to enable CONFIG_RCU_TRACE and CONFIG_TREE_RCU, > > reproduce this and send the output of the debugfs files rcu/rcudata > > and rcu/rcuhier? The commands for this would be: > > > > mkdir /debug || : > > mount -t debugfs debugfs /debug > > cat /debug/rcu/rcuhier > > cat /debug/rcu/rcudata > > > > I will try to reproduce locally as well. No luck thus far, perhaps because I first tried x86_64 and Damien's run was on 32-bit x86. Thanx, Paul -- To unsubscribe from this list: send the line "unsubscribe kernel-testers" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html