On Thu, Apr 28, 2011 at 5:28 PM, Sedat Dilek <sedat.dilek@xxxxxxxxxxxxxx> wrote: > On Thu, Apr 28, 2011 at 3:30 PM, Mike Galbraith <efault@xxxxxx> wrote: >> On Thu, 2011-04-28 at 03:26 -0700, Paul E. McKenney wrote: >>> On Thu, Apr 28, 2011 at 11:45:03AM +0200, Sedat Dilek wrote: >>> > Hi, >>> > >>> > not sure if my problem from linux-2.6-rcu.git#sedat.2011.04.23a is >>> > related to the issue here. >>> > >>> > Just FYI: >>> > I am here on a Pentium-M (uniprocessor aka UP) and still unsure if I >>> > have the correct (optimal?) kernel-configs set. >>> > >>> > Paul gave me a script to collect RCU data and I enhanced it with >>> > collecting SCHED data. >>> > >>> > In the above mentionned GIT branch I applied these two extra commits >>> > (0001 requested by Paul and 0002 proposed by Thomas): >>> > >>> > patches/0001-Revert-rcu-restrict-TREE_RCU-to-SMP-builds-with-PREE.patch >>> > patches/0002-sched-Add-warning-when-RT-throttling-is-activated.patch >>> > >>> > Furthermore, I have added my kernel-config file, scripts, patches and >>> > logs (also output of 'cat /proc/cpuinfo'). >>> > >>> > Hope this helps the experts to narrow down the problem. >>> >>> Yow!!! >>> >>> Now this one might well be able to hit the 950 millisecond limit. >>> There are no fewer than 1,314,958 RCU callbacks queued up at the end of >>> the test. ÂAnd RCU has indeed noticed this and cranked up the number >>> of callbacks to be handled by each invocation of rcu_do_batch() to >>> 2,147,483,647. ÂAnd only 15 seconds earlier, there were zero callbacks >>> queued and the rcu_do_batch() limit was at the default of 10 callbacks >>> per invocation. >> >> Yeah, yow. ÂOnce the RT throttle hit, it stuck. >> >> Â.clock             : 1386824.201768 >> Â.rt_nr_running         : 2 >> Â.rt_throttled         Â: 1 >> Â.rt_time            : 950.132427 >> Â.rt_runtime          Â: 950.000000 >>      rcuc0   7     0.034118   10857  Â98     0.034118   Â1472.309646     0.000000 / >> FF  Â1   Â1 R  ÂR 0 [rcuc0] >> Â.clock             : 1402450.997994 >> Â.rt_nr_running         : 2 >> Â.rt_throttled         Â: 1 >> Â.rt_time            : 950.132427 >> Â.rt_runtime          Â: 950.000000 >>      rcuc0   7     0.034118   10857  Â98     0.034118   Â1472.309646     0.000000 / >> FF  Â1   Â1 R  ÂR 0 [rcuc0] >> >> ... >> >> Â.clock             : 2707432.862374 >> Â.rt_nr_running         : 2 >> Â.rt_throttled         Â: 1 >> Â.rt_time            : 950.132427 >> Â.rt_runtime          Â: 950.000000 >>      rcuc0   7     0.034118   10857  Â98     0.034118   Â1472.309646     0.000000 / >> FF  Â1   Â1 R  ÂR 0 [rcuc0] >> Â.clock             : 2722572.958381 >> Â.rt_nr_running         : 2 >> Â.rt_throttled         Â: 1 >> Â.rt_time            : 950.132427 >> Â.rt_runtime          Â: 950.000000 >>      rcuc0   7     0.034118   10857  Â98     0.034118   Â1472.309646     0.000000 / >> FF  Â1   Â1 R  ÂR 0 [rcuc0] >> >> > > Hi, > > OK, I tried with the patch proposed by Thomas (0003): > > patches/0001-Revert-rcu-restrict-TREE_RCU-to-SMP-builds-with-PREE.patch > patches/0002-sched-Add-warning-when-RT-throttling-is-activated.patch > patches/0003-sched-Remove-skip_clock_update-check.patch > > From the very beginning it looked as the system is "stable" due to: > > Â.rt_nr_running         : 0 > Â.rt_throttled         Â: 0 > > This changed when I started a simple tar-job to save my kernel > build-dir to an external USB-hdd. > From... > > Â.rt_nr_running         : 1 > Â.rt_throttled         Â: 1 > > ...To: > > Â.rt_nr_running         : 2 > Â.rt_throttled         Â: 1 > > Unfortunately, reducing all activities to a minimum load, did not > change from last known RT throttling state. > > Just noticed rt_time exceeds the value of 950 first time here: > > Â.rt_nr_running         : 1 > Â.rt_throttled         Â: 1 > Â.rt_time            : 950.005460 > > Full data attchached as tarball. > > - Sedat - > > P.S.: Excerpt from > collectdebugfs-v2_2.6.39-rc3-rcutree-sedat.2011.04.23a+.log (0:0 -> > 1:1 -> 2:1) > > -- > rt_rq[0]: > Â.rt_nr_running         : 0 > Â.rt_throttled         Â: 0 > Â.rt_time            : 888.893877 > Â.rt_runtime          Â: 950.000000 > > runnable tasks: >      Âtask  PID     tree-key Âswitches Âprio > exec-runtime     sum-exec    Âsum-sleep > ---------------------------------------------------------------------------------------------------------- > R      Âcat Â2652  Â115108.993460     1  120 > 115108.993460     1.147986     0.000000 / > -- > rt_rq[0]: > Â.rt_nr_running         : 1 > Â.rt_throttled         Â: 1 > Â.rt_time            : 950.005460 > Â.rt_runtime          Â: 950.000000 > > runnable tasks: >      Âtask  PID     tree-key Âswitches Âprio > exec-runtime     sum-exec    Âsum-sleep > ---------------------------------------------------------------------------------------------------------- >      rcuc0   7     0.000000   56869  Â98 > 0.000000    981.385605     0.000000 / > -- > rt_rq[0]: > Â.rt_nr_running         : 2 > Â.rt_throttled         Â: 1 > Â.rt_time            : 950.005460 > Â.rt_runtime          Â: 950.000000 > > runnable tasks: >      Âtask  PID     tree-key Âswitches Âprio > exec-runtime     sum-exec    Âsum-sleep > ---------------------------------------------------------------------------------------------------------- >      rcuc0   7     0.000000   56869  Â98 > 0.000000    981.385605     0.000000 / > -- > As an addendum: First call trace is seen after: [ 651.616057] sched: RT throttling activated [ 711.616033] INFO: rcu_sched_state detected stall on CPU 0 (t=15000 jiffies) - Sedat - ÿô.nÇ·ÿ±ég¬±¨Âaþé»®&Þ)î¦þ)íèh¨è&£ù¢¸ÿæ¢ú»þÇþm§ÿÿÃÿ)î¦þàbnö¥yÊ{^®wr«ë&§iÖ²('Ûÿÿìm éê¯Ãí¢ÿÚ·ÚýiÉ¢¸ÿý½§$þàÿ