On Fri, Apr 29, 2011 at 1:06 AM, Sedat Dilek <sedat.dilek@xxxxxxxxxxxxxx> wrote:
> On Fri, Apr 29, 2011 at 12:02 AM, Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
>> On Thu, 28 Apr 2011, john stultz wrote:
>>> On Thu, 2011-04-28 at 23:04 +0200, Thomas Gleixner wrote:
>>> > /me suspects hrtimer changes to be the real culprit.
>>>
>>> I'm not seeing anything right off, but it does smell like
>>> e06383db9ec591696a06654257474b85bac1f8cb would be where such an issue
>>> would crop up.
>>>
>>> Bruno, could you try checking out e06383db9ec, confirming it still
>>> occurs (and then maybe seeing if it goes away at e06383db9ec^1)?
>>>
>>> I'll keep digging in the meantime.
>>
>> I found the bug already. The problem is that sched_init() calls
>> init_rt_bandwidth() which calls hrtimer_init() _BEFORE_
>> hrtimers_init() is called.
>>
>> That was unnoticed so far as the CLOCK id to hrtimer base conversion
>> was hardcoded. Now we use a table which is set up at hrtimers_init(),
>> so the bandwidth hrtimer ends up on CLOCK_REALTIME because the table is
>> in the bss.
>>
>> The patch below fixes this, by providing the table statically rather
>> than runtime initialized. Though that whole ordering wants to be
>> revisited.
>>
>> Thanks,
>>
>>     tglx
>>
>> --- linux-2.6.orig/kernel/hrtimer.c
>> +++ linux-2.6/kernel/hrtimer.c
>> @@ -81,7 +81,11 @@ DEFINE_PER_CPU(struct hrtimer_cpu_base,
>>      }
>>  };
>>
>> -static int hrtimer_clock_to_base_table[MAX_CLOCKS];
>> +static int hrtimer_clock_to_base_table[MAX_CLOCKS] = {
>> +    [CLOCK_REALTIME] = HRTIMER_BASE_REALTIME,
>> +    [CLOCK_MONOTONIC] = HRTIMER_BASE_MONOTONIC,
>> +    [CLOCK_BOOTTIME] = HRTIMER_BASE_BOOTTIME,
>> +};
>>
>>  static inline int hrtimer_clockid_to_base(clockid_t clock_id)
>>  {
>> @@ -1722,10 +1726,6 @@ static struct notifier_block __cpuinitda
>>
>>  void __init hrtimers_init(void)
>>  {
>> -    hrtimer_clock_to_base_table[CLOCK_REALTIME] = HRTIMER_BASE_REALTIME;
>> -    hrtimer_clock_to_base_table[CLOCK_MONOTONIC] = HRTIMER_BASE_MONOTONIC;
>> -    hrtimer_clock_to_base_table[CLOCK_BOOTTIME] = HRTIMER_BASE_BOOTTIME;
>> -
>>      hrtimer_cpu_notify(&hrtimers_nb, (unsigned long)CPU_UP_PREPARE,
>>                (void *)(long)smp_processor_id());
>>      register_cpu_notifier(&hrtimers_nb);
>>
>
> Looks good so far, no stalls or call-traces.
>
> Really stressing with 20+ open tabs in firefox with a flash-movie
> running in one of them, tar-job, IRC-client etc.
> I will run some more tests, collect data and send it later.
>
> - Sedat -
>
> P.S.: Patchset against linux-2.6-rcu.git#sedat.2011.04.23a where 0003
> is from [2]
>
> [1] http://git.us.kernel.org/?p=linux/kernel/git/paulmck/linux-2.6-rcu.git;a=shortlog;h=refs/heads/sedat.2011.04.23a
> [2] https://patchwork.kernel.org/patch/739782/
>
> $ l ../RCU-HOORAY/
> total 40
> drwxr-xr-x  2 sd sd  4096 29. Apr 01:02 .
> drwxr-xr-x 35 sd sd 20480 29. Apr 01:01 ..
> -rw-r--r--  1 sd sd   726 29. Apr 01:01 0001-Revert-rcu-restrict-TREE_RCU-to-SMP-builds-with-PREE.patch
> -rw-r--r--  1 sd sd   735 29. Apr 01:01 0002-sched-Add-warning-when-RT-throttling-is-activated.patch
> -rw-r--r--  1 sd sd  2376 29. Apr 01:01 0003-2.6.39-rc4-Kernel-leaking-memory-during-FS-scanning-.patch
>

As promised, here is the tarball (towards the end of the log I ran some
XZ compression).

Wow!

$ uptime
01:35:17 up 45 min, 3 users, load average: 0.45, 0.57, 1.27

Thanks to everyone involved in helping to kill that bug (Come on Paul, smile!).

- Sedat -
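
For anyone reading this thread later, here is a rough userspace sketch of the ordering problem Thomas describes. It is not kernel code: every demo_* name and every numeric value below is made up for illustration, and the only assumption carried over from the thread is that the REALTIME base is index 0, which matches the remark that the timer "ends up on CLOCK_REALTIME because the table is in the bss".

```c
/*
 * Userspace sketch of the init-order bug. All demo_* identifiers and
 * their values are hypothetical stand-ins, not the kernel's definitions.
 */
enum demo_clock {
    DEMO_CLOCK_REALTIME  = 0,
    DEMO_CLOCK_MONOTONIC = 1,
    DEMO_CLOCK_BOOTTIME  = 7,
    DEMO_MAX_CLOCKS      = 16
};

enum demo_base {
    DEMO_BASE_REALTIME = 0,   /* assumed to be index 0, as the thread implies */
    DEMO_BASE_MONOTONIC,
    DEMO_BASE_BOOTTIME
};

/* Filled in at run time: zero-initialized (.bss) until demo_late_init() runs. */
static int demo_runtime_table[DEMO_MAX_CLOCKS];

/* Filled in at compile time: valid no matter how early it is consulted. */
static const int demo_static_table[DEMO_MAX_CLOCKS] = {
    [DEMO_CLOCK_REALTIME]  = DEMO_BASE_REALTIME,
    [DEMO_CLOCK_MONOTONIC] = DEMO_BASE_MONOTONIC,
    [DEMO_CLOCK_BOOTTIME]  = DEMO_BASE_BOOTTIME,
};

/* Plays the role of the init function that used to populate the table. */
void demo_late_init(void)
{
    demo_runtime_table[DEMO_CLOCK_REALTIME]  = DEMO_BASE_REALTIME;
    demo_runtime_table[DEMO_CLOCK_MONOTONIC] = DEMO_BASE_MONOTONIC;
    demo_runtime_table[DEMO_CLOCK_BOOTTIME]  = DEMO_BASE_BOOTTIME;
}

/* Lookup through the run-time table: wrong if called before demo_late_init(). */
int demo_base_runtime(enum demo_clock id)
{
    return demo_runtime_table[id];
}

/* Lookup through the static table: independent of call order. */
int demo_base_static(enum demo_clock id)
{
    return demo_static_table[id];
}
```

Calling demo_base_runtime(DEMO_CLOCK_MONOTONIC) before demo_late_init() returns 0, i.e. the REALTIME base, which is the kind of silent misrouting an early caller would hit. demo_base_static() returns the MONOTONIC base regardless of when it is called, which is why moving the table to a static initializer removes the dependency on init order.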
Attachment:
from-dileks-4.tar.xz
Description: Binary data
Attachment:
from-dileks-4.tar.xz.sha256sum
Description: Binary data