On 02/03/2014 05:00 AM, Mike Galbraith wrote:
> On Sun, 2014-02-02 at 21:10 +0100, Sebastian Andrzej Siewior wrote:
>
>> According to the backtrace both of them are trying to access the
>> per-cpu hrtimer (sched_timer) in order to cancel but they seem to fail
>> to get the timer lock here. They shouldn't spin there for minutes, I
>> have no idea why they did so…
>
> Hm. per-cpu...
>
> I've been chasing an rt hotplug heisenbug that is pointing to per-cpu
> oddness. During sched domain re-construction while running Steven's
> stress script on 64 core box, we hit a freshly constructed domain with
> _no span_, build_sched_groups()->get_group() explodes when we meeting
> it. But if you try to watch the thing appear... it just doesn't.
>
> static int build_sched_domains(const struct cpumask *cpu_map,
> 			       struct sched_domain_attr *attr)
> {
> 	enum s_alloc alloc_state;
> 	struct sched_domain *sd;
> 	struct s_data d;
> 	int i, ret = -ENOMEM;
>
> 	alloc_state = __visit_domain_allocation_hell(&d, cpu_map);
> 	if (alloc_state != sa_rootdomain)
> 		goto error;
>
> 	/* Set up domains for cpus specified by the cpu_map. */
> 	for_each_cpu(i, cpu_map) {
> 		struct sched_domain_topology_level *tl;
>
> 		sd = NULL;
> 		for_each_sd_topology(tl) {
> 			sd = build_sched_domain(tl, cpu_map, attr, sd, i);
> 			BUG_ON(sd == spanless-alien) here..

What is spanless-alien?

BUG_ON() is actually _very_ cheap. It shouldn't even create any kind of
compiler barrier that would force variables or registers to be reloaded.
It should just evaluate sd and "spanless-alien", do the comparison and
then carry on.
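To illustrate why BUG_ON() should not perturb the code around it, here is
a userspace sketch modeled on the generic definition in
include/asm-generic/bug.h (an "if (unlikely(cond)) BUG();" wrapped in
do/while). MY_BUG(), MY_BUG_ON(), walk_domains() and the spanless_alien
sentinel are hypothetical stand-ins, not kernel code; the stripped-down
struct sched_domain only keeps the parent pointer. The check is a plain
pointer compare with a cold branch, and nothing in it acts as a compiler
barrier.

```c
#include <stdio.h>
#include <stdlib.h>

/* Userspace stand-in for the kernel's unlikely() branch hint. */
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical stand-in for BUG(): the kernel traps, here we abort(). */
#define MY_BUG()							\
	do {								\
		fprintf(stderr, "BUG at %s:%d\n", __FILE__, __LINE__);	\
		abort();						\
	} while (0)

/* Same shape as the generic BUG_ON(): a test plus a cold branch. */
#define MY_BUG_ON(cond)				\
	do {					\
		if (unlikely(cond))		\
			MY_BUG();		\
	} while (0)

/* Stripped-down domain: only the parent link is modeled. */
struct sched_domain {
	struct sched_domain *parent;
};

/* Hypothetical sentinel playing the role of "spanless-alien". */
static struct sched_domain spanless_alien;

/*
 * Walk a domain hierarchy bottom-up, checking each level against the
 * sentinel. The check only compares sd with &spanless_alien and then
 * continues; it does not force any reload of sd or other state.
 */
static int walk_domains(struct sched_domain *sd)
{
	int levels = 0;

	for (; sd; sd = sd->parent) {
		MY_BUG_ON(sd == &spanless_alien);
		levels++;
	}
	return levels;
}
```

With a three-level hierarchy (base -> mid -> top), walk_domains(&base)
returns 3; passing &spanless_alien anywhere in the chain would abort.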
> 			if (tl == sched_domain_topology)
> 				*per_cpu_ptr(d.sd, i) = sd;
> 			if (tl->flags & SDTL_OVERLAP || sched_feat(FORCE_SD_OVERLAP))
> 				sd->flags |= SD_OVERLAP;
> 			if (cpumask_equal(cpu_map, sched_domain_span(sd)))
> 				break;
> 		}
> 	}
>
> 	/* Build the groups for the domains */
> 	for_each_cpu(i, cpu_map) {
> 		for (sd = *per_cpu_ptr(d.sd, i); sd; sd = sd->parent) {
> 			sd->span_weight = cpumask_weight(sched_domain_span(sd));
> 			if (sd->flags & SD_OVERLAP) {
> 				if (build_overlap_sched_groups(sd, i))
> 					goto error;
> 			} else {
> 				if (build_sched_groups(sd, i))
> ..prevents meeting that alien here.. while hotplug locked.

My copy of build_sched_groups() always returns 0, so it never jumps to
the error label.

Did you consider a compiler bug? I could try to rebuild your source and
config with two different compilers, just to see whether it makes a
difference.

Sebastian