On Mon, Apr 13, 2009 at 6:00 PM, Sol Kavy <skavy@xxxxxxxxxx> wrote: > The following back trace represents a deadlock in Ubicom's SMP port of 2.6.28 kernel. I am sure that we are doing something unexpected. I would appreciate the community's help in understanding what is going wrong. > > Thanks in advance for any pointers, > > Sol Kavy > > Problem: > Ubicom's initial port does not use GENERIC_CLOCKEVENTS. Instead it uses a periodic timer based on HZ. The periodic timer calls do_timer() on each tick. > > From the arch directory perspective, we are required to hold the xtime_lock before calling do_timer(). The lock is indeed help by cpu 3 as evidenced in the output below. > > The call to get_jiffies_64() at the top of the backtrace is attempting to read the jiffies in a reliable fashion. The caller is required to wait for the xtime_lock not to be held. Clearly, since we are in a path that is holding the xtime_lock, this will never make forward progress. > > What is unclear to me is why other ports are not seeing the same problem? > > Perhaps it is because most ports now set GENERIC_CLOCKEVENTS which uses an entirely different mechanism for doing things. I am in the middle of switching the port to use GENERIC_CLOCKEVENTS but would like to understand this failure in more detail. > > Any feedback is greatly appreciated, > > Sol > > Config Flags: > # CONFIG_GENERIC_CLOCKEVENTS is not set > CONFIG_GENERIC_HARDIRQS=y > CONFIG_IRQ_PER_CPU=y > CONFIG_TRACE_IRQFLAGS_SUPPORT=y > # CONFIG_DEBUG_SHIRQ is not set > CONFIG_TRACE_IRQFLAGS=y > CONFIG_IRQSOFF_TRACER=y > > State of the lock: > (gdb) p xtime_lock > $5 = {sequence = 47089, lock = {raw_lock = {lock = 1}, magic = 3735899821, > owner_cpu = 3, owner = 0x42b01160}} > > This is a backtrace from CPU 3: > > (gdb) bt > #0 get_jiffies_64 () at include/linux/seqlock.h:94 > #1 0x4044f558 in sched_clock () at kernel/sched_clock.c:40 > #2 0x4045f108 in ring_buffer_time_stamp (cpu=3) at kernel/trace/ring_buffer.c:58 > #3 0x40464c50 in ftrace_now (cpu=3) at kernel/trace/trace.c:77 > #4 0x404656ec in trace_hardirqs_off () at kernel/trace/trace_irqsoff.c:207 > #5 0x40413020 in _spin_lock_irqsave (lock=0x3) at kernel/spinlock.c:82 > #6 0x40451b2c in clocksource_get_next () at kernel/time/clocksource.c:254 > #7 0x3ffd08ac in update_wall_time () at kernel/time/timekeeping.c:182 > #8 0x4043bcd8 in do_timer (ticks=0) at kernel/timer.c:1125 > #9 0x404169f8 in timer_tick (irq=<value optimized out>, > dummy=<value optimized out>) at arch/ubicom32/kernel/time.c:126 > #10 0x3ffcefb0 in handle_IRQ_event (irq=<value optimized out>, > action=0x406fe6f4) at kernel/irq/handle.c:142 > #11 0x3ffcee20 in __do_IRQ (irq=<value optimized out>) > at kernel/irq/handle.c:239 > #12 0x3ffcfcc8 in do_IRQ (irq=47089, regs=<value optimized out>) > at arch/ubicom32/kernel/irq.c:250 > #13 0x4041018c in sys_call_table () > #14 0x4044f558 in sched_clock () at kernel/sched_clock.c:40 > #15 0x00000008 in ?? () > > > > > -- > To unsubscribe from this list: send the line "unsubscribe linux-newbie" in > the body of a message to majordomo@xxxxxxxxxxxxxxx > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.linux-learn.org/faqs > Hi Sol, Caveat: I'm not familiar with porting Linux :) get_jiffies_64 does attempt to acquire xtime_lock, but since you know the lock is already held in do_timer, can you simply read the value directly? You'll just have to make sure every code path through sched_clock() has xtime_lock... Here is an example from the lxr that might be relevant: http://lxr.linux.no/linux+v2.6.29/arch/frv/kernel/time.c#L151 Gedare Bloom -- To unsubscribe from this list: send the line "unsubscribe linux-newbie" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.linux-learn.org/faqs