RE: IRQ Tracing Problem in Linux 2.6.28 Kernel

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Every arch's timer interrupt does the following:

	write_seqlock(&xtime_lock);
	do_timer(1);
	write_sequnlock(&xtime_lock);

This is required because the first thing do_timer() does is increment jiffies_64.

	void do_timer(unsigned long ticks)
	{
		jiffies_64 += ticks;
		update_times(ticks);
	}

The frames 8-0 in the backtrace below are in Linux main code (which as a port, we don't want to change).   The arch calls do_timer() to advance time (and must do so with the lock held).  However, if the Linux code is then going to eventually call get_jiffies_64(), this leads to a deadlock.  The only time that get_jiffies_64() is called is when you have selected CONFIG_IRQSOFF_TRACER and CONFIG_TRACE_IRQFLAGS.   

My next step is to build this on a x86 and try to understand why that port does not run into the same problem.

Sol


-----Original Message-----
From: Gedare Bloom [mailto:gedare@xxxxxxxxxxxxxx] 
Sent: Tuesday, April 14, 2009 5:52 AM
To: Sol Kavy
Cc: linux-newbie@xxxxxxxxxxxxxxx
Subject: Re: IRQ Tracing Problem in Linux 2.6.28 Kernel

On Mon, Apr 13, 2009 at 6:00 PM, Sol Kavy <skavy@xxxxxxxxxx> wrote:
> The following back trace represents a deadlock in Ubicom's SMP port of 2.6.28 kernel.   I am sure that we are doing something unexpected.  I would appreciate the community's help in understanding what is going wrong.
>
> Thanks in advance for any pointers,
>
> Sol Kavy
>
> Problem:
> Ubicom's initial port does not use GENERIC_CLOCKEVENTS.  Instead it uses a periodic timer based on HZ.  The periodic timer calls do_timer() on each tick.
>
> From the arch directory perspective, we are required to hold the xtime_lock before calling do_timer().  The lock is indeed help by cpu 3 as evidenced in the output below.
>
> The call to get_jiffies_64() at the top of the backtrace is attempting to read the jiffies in a reliable fashion.  The caller is required to wait for the xtime_lock not to be held.  Clearly, since we are in  a path that is holding the xtime_lock, this will never make forward progress.
>
> What is unclear to me is why other ports are not seeing the same problem?
>
> Perhaps it is because most ports now set GENERIC_CLOCKEVENTS which uses an entirely different mechanism for doing things.  I am in the middle of switching the port to use GENERIC_CLOCKEVENTS but would like to understand this failure in more detail.
>
> Any feedback is greatly appreciated,
>
> Sol
>
> Config Flags:
> # CONFIG_GENERIC_CLOCKEVENTS is not set
> CONFIG_GENERIC_HARDIRQS=y
> CONFIG_IRQ_PER_CPU=y
> CONFIG_TRACE_IRQFLAGS_SUPPORT=y
> # CONFIG_DEBUG_SHIRQ is not set
> CONFIG_TRACE_IRQFLAGS=y
> CONFIG_IRQSOFF_TRACER=y
>
> State of the lock:
> (gdb) p xtime_lock
> $5 = {sequence = 47089, lock = {raw_lock = {lock = 1}, magic = 3735899821,
>    owner_cpu = 3, owner = 0x42b01160}}
>
> This is a backtrace from CPU 3:
>
>  (gdb) bt
> #0  get_jiffies_64 () at include/linux/seqlock.h:94
> #1  0x4044f558 in sched_clock () at kernel/sched_clock.c:40
> #2  0x4045f108 in ring_buffer_time_stamp (cpu=3) at kernel/trace/ring_buffer.c:58
> #3  0x40464c50 in ftrace_now (cpu=3) at kernel/trace/trace.c:77
> #4  0x404656ec in trace_hardirqs_off () at kernel/trace/trace_irqsoff.c:207
> #5  0x40413020 in _spin_lock_irqsave (lock=0x3) at kernel/spinlock.c:82
> #6  0x40451b2c in clocksource_get_next () at kernel/time/clocksource.c:254
> #7  0x3ffd08ac in update_wall_time () at kernel/time/timekeeping.c:182
> #8  0x4043bcd8 in do_timer (ticks=0) at kernel/timer.c:1125
> #9  0x404169f8 in timer_tick (irq=<value optimized out>,
>     dummy=<value optimized out>) at arch/ubicom32/kernel/time.c:126
> #10 0x3ffcefb0 in handle_IRQ_event (irq=<value optimized out>,
>     action=0x406fe6f4) at kernel/irq/handle.c:142
> #11 0x3ffcee20 in __do_IRQ (irq=<value optimized out>)
>     at kernel/irq/handle.c:239
> #12 0x3ffcfcc8 in do_IRQ (irq=47089, regs=<value optimized out>)
>     at arch/ubicom32/kernel/irq.c:250
> #13 0x4041018c in sys_call_table ()
> #14 0x4044f558 in sched_clock () at kernel/sched_clock.c:40
> #15 0x00000008 in ?? ()
>
>
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-newbie" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.linux-learn.org/faqs
>

Hi Sol,

Caveat: I'm not familiar with porting Linux :)

get_jiffies_64 does attempt to acquire xtime_lock, but since you know
the lock is already held in do_timer, can you simply read the value
directly?  You'll just have to make sure every code path through
sched_clock() has xtime_lock...

Here is an example from the lxr that might be relevant:
http://lxr.linux.no/linux+v2.6.29/arch/frv/kernel/time.c#L151

Gedare Bloom


--
To unsubscribe from this list: send the line "unsubscribe linux-newbie" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.linux-learn.org/faqs

[Index of Archives]     [Audio]     [Hams]     [Kernel Newbies]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Samba]     [Video 4 Linux]     [Fedora Users]

  Powered by Linux