Search Linux Wireless

Re: ath5k: kernel timing screwed - due to unserialised register access?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, 10 Oct 2008, Elias Oltmanns wrote:
> Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
> > On Wed, 8 Oct 2008, Thomas Gleixner wrote:
> >> There is no significant deviation between jiffies and ktime in the
> >> debug output, but I noticed that you run with HZ=100, right ? So the
> >> timeout you run is 100/50 = 2. I would have a reasonable explanation
> >> if it would be 1, but I need to think about it more when I'm awake.
> >
> > I think I know what happens. Can you please apply the following debug
> > patch and provide the output?
> 
> Just because I didn't quite get your meaning in your earlier email, I'd
> like to avoid any misunderstanding and emphasise that NO_HZZ=off +
> HIGH_RES=off doesn't triger the problem, whereas NO_HZ=off + HIGH_RES=on
> does, as does NO_HZ=on + HIGH_RES=*. As for the HZ=100 vs HZ=250 issue,
> it doesn't seem to make much difference. As you suggested, I've tested
> HZ=250 now and as you can see below, the problem remains for NO_HZ=off +
> HIGH_RES=on, and there still is no problem for both off (as expected).

Yeah, the HZ=250 was a tired wild guess. As I said "I need to think
about it more when I'm awake" :)

So lets analyse the new data:

> [  121.611352] ath0: associated

> [  136.838264] b:    136.838234395 n:    136.838258980 e: 4294926505 j: 4294926505

> [  136.838325]  #0: <c02e4f18>, tick_sched_timer, S:01, tick_setup_sched_timer, init/355
> [  136.838346]  # expires at 136840000000 nsecs [in 1721465 nsecs]

tick_sched_timer is the one which updates jiffies. next expiry: 136.840

So at HZ=250 we expect that it expired at:

   136.836, 136.832, 136.828, 136.824, 136.820, 136.816, 136.812

so we would have expected to see

   .last_events0 	  136.828
   .last_events1 	  136.832
   .last_events2 	  136.836

but we get:

> [  136.838434]   .last_events0   : 136812009783 nsecs
> [  136.838437]   .last_events1   : 136816009783 nsecs
> [  136.838440]   .last_events2   : 136838253113 nsecs

The last event was 22.24333 ms (4 ticks) after the previous one.

        	 b:    	       136.838234395 
is right before
		.last_event2   136.838253113
and
		n:             136.838258980		
is right after that.

That explains your problem quite well :)

So now the question is why was the interrupt blocked for that time.

Can you please add the following debug patch as well ?

Thanks,

	tglx

---
diff --git a/kernel/softirq.c b/kernel/softirq.c
index c506f26..cf29022 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -206,6 +206,11 @@ restart:
 	do {
 		if (pending & 1) {
 			h->action(h);
+			if (unlikely(irqs_disabled())) {
+				printk(KERN_ERR "softirq %d returned with "
+				       "irqs disabled\n", (int)(softirq_vec - h));
+				local_irq_enable();
+			}
 			rcu_bh_qsctr_inc(cpu);
 		}
 		h++;
--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Host AP]     [ATH6KL]     [Linux Bluetooth]     [Linux Netdev]     [Kernel Newbies]     [Linux Kernel]     [IDE]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux ATA RAID]     [Samba]     [Device Mapper]
  Powered by Linux