Search Linux Wireless

RE: [ipw3945-devel] Panic in iwl3945 driver

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




>-----Original Message-----
>From: Cahill, Ben M [mailto:ben.m.cahill@xxxxxxxxx]
>Sent: Tuesday, January 05, 2010 5:16 PM
>To: Maxim Levitsky; Zhu, Yi
>Cc: Chatre, Reinette; linux-wireless; iwlwifi maling list
>Subject: Re: [ipw3945-devel] Panic in iwl3945 driver
>
>uCode is receiving bad command from command queue ... See
>below (search "Bad").
>
>-- Ben --
>
>>-----Original Message-----
>>From: Maxim Levitsky [mailto:maximlevitsky@xxxxxxxxx]
>>Sent: Tuesday, January 05, 2010 11:56 AM
>>To: Zhu, Yi
>>Cc: Chatre, Reinette; linux-wireless; iwlwifi maling list
>>Subject: Re: [ipw3945-devel] Panic in iwl3945 driver
>>
>>On Tue, 2009-12-22 at 16:57 +0800, Zhu Yi wrote:
>>> On Tue, 2009-12-22 at 00:40 +0800, Maxim Levitsky wrote:
>>> > On Wed, 2009-12-02 at 19:17 +0200, Maxim Levitsky wrote:
>>> > > On Wed, 2009-12-02 at 13:42 +0800, Zhu Yi wrote:
>>> > > > On Tue, 2009-12-01 at 17:28 +0800, Zhu Yi wrote:
>>> > > > > On Tue, 2009-12-01 at 06:35 +0800, Maxim Levitsky wrote:
>>> > > > > > 0x000000000001668e <iwl3945_rx_reply_tx+302>:
>>lea    0x38(%r8),%rdi
>>> > > > > > 0x0000000000016692 <iwl3945_rx_reply_tx+306>:
>>lea    0x4f(%r8),%rax
>>> > > > >
>>> > > > > When this happened, from your previous post, r8 is
>>0x0 and rdi is 0x38.
>>> > > > > Since "info" is %rdi (see below), this means
>>> > > > > txq->txb[txq->q.read_ptr].skb[0], aka. r8 is 0.
>>> > > > >
>>> > > > > >    rate_idx = iwl3945_hwrate_to_plcp_idx(tx_resp->rate);
>>> > > > > >
>>> > > > > > 0x0000000000016696 <iwl3945_rx_reply_tx+310>:
>>movb   $0x0,0x9(%rdi)        <---------- RIP
>>> > > > > > 0x000000000001669a <iwl3945_rx_reply_tx+314>:
>>movb   $0x0,0xc(%rdi)
>>> > > > > > 0x000000000001669e <iwl3945_rx_reply_tx+318>:
>>movb   $0x0,0xf(%rdi)
>>> > > > > > 0x00000000000166a2 <iwl3945_rx_reply_tx+322>:
>>movb   $0x0,0x12(%rdi)
>>> > > > > > 0x00000000000166a6 <iwl3945_rx_reply_tx+326>:
>>movb   $0x0,0x15(%rdi)
>>> > > > >
>>> > > > > This equals to below code in
>>ieee80211_tx_info_clear_status(). "info" is
>>> > > > > %rdi, which is 0x38. That matches NULL pointer
>>dereference at 0x41 in
>>> > > > > your oops header.
>>> > > > >
>>> > > > >      for (i = 0; i < IEEE80211_TX_MAX_RATES; i++)
>>> > > > >                 info->status.rates[i].count = 0;
>>> > > > >
>>> > > > > I guess there is a race for txq->q.read_ptr
>>somewhere. Haven't checked
>>> > > > > though.
>>> > > >
>>> > > > OK. 3945 updated write_ptr without regard to read_ptr
>>on the Tx path.
>>> > > > This messes up our TFD on high load. The patch should
>>fix your problem.
>>> > > >
>>> > > > Signed-off-by: Zhu Yi <yi.zhu@xxxxxxxxx>
>>> > > >
>>> > > > diff --git
>>a/drivers/net/wireless/iwlwifi/iwl3945-base.c
>>b/drivers/net/wireless/iwlwifi/iwl3945-base.c
>>> > > > index 994db4a..b31b34c 100644
>>> > > > --- a/drivers/net/wireless/iwlwifi/iwl3945-base.c
>>> > > > +++ b/drivers/net/wireless/iwlwifi/iwl3945-base.c
>>> > > > @@ -548,6 +548,9 @@ static int iwl3945_tx_skb(struct
>>iwl_priv *priv, struct sk_buff *skb)
>>> > > >        txq = &priv->txq[txq_id];
>>> > > >        q = &txq->q;
>>> > > >
>>> > > > +      if ((iwl_queue_space(q) < q->high_mark))
>>> > > > +              goto drop;
>>> > > > +
>>> > > >        spin_lock_irqsave(&priv->lock, flags);
>>> > > >
>>> > > >        idx = get_cmd_index(q, q->write_ptr, 0);
>>> > > >
>>> > > >
>>> > Few days ago, I had an idea to reply here that I am sure that this
>>> > problem disappeared with this patch.
>>> >
>>> >
>>> > Today I got same kernel panic _with_ the patch applied....
>>>
>>> Looks like (all of) the root causes are still not found yet.
>>The symptom
>>> is exactly the same as the previous one.
>>>
>>> One thing I found today is when txq read_ptr catches up to write_ptr
>>> (read_ptr == write_ptr), iwl_queue_used() will _always_ return TRUE!
>>> This will be a problem if the firmware sends us a wrong index
>>> (sequence), then we will fail the check in this condition.
>>I'm not sure
>>> if firmware can really send us a wrong sequence. Can you
>>please try this
>>> patch? Apply it on top of the previous one. If you do see
>>the "FIRMWARE
>>> BUG" in dmesg, then I think we find the root cause.
>>
>>
>>
>>Finally exploded....
>>
>>
>><3>[  330.761347] iwl3945 0000:06:00.0: Microcode SW error
>>detected. Restarting 0x82000008.
>><3>[  330.761403] iwl3945 0000:06:00.0: Start IWL Error Log Dump:
>><3>[  330.761421] iwl3945 0000:06:00.0: Status: 0x000302E6, count: 1
>><3>[  330.761438] iwl3945 0000:06:00.0: Desc       Time
>>asrtPC  blink2 ilink1  nmiPC   Line
>><3>[  330.761683] iwl3945 0000:06:00.0: SYSASSERT     (#5)
>>1035963230 0x008B6 0x017B8 0x0031C 0x00000 65
>
>"Bad Command" sysassert ... See below (search "obviously").
>
>><3>[  330.761692]
>><3>[  330.761837] iwl3945 0000:06:00.0: Start IWL Event Log
>>Dump: display last 20 count
>><3>[  330.761895] iwl3945 0000:06:00.0: 1035963151
>>0x00000000     1106
>><3>[  330.761931] iwl3945 0000:06:00.0: 1035963155
>>0x00000000     1106
>><3>[  330.761967] iwl3945 0000:06:00.0: 1035963160
>>0x00000000     1106
>><3>[  330.762003] iwl3945 0000:06:00.0: 1035963164
>>0x00000000     1106
>><3>[  330.762039] iwl3945 0000:06:00.0: 1035963168
>>0x00000000     1106
>><3>[  330.762075] iwl3945 0000:06:00.0: 1035963173
>>0x00000000     1106
>><3>[  330.762112] iwl3945 0000:06:00.0: 1035963177
>>0x00000000     1106
>><3>[  330.762148] iwl3945 0000:06:00.0: 1035963181
>>0x00000000     1106
>><3>[  330.762184] iwl3945 0000:06:00.0: 1035963185
>>0x00000000     1106
>><3>[  330.762220] iwl3945 0000:06:00.0: 1035963190
>>0x00000001     1106
>><3>[  330.762256] iwl3945 0000:06:00.0: 1035963190
>>0x0000000f     1106
>><3>[  330.762292] iwl3945 0000:06:00.0: 1035963192
>>0x00000417     1105
>><3>[  330.762329] iwl3945 0000:06:00.0: 1035963195
>>0x000005a5     1115
>><3>[  330.762365] iwl3945 0000:06:00.0: 1035963199
>>0x00000001     0463
>><3>[  330.762401] iwl3945 0000:06:00.0: 1035963200
>>0x00000001     0462
>><3>[  330.762437] iwl3945 0000:06:00.0: 1035963212
>>0x000002b1     0603
>><3>[  330.762474] iwl3945 0000:06:00.0: 1035963217
>>0x04fb0077     0401
>><3>[  330.762510] iwl3945 0000:06:00.0: 1035963222
>>0x04dc0018     0401
>
>Last good command 0x18, with (driver provided) header
>indicating sent via queue 4, queue index 0xdc
>
>><3>[  330.762547] iwl3945 0000:06:00.0: 1035963229
>>0x00000000     0401
>
>This shows command "0" (non-existant/unsupported/bad command)
>with (driver provided) header indicating sent via queue "0",
>index "0"; obviously bad data in command queue.

Or index ("pointer") is getting out of sync.  (Reminder to self; never use the word "obviously").

-- Ben --
>
>-- Ben --
>
>><3>[  330.762583] iwl3945 0000:06:00.0: 1035963232
>>0x00000100     0125
>><4>[  330.762623] FIRMWARE BUG: index 220 is given while
>>read_ptr is 252
>><0>[  330.762674] ------------[ cut here ]------------
>><2>[  330.762689] kernel BUG at
>>/home/maxim/software/kernel/linux-2.6/drivers/net/wireless/iwlw
>>ifi/iwl-dev.h:717!
>><0>[  330.762715] invalid opcode: 0000 [#1] PREEMPT SMP
>><0>[  330.762735] last sysfs file:
>>/sys/devices/platform/coretemp.1/temp1_input
>><4>[  330.762755] CPU 0
>><4>[  330.762765] Modules linked in: nvidia(P) af_packet nfsd
>>exportfs nfs lockd nfs_acl auth_rpcgss sunrpc usb_storage
>>usb_libusual cpufreq_powersave cpufreq_conservative
>>snd_hda_codec_realtek cpufreq_userspace snd_hda_intel
>>acpi_cpufreq uvcvideo iwl3945 snd_hda_codec coretemp videodev
>>iwlcore v4l1_compat snd_hwdep joydev ohci1394
>>v4l2_compat_ioctl32 tg3 snd_pcm sbp2 mac80211 uhci_hcd psmouse
>>video snd_page_alloc ieee1394 libphy ehci_hcd output ac
>>battery lirc_ene0100 serio_raw evdev cfg80211 rfkill usbcore
>>fuse lzo lzo_decompress lzo_compress
>><6>[  330.762987] Pid: 0, comm: swapper Tainted: P
>>2.6.32-wl #243 Aspire 5720
>><6>[  330.763011] RIP: 0010:[<ffffffffa01c3fe7>]
>>[<ffffffffa01c3fe7>] iwl_tx_cmd_complete+0x397/0x3a0 [iwlcore]
>><6>[  330.763057] RSP: 0018:ffff880002203d20  EFLAGS: 00010286
>><6>[  330.763075] RAX: 0000000000000049 RBX: ffff88006fc09b40
>>RCX: 0000000000000102
>><6>[  330.763096] RDX: 0000000000000000 RSI: 0000000000000000
>>RDI: 0000000000000001
>><6>[  330.763118] RBP: ffff880002203d90 R08: 0000000000000002
>>R09: 0000000000000000
>><6>[  330.763139] R10: 0000000000000000 R11: 0000000000000001
>>R12: 00000000000000fc
>><6>[  330.763160] R13: 00000000000000dc R14: ffff88006fd6d460
>>R15: ffff88006a2a1600
>><6>[  330.763182] FS:  0000000000000000(0000)
>>GS:ffff880002200000(0000) knlGS:0000000000000000
>><6>[  330.763206] CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
>><6>[  330.763224] CR2: 00007f7ffbf95000 CR3: 0000000001001000
>>CR4: 00000000000006f0
>><6>[  330.763246] DR0: 0000000000000000 DR1: 0000000000000000
>>DR2: 0000000000000000
>><6>[  330.763267] DR3: 0000000000000000 DR6: 00000000ffff0ff0
>>DR7: 0000000000000400
>><4>[  330.763289] Process swapper (pid: 0, threadinfo
>>ffffffff815ca000, task ffffffff815ceb60)
>><0>[  330.763310] Stack:
>><4>[  330.763318]  ffff88006a2a9690 0000000000000046
>>ffff880002203d40 0000000000000046
>><4>[  330.763346] <0> 0000000000000282 0000000000000282
>>0000000000000282 ffff8800699eba00
>><4>[  330.763375] <0> ffff880002203d70 ffff88006a2a1600
>>ffff88006a2a2f10 ffff88006a673000
>><0>[  330.763406] Call Trace:
>><0>[  330.763416]  <IRQ>
>><4>[  330.763436]  [<ffffffffa0230d13>]
>>iwl3945_irq_tasklet+0x673/0x1740 [iwl3945]
>><4>[  330.763469]  [<ffffffffa01b683e>] ?
>>iwl_isr_legacy+0x3e/0x600 [iwlcore]
>><4>[  330.763496]  [<ffffffff81053f56>] tasklet_action+0x106/0x110
>><4>[  330.763516]  [<ffffffff81055674>] __do_softirq+0x114/0x2c0
>><4>[  330.763536]  [<ffffffff8109eb38>] ? handle_edge_irq+0x78/0x160
>><4>[  330.763558]  [<ffffffff8100d25c>] call_softirq+0x1c/0x30
>><4>[  330.763577]  [<ffffffff8100f1fd>] do_softirq+0x7d/0xb0
>><4>[  330.763595]  [<ffffffff81055125>] irq_exit+0x95/0xa0
>><4>[  330.763615]  [<ffffffff8140bddc>] do_IRQ+0x7c/0xf0
>><4>[  330.763634]  [<ffffffff8100ca13>] ret_from_intr+0x0/0xf
>><0>[  330.763650]  <EOI>
>><4>[  330.763664]  [<ffffffff812a1d5b>] ?
>>acpi_idle_enter_bm+0x275/0x2aa
>><4>[  330.763684]  [<ffffffff812a1d65>] ?
>>acpi_idle_enter_bm+0x27f/0x2aa
>><4>[  330.763705]  [<ffffffff812a1d5b>] ?
>>acpi_idle_enter_bm+0x275/0x2aa
>><4>[  330.763729]  [<ffffffff8133227f>] ? cpuidle_idle_call+0x9f/0x160
>><4>[  330.763749]  [<ffffffff8100b1df>] ? cpu_idle+0xaf/0x110
>><4>[  330.763769]  [<ffffffff813f534a>] ? rest_init+0x7a/0x80
>><4>[  330.763789]  [<ffffffff81640d01>] ? start_kernel+0x3ac/0x3b8
>><4>[  330.763808]  [<ffffffff81640315>] ?
>>x86_64_start_reservations+0x125/0x129
>><4>[  330.763831]  [<ffffffff816403fd>] ?
>x86_64_start_kernel+0xe4/0xeb
>><0>[  330.763848] Code: c7 c6 48 86 1d a0 48 c7 c7 53 86 1d a0
>>e8 a2 b3 07 e1 e9 68 fe ff ff 44 89 e2 44 89 ee 48 c7 c7 80 5a
>>1d a0 31 c0 e8 c7 31 24 e1 <0f> 0b eb fe 0f 1f 44 00 00 55 48
>>89 e5 41 57 41 56 41 55 41 54
>><1>[  330.764008] RIP  [<ffffffffa01c3fe7>]
>>iwl_tx_cmd_complete+0x397/0x3a0 [iwlcore]
>><4>[  330.764042]  RSP <ffff880002203d20>
>><4>[  330.764059] ---[ end trace e386a588d450d76f ]---
>><0>[  330.764073] Kernel panic - not syncing: Fatal exception
>>in interrupt
>><4>[  330.764093] Pid: 0, comm: swapper Tainted: P      D
>>2.6.32-wl #243
>><4>[  330.764111] Call Trace:
>><4>[  330.764120]  <IRQ>  [<ffffffff814070f1>] panic+0x82/0x13f
>><4>[  330.764148]  [<ffffffff81010a52>] oops_end+0xe2/0xf0
>><4>[  330.764166]  [<ffffffff81010c7b>] die+0x5b/0x90
>><4>[  330.764183]  [<ffffffff8100db24>] do_trap+0xc4/0x170
>><4>[  330.764201]  [<ffffffff8100e025>] do_invalid_op+0x95/0xb0
>><4>[  330.764230]  [<ffffffffa01c3fe7>] ?
>>iwl_tx_cmd_complete+0x397/0x3a0 [iwlcore]
>><4>[  330.764254]  [<ffffffff8140a94d>] ?
>>trace_hardirqs_off_thunk+0x3a/0x3c
>><4>[  330.764276]  [<ffffffff8100cef5>] invalid_op+0x15/0x20
>><4>[  330.764304]  [<ffffffffa01c3fe7>] ?
>>iwl_tx_cmd_complete+0x397/0x3a0 [iwlcore]
>><4>[  330.764333]  [<ffffffffa0230d13>]
>>iwl3945_irq_tasklet+0x673/0x1740 [iwl3945]
>><4>[  330.764364]  [<ffffffffa01b683e>] ?
>>iwl_isr_legacy+0x3e/0x600 [iwlcore]
>><4>[  330.764388]  [<ffffffff81053f56>] tasklet_action+0x106/0x110
>><4>[  330.764408]  [<ffffffff81055674>] __do_softirq+0x114/0x2c0
>><4>[  330.764427]  [<ffffffff8109eb38>] ? handle_edge_irq+0x78/0x160
>><4>[  330.764447]  [<ffffffff8100d25c>] call_softirq+0x1c/0x30
>><4>[  330.764465]  [<ffffffff8100f1fd>] do_softirq+0x7d/0xb0
>><4>[  330.764483]  [<ffffffff81055125>] irq_exit+0x95/0xa0
>><4>[  330.764500]  [<ffffffff8140bddc>] do_IRQ+0x7c/0xf0
>><4>[  330.764519]  [<ffffffff8100ca13>] ret_from_intr+0x0/0xf
>><4>[  330.764534]  <EOI>  [<ffffffff812a1d5b>] ?
>>acpi_idle_enter_bm+0x275/0x2aa
>><4>[  330.764562]  [<ffffffff812a1d65>] ?
>>acpi_idle_enter_bm+0x27f/0x2aa
>><4>[  330.764583]  [<ffffffff812a1d5b>] ?
>>acpi_idle_enter_bm+0x275/0x2aa
>><4>[  330.764605]  [<ffffffff8133227f>] ? cpuidle_idle_call+0x9f/0x160
>><4>[  330.764625]  [<ffffffff8100b1df>] ? cpu_idle+0xaf/0x110
>><4>[  330.764643]  [<ffffffff813f534a>] ? rest_init+0x7a/0x80
>><4>[  330.764662]  [<ffffffff81640d01>] ? start_kernel+0x3ac/0x3b8
>><4>[  330.764682]  [<ffffffff81640315>] ?
>>x86_64_start_reservations+0x125/0x129
>><4>[  330.764703]  [<ffffffff816403fd>] ?
>x86_64_start_kernel+0xe4/0xeb
>>
>>
>>
>>Note that these days I see the 'Microcode SW error detected'
>>somewhat more often that I used to.
>>
>>
>>This is the combined patch I used:
>>
>>
>>diff --git a/drivers/net/wireless/iwlwifi/iwl-dev.h
>>b/drivers/net/wireless/iwlwifi/iwl-dev.h
>>index 2673e9a..8efc469 100644
>>--- a/drivers/net/wireless/iwlwifi/iwl-dev.h
>>+++ b/drivers/net/wireless/iwlwifi/iwl-dev.h
>>@@ -711,7 +711,13 @@ extern void iwl_txq_ctx_stop(struct
>>iwl_priv *priv);
>> extern int iwl_queue_space(const struct iwl_queue *q);
>> static inline int iwl_queue_used(const struct iwl_queue *q, int i)
>> {
>>-       return q->write_ptr > q->read_ptr ?
>>+       if (q->write_ptr == q->read_ptr) {
>>+               printk("FIRMWARE BUG: index %d is given while
>>read_ptr is %d\n",
>>+                      i, q->read_ptr);
>>+               BUG();
>>+       }
>>+
>>+       return q->write_ptr >= q->read_ptr ?
>>                (i >= q->read_ptr && i < q->write_ptr) :
>>                !(i < q->read_ptr && i >= q->write_ptr);
>> }
>>diff --git a/drivers/net/wireless/iwlwifi/iwl3945-base.c
>>b/drivers/net/wireless/iwlwifi/iwl3945-base.c
>>index 2a28a1f..a36de73 100644
>>--- a/drivers/net/wireless/iwlwifi/iwl3945-base.c
>>+++ b/drivers/net/wireless/iwlwifi/iwl3945-base.c
>>@@ -548,6 +548,9 @@ static int iwl3945_tx_skb(struct iwl_priv
>>*priv, struct sk_buff *skb)
>>        txq = &priv->txq[txq_id];
>>        q = &txq->q;
>>
>>+       if ((iwl_queue_space(q) < q->high_mark))
>>+               goto drop;
>>+
>>        spin_lock_irqsave(&priv->lock, flags);
>>
>>        idx = get_cmd_index(q, q->write_ptr, 0);
>>
>>
>>Best regards,
>>       Maxim Levitsky
>>
>>
>>---------------------------------------------------------------
>>---------------
>>This SF.Net email is sponsored by the Verizon Developer Community
>>Take advantage of Verizon's best-in-class app development support
>>A streamlined, 14 day to market process makes app distribution
>>fast and easy
>>Join now and get one step closer to millions of Verizon customers
>>http://p.sf.net/sfu/verizon-dev2dev
>>_______________________________________________
>>Ipw3945-devel mailing list
>>Ipw3945-devel@xxxxxxxxxxxxxxxxxxxxx
>>https://lists.sourceforge.net/lists/listinfo/ipw3945-devel
>>
>
>---------------------------------------------------------------
>---------------
>This SF.Net email is sponsored by the Verizon Developer Community
>Take advantage of Verizon's best-in-class app development support
>A streamlined, 14 day to market process makes app distribution
>fast and easy
>Join now and get one step closer to millions of Verizon customers
>http://p.sf.net/sfu/verizon-dev2dev
>_______________________________________________
>Ipw3945-devel mailing list
>Ipw3945-devel@xxxxxxxxxxxxxxxxxxxxx
>https://lists.sourceforge.net/lists/listinfo/ipw3945-devel
>
--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Host AP]     [ATH6KL]     [Linux Bluetooth]     [Linux Netdev]     [Kernel Newbies]     [Linux Kernel]     [IDE]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux ATA RAID]     [Samba]     [Device Mapper]
  Powered by Linux