Re: Detected Hardware Unit Hang on Intel Wired Ethernet

On 1/11/2012 6:40 AM, Dave, Tushar N wrote:
Thanks for the driver info.
Because you are running the in-kernel driver, we can enable the debug message level via ethtool. That will print HW ring info when the issue occurs.

Here is the ethtool command to enable debug messages.
# ethtool -s ethx msglvl 0x3c00
This will enable tx_done, rx_status, pktdata and hw message levels.
You can confirm it by running "ethtool ethx", which will show the 'Current message level'.

The next time the issue occurs, please send me the full dmesg log from after the issue, along with the bus trace.

As I said earlier, the issue is reproducible when I keep my root filesystem on NFS. After booting, the kernel tries to mount the rootfs over NFS and crashes, so I hit the issue before I can reach the # prompt. How can I use "ethtool -s ethx msglvl 0x3c00" to enable debug messages in that case? Maybe I can change the kernel code directly to enable this.

Regards
Pratyush

Thanks.

-Tushar


-----Original Message-----
From: Pratyush Anand [mailto:pratyush.anand@xxxxxx]
Sent: Monday, January 09, 2012 8:21 PM
To: Dave, Tushar N
Cc: Greg KH; Pratyush Anand; e1000-devel@xxxxxxxxxxxxxxxxxxxxx; netdev@xxxxxxxxxxxxxxx; Shiraz HASHIM; Deepak SIKRI; Bhavna YADAV; linux-pci@xxxxxxxxxxxxxxx; Linux NICS
Subject: Re: Detected Hardware Unit Hang on Intel Wired Ethernet

On 1/7/2012 12:25 AM, Dave, Tushar N wrote:
Pratyush,

Sorry I got your name reversed.
Are you using the in-kernel driver or the one from SourceForge?

I am using the in-kernel driver from kernel 2.6.37.

Please send me the output of ethtool -i ethx.

root@192.168.1.10:~# ethtool -i eth0
driver: e1000e
version: 1.2.7-k2
firmware-version: 5.11-8
bus-info: 0000:01:00.0

Regards
Pratyush


-Tushar

-----Original Message-----
From: Pratyush Anand [mailto:pratyush.anand@xxxxxx]
Sent: Thursday, January 05, 2012 8:25 PM
To: Dave, Tushar N
Cc: Greg KH; Pratyush Anand; e1000-devel@xxxxxxxxxxxxxxxxxxxxx; netdev@xxxxxxxxxxxxxxx; Shiraz HASHIM; Deepak SIKRI; Bhavna YADAV; linux-pci@xxxxxxxxxxxxxxx; Linux NICS
Subject: Re: Detected Hardware Unit Hang on Intel Wired Ethernet

Thanks, Tushar,

On 1/6/2012 5:24 AM, Dave, Tushar N wrote:
Anand,

Sorry to hear that you are having this issue with the card, and thanks for doing the debugging and providing the bus trace.
I think we should run a debug driver that prints the HW ring details when the hang occurs. I can provide you with one; you can then install it and keep the bus tracer running. Once the issue occurs, send me the full dmesg output (which will include the HW ring details) and the bus trace.

Which card do you have, 1 Gig or 10 Gig? Which driver are you running: e1000e, igb, or ixgbe?
Can you also provide the output of ethtool -i ethx?

Once I know which driver you use, I will send you the debug driver.

I am using the Intel PRO/1000 PT Server Adapter.
http://www.intel.com/content/www/us/en/network-adapters/gigabit-network-adapters/pro-1000-pt.html

I am using the e1000e driver.

I see the problem when I mount the root filesystem over NFS with MSI
interrupts enabled; it happens even before I get a shell prompt.
Please see the first mail in this thread.

http://www.mail-archive.com/e1000-devel@xxxxxxxxxxxxxxxxxxxxx/msg04894.html

There you can also see the tx ring details at the time the issue occurs.
Please let me know if you need any more info.

Regards
Pratyush


Thanks.

-Tushar

-----Original Message-----
From: netdev-owner@xxxxxxxxxxxxxxx [mailto:netdev-owner@xxxxxxxxxxxxxxx] On Behalf Of Pratyush Anand
Sent: Wednesday, January 04, 2012 8:31 PM
To: Greg KH
Cc: Pratyush Anand; e1000-devel@xxxxxxxxxxxxxxxxxxxxx; netdev@xxxxxxxxxxxxxxx; Shiraz HASHIM; Deepak SIKRI; Bhavna YADAV; linux-pci@xxxxxxxxxxxxxxx; Linux NICS
Subject: Re: Detected Hardware Unit Hang on Intel Wired Ethernet

On 1/5/2012 12:52 AM, Greg KH wrote:
On Wed, Jan 04, 2012 at 04:31:36PM +0530, Pratyush Anand wrote:
Adding the PCI mailing list too, as the problem occurs only when MSI is enabled.

If I connect a PCIe analyzer, I see that at the time of the issue the
Ethernet card issues an MRd(64) for 32 dwords with a wrong 64-bit address
to my RC. Normally it issues only MRd(32) requests.

Bug in your pcie firmware controller?



When you say "Bug in your pcie firmware controller?", do you mean the RC's
software or the EP's software?

Below is part of the analyzer log, converted to text.
Packet(177940) is an upstream MSI request. Whenever any device writes
to address 0x58A8F8, my PCIe RC treats it as an MSI and generates an
interrupt. So I receive the MSI interrupt correctly in my software, and
the MSI controller correctly reports that the interrupt came from the
Ethernet card.

Then, in Packet(178010), the Ethernet controller sends another upstream
request, an MRd(64) of 32 dwords at Address(AFECEB87:A9D88B00). Since
this address does not exist in my RC's address space, a UR (Unsupported
Request) completion is returned, and hence the problem occurs.

The question now is why the Ethernet card is generating an inbound
request with such a wrong address. I have logged every
tx_desc->buffer_addr programmed by software in e1000_tx_queue(), and
none of them is a 64-bit or otherwise invalid address.

_______|_______________________________________________________________________
Packet(177916) Upstream 2.5(x1) TLP(1475) Mem MWr(32)(10:00000) Length(4)
_______| RequesterID(003:00:0) Tag(2) Address(0EB00200) 1st BE(1111)
_______| Last BE(1111) Data(4 dwords) LCRC(0x44E0407C)
_______| Time Stamp(0013 . 460 549 544 s)
_______|_______________________________________________________________________
Packet(177918) Downstream 2.5(x1) DLLP ACK AckNak_Seq_Num(1475)
_______| CRC 16(0x0EB7) Time Stamp(0013 . 460 551 144 s)
_______|_______________________________________________________________________
Packet(177940) Upstream 2.5(x1) TLP(1476) Mem MWr(32)(10:00000) Length(1)
_______| RequesterID(003:00:0) Tag(30) Address(0058A8F8) 1st BE(0011)
_______| Last BE(0000) Data(1 dword) LCRC(0xC21F32B6)
_______| Time Stamp(0013 . 460 588 544 s)
_______|_______________________________________________________________________
Packet(177942) Downstream 2.5(x1) DLLP ACK AckNak_Seq_Num(1476)
_______| CRC 16(0x69F5) Time Stamp(0013 . 460 590 088 s)
_______|_______________________________________________________________________
Packet(177946) Downstream 2.5(x1) TLP(309) Mem MRd(32)(00:00000) Length(1)
_______| RequesterID(002:00:0) Tag(19) Address(C01000C0) 1st BE(1111)
_______| Last BE(0000) LCRC(0x91BDA1F5) Time Stamp(0013 . 460 595 936 s)
_______|_______________________________________________________________________
Packet(177947) Upstream 2.5(x1) DLLP ACK AckNak_Seq_Num(309)
_______| CRC 16(0x25C6) Time Stamp(0013 . 460 596 368 s)
_______|_______________________________________________________________________
Packet(177950) Upstream 2.5(x1) TLP(1477) Cpl CplD(10:01010) Length(1)
_______| RequesterID(002:00:0) Tag(19) CompleterID(003:00:0) Status(SC)
_______| BCM(0) Byte Cnt(4) Lwr Addr(0x40) Data(1 dword) LCRC(0x8FE0D922)
_______| Time Stamp(0013 . 460 597 304 s)
_______|_______________________________________________________________________
Packet(177952) Downstream 2.5(x1) DLLP ACK AckNak_Seq_Num(1477)
_______| CRC 16(0xC8EE) Time Stamp(0013 . 460 598 840 s)
_______|_______________________________________________________________________
Packet(177999) Downstream 2.5(x1) TLP(310) Mem MWr(32)(10:00000) Length(1)
_______| RequesterID(002:00:0) Tag(0) Address(C0103818) 1st BE(1111)
_______| Last BE(0000) Data(1 dword) LCRC(0xA898D9A1)
_______| Time Stamp(0013 . 460 687 936 s)
_______|_______________________________________________________________________
Packet(178001) Upstream 2.5(x1) DLLP ACK AckNak_Seq_Num(310)
_______| CRC 16(0xC6EA) Time Stamp(0013 . 460 688 384 s)
_______|_______________________________________________________________________
Packet(178004) Upstream 2.5(x1) TLP(1478) Mem MRd(32)(00:00000) Length(4)
_______| RequesterID(003:00:0) Tag(4) Address(0EAFB990) 1st BE(1111)
_______| Last BE(1111) LCRC(0xB54722D2) Time Stamp(0013 . 460 689 312 s)
_______|_______________________________________________________________________
Packet(178006) Downstream 2.5(x1) TLP(311) Cpl CplD(10:01010) Length(4)
_______| RequesterID(003:00:0) Tag(4) CompleterID(002:00:0) Status(SC)
_______| BCM(0) Byte Cnt(16) Lwr Addr(0x10) Data(4 dwords) LCRC(0xFE303776)
_______| Time Stamp(0013 . 460 690 288 s)
_______|_______________________________________________________________________
Packet(178007) Upstream 2.5(x1) DLLP ACK AckNak_Seq_Num(311)
_______| CRC 16(0x67F1) Time Stamp(0013 . 460 690 776 s)
_______|_______________________________________________________________________
Packet(178008) Downstream 2.5(x1) DLLP ACK AckNak_Seq_Num(1478)
_______| CRC 16(0x2BC2) Time Stamp(0013 . 460 690 824 s)
_______|_______________________________________________________________________
Packet(178010) Upstream 2.5(x1) TLP(1479) Mem MRd(64)(01:00000) Length(32)
_______| RequesterID(003:00:0) Tag(11) Address(AFECEB87:A9D88B00)
_______| 1st BE(1100) Last BE(0011) LCRC(0x6BE341C9)
_______| Time Stamp(0013 . 460 691 680 s)
_______|_______________________________________________________________________
Packet(178011) Upstream 2.5(x1) TLP(1480) Mem MRd(64)(01:00000) Length(32)
_______| RequesterID(003:00:0) Tag(8) Address(AFECEB87:A9D88B7C)
_______| 1st BE(1100) Last BE(0011) LCRC(0xAA5647BD)
_______| Time Stamp(0013 . 460 691 808 s)
_______|_______________________________________________________________________
Packet(178012) Upstream 2.5(x1) TLP(1481) Mem MRd(64)(01:00000) Length(32)
_______| RequesterID(003:00:0) Tag(9) Address(AFECEB87:A9D88BF8)
_______| 1st BE(1100) Last BE(0011) LCRC(0xEEB1F63F)
_______| Time Stamp(0013 . 460 692 120 s)
_______|_______________________________________________________________________
Packet(178013) Upstream 2.5(x1) TLP(1482) Mem MRd(64)(01:00000) Length(32)
_______| RequesterID(003:00:0) Tag(10) Address(AFECEB87:A9D88C74)
_______| 1st BE(1100) Last BE(0011) LCRC(0xA508142C)
_______| Time Stamp(0013 . 460 692 248 s)
_______|_______________________________________________________________________
Packet(178014) Downstream 2.5(x1) TLP(312) Cpl Cpl(00:01010) Length(0)
_______| RequesterID(003:00:0) Tag(11) CompleterID(002:00:0) Status(UR)-BAD
_______| BCM(0) Byte Cnt(124) Lwr Addr(0x02) LCRC(0xCE5540D2)
_______| Time Stamp(0013 . 460 692 328 s)
_______|_______________________________________________________________________
Packet(178015) Downstream 2.5(x1) TLP(313) Cpl Cpl(00:01010) Length(0)
_______| RequesterID(003:00:0) Tag(8) CompleterID(002:00:0) Status(UR)-BAD
_______| BCM(0) Byte Cnt(124) Lwr Addr(0x7E) LCRC(0x9FE2487D)
_______| Time Stamp(0013 . 460 692 456 s)
_______|_______________________________________________________________________
Packet(178016) Upstream 2.5(x1) DLLP ACK AckNak_Seq_Num(312)
_______| CRC 16(0x086E) Time Stamp(0013 . 460 692 760 s)
_______|_______________________________________________________________________
Packet(178017) Downstream 2.5(x1) TLP(314) Cpl Cpl(00:01010) Length(0)
_______| RequesterID(003:00:0) Tag(9) CompleterID(002:00:0) Status(UR)-BAD
_______| BCM(0) Byte Cnt(124) Lwr Addr(0x7A) LCRC(0x097BF4DE)
_______| Time Stamp(0013 . 460 692 776 s)
_______|_______________________________________________________________________
Packet(178018) Upstream 2.5(x1) DLLP ACK AckNak_Seq_Num(313)
_______| CRC 16(0xA975) Time Stamp(0013 . 460 692 888 s)
_______|_______________________________________________________________________
Packet(178019) Downstream 2.5(x1) TLP(315) Cpl Cpl(00:01010) Length(0)
_______| RequesterID(003:00:0) Tag(10) CompleterID(002:00:0) Status(UR)-BAD
_______| BCM(0) Byte Cnt(124) Lwr Addr(0x76) LCRC(0x64BDF921)
_______| Time Stamp(0013 . 460 692 904 s)
_______|_______________________________________________________________________
Packet(178020) Upstream 2.5(x1) TLP(1483) Msg Msg(01:10000)
_______| Msg Routing(To RC) Length(0) RequesterID(003:00:0) Tag(31)
_______| Message Code(ERR_FATAL) LCRC(0xCDA53E96)
_______| Time Stamp(0013 . 460 693 184 s)
_______|_______________________________________________________________________
Packet(178021) Downstream 2.5(x1) DLLP ACK AckNak_Seq_Num(1482)
_______| CRC 16(0xA771) Time Stamp(0013 . 460 693 208 s)
_______|_______________________________________________________________________
Packet(178023) Upstream 2.5(x1) DLLP ACK AckNak_Seq_Num(314)
_______| CRC 16(0x4A59) Time Stamp(0013 . 460 693 280 s)
_______|_______________________________________________________________________
Packet(178024) Upstream 2.5(x1) TLP(1484) Msg Msg(01:10000)
_______| Msg Routing(To RC) Length(0) RequesterID(003:00:0) Tag(31)
_______| Message Code(ERR_FATAL) LCRC(0x86D9ACB6)
_______| Time Stamp(0013 . 460 693 312 s)
_______|_______________________________________________________________________
Packet(178025) Upstream 2.5(x1) DLLP ACK AckNak_Seq_Num(315)
_______| CRC 16(0xEB42) Time Stamp(0013 . 460 693 408 s)
_______|_______________________________________________________________________
Packet(178026) Upstream 2.5(x1) TLP(1485) Msg Msg(01:10000)
_______| Msg Routing(To RC) Length(0) RequesterID(003:00:0) Tag(31)
_______| Message Code(ERR_FATAL) LCRC(0xC5120A31)
_______| Time Stamp(0013 . 460 693 632 s)
_______|_______________________________________________________________________
Packet(178028) Upstream 2.5(x1) TLP(1486) Msg Msg(01:10000)
_______| Msg Routing(To RC) Length(0) RequesterID(003:00:0) Tag(31)
_______| Message Code(ERR_FATAL) LCRC(0x41499062)
_______| Time Stamp(0013 . 460 693 792 s)
_______|_______________________________________________________________________
Packet(178029) Downstream 2.5(x1) DLLP ACK AckNak_Seq_Num(1486)
_______| CRC 16(0x231F) Time Stamp(0013 . 460 694 704 s)
_______|_______________________________________________________________________

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

