Re: HPMC running CMake Nightly tests

On Fri, Oct 21, 2011 at 10:26:57AM +0200, Rolf Eike Beer wrote:
> Ok, I have another one. I removed all those parts that did not show any
> errors or where the register contents were all zeros.
> 
> Timestamp =
>   Thu Oct  20 09:05:52 GMT 2011    (20:11:10:20:09:05:52)
...
> System Responder Address     = 0x000000fff4008040

The MMIO address that wasn't responding. Note that it's 40 bits wide:
the 32-bit address used by the OS is "F-extended" by HW (the CPU, I think).
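
To make the "F-extension" concrete, here is a minimal sketch. It assumes
the extension simply fills every bit above bit 31 with ones, up to a
40-bit physical address; the function names are made up for illustration,
and the exact rule the hardware applies is not confirmed here.

```python
def f_extend(addr32: int, phys_bits: int = 40) -> int:
    """Fill the bits above bit 31 with ones, up to phys_bits wide (assumed rule)."""
    if not 0 <= addr32 < (1 << 32):
        raise ValueError("expected a 32-bit address")
    ones = (1 << (phys_bits - 32)) - 1  # 0xff when phys_bits is 40
    return (ones << 32) | addr32


def os_view(addr: int) -> int:
    """Drop the extension again to get the 32-bit address the OS uses."""
    return addr & 0xFFFFFFFF


responder = 0x000000FFF4008040  # System Responder Address from the PIM dump
print(hex(os_view(responder)))            # 0xf4008040
print(hex(f_extend(os_view(responder))))  # 0xfff4008040
```

So the address to look for on the OS side would be 0xf4008040.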


> System Requestor Address     = 0xfffffffffffa0000

Address of CPU that was requesting the MMIO address.

This is enough info to identify what I believe is the "victim".
It's not likely to be the root cause.

Historically, this type of HPMC happens because a device
attempted to DMA to an unmapped address and the IOMMU
went "fatal" (stopped routing traffic to PCI busses).


> '9000/785 B,C,J Workstation Unarchitected (per-CPU)', rev 1, 140 bytes:
> 
> Check Summary                = 0xcb81041008000000
> Available Memory             = 0x0000000020000000
> CPU Diagnose Register 2      = 0x0301000000000004
> CPU Status Register 0        = 0x2420c20000000000
> CPU Status Register 1        = 0x8002000000000000
> SADD LOG                     = 0x4b023fd9e8190951
> Read Short LOG               = 0xc1af00fff4008040
> ERROR_STATUS                 = 0x0000000000100010
> MEM_ADDR                     = 0x000001ff3fffffff
> MEM_SYND                     = 0x0000000000000000
> MEM_ADDR_CORR                = 0x000001ff3fffffff
> MEM_SYND_CORR                = 0x0000000000000000
> RUN_DATA_HIGH                = 0xc1bff0fffed08040
> RUN_DATA_LOW                 = 0xc1bff0fffed08040
> RUN_CTRL                     = 0x0000021c00001418
> RUN_ADDR                     = 0xc1bff0fffed08040
> System Responder Path        = 0x00ffffff0a000c00

This part could yield another clue if we had the magic decoder ring. :(
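
Even without the decoder ring, one thing can be checked mechanically:
the low 40 bits of "Read Short LOG" are the same as the System Responder
Address. The sketch below only compares bits; what the upper 24 bits
encode is unknown here, and the variable names are mine.

```python
SYSTEM_RESPONDER_ADDR = 0x000000FFF4008040
READ_SHORT_LOG = 0xC1AF00FFF4008040

MASK_40 = (1 << 40) - 1  # low 40 bits, the width of the responder address

low_40 = READ_SHORT_LOG & MASK_40
upper_24 = READ_SHORT_LOG >> 40  # meaning unknown without documentation

print(hex(low_40))    # 0xfff4008040
print(hex(upper_24))  # 0xc1af00
print(low_40 == SYSTEM_RESPONDER_ADDR)  # True
```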


> HPMC PIM Analysis Information:
> 
> Timestamp =
>   Thu Oct  20 09:05:52 GMT 2011    (20:11:10:20:09:05:52)
> 
> 
> '9000/785 B,C,J Workstation HPMC PIM Analysis (per-CPU)', rev 0, 1304 bytes:
> 
> A Data I/O Fetch Timeout occurred while CPU 0 was
> requesting information from a device at the path 10/0/12/0 (built-in PCI
> device).

Doing "in io" at the BCH prompt should list all devices, including 10/0/12/0.
A Google search fails to find a posting with that content. :/


> '9000/785 B,C,J Workstation IO Error Log', rev 0, 228 bytes:
> 
>  Rope     Word1        Word2            Word3
> ------ ------------ ------------
>    0    0x00000000   0x0e0cc2a9   0x00000000fed30048
>    1    0x00000000   0x1e0cc009   0x00000000fed32048
>    2    ----------   0x2e0cc009   ------------------
>    3    ----------   0x3e0cc009   ------------------
>    4    0x00000000   0x4e0cc009   0x00000000fed38048
>    5    ----------   0x5e0cc009   ------------------
>    6    0x00000000   0x6e0cc009   0x00000000fed3c048
>    7    ----------   0x7e0cc009   ------------------

"HP c3750 | hp workstation c3700 and c3650 - service handbook" in a 
couple of different places says:
 "I/O Error log word 3 contains the error address"

I'm assuming this is just the address last accessed by that PCI bus.
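
The table is small enough to compare by machine. The sketch below parses
the rows quoted above and reports two things that can be read off the
numbers alone: which rope has a Word2 that differs from the rest once the
top nibble (which matches the rope number in every row) is masked off,
and how the populated Word3 values are spaced. It makes no claim about
what the individual bits mean; the parsing helper is hypothetical.

```python
from collections import Counter

IO_ERROR_LOG = """\
   0    0x00000000   0x0e0cc2a9   0x00000000fed30048
   1    0x00000000   0x1e0cc009   0x00000000fed32048
   2    ----------   0x2e0cc009   ------------------
   3    ----------   0x3e0cc009   ------------------
   4    0x00000000   0x4e0cc009   0x00000000fed38048
   5    ----------   0x5e0cc009   ------------------
   6    0x00000000   0x6e0cc009   0x00000000fed3c048
   7    ----------   0x7e0cc009   ------------------
"""


def parse_rows(text):
    """Return (rope, word1, word2, word3) tuples; dashed fields become None."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 4 or not fields[0].isdigit():
            continue  # skip anything that is not a data row
        words = [None if set(f) == {"-"} else int(f, 16) for f in fields[1:]]
        rows.append((int(fields[0]), *words))
    return rows


rows = parse_rows(IO_ERROR_LOG)

# Word2 with the top nibble (the rope number) masked off.
status = {rope: w2 & 0x0FFFFFFF for rope, _, w2, _ in rows}
common = Counter(status.values()).most_common(1)[0][0]
odd_ropes = [rope for rope, value in status.items() if value != common]
print("common Word2 (masked):", hex(common))  # 0xe0cc009
print("ropes that differ:", odd_ropes)        # [0]

# Word3 for the ropes that report one.
word3 = {rope: w3 for rope, _, _, w3 in rows if w3 is not None}
for rope, addr in word3.items():
    print(rope, hex(addr), hex(addr - word3[0]))
```

Rope 0 is the only one whose Word2 differs from the others, and the
populated Word3 values step by 0x2000 per rope starting at 0xfed30048.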

cheers,
grant