Re: HPMC running CMake Nightly tests

> On Tue, Sep 27, 2011 at 09:32:37AM +0200, Rolf Eike Beer wrote:
>> I'm running the CMake tests every night. This is the second time in a
>> row that my C3600 did not survive this. Since I was warned I connected a
>> serial console.
> ...
>
>> But then the machine got killed:
>>
>> Backtrace:
>>  [<1030b9ec>] tulip_get_stats+0x34/0x5c
>>  [<1038ac20>] dev_get_stats+0x98/0xe8
>>  [<102946b4>] led_work_func+0x11c/0x310
>>  [<10145204>] process_one_work+0x120/0x3ac
>>  [<10147110>] worker_thread+0x174/0x338
>>  [<1014b0b4>] kthread+0x9c/0xa4
>>  [<10102c5c>] ret_from_kernel_thread+0x1c/0x24
>>
>>
>> High Priority Machine Check (HPMC): Code=1 regs=10551080 (Addr=00000000)
>>
>>      YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI
>> PSW: 00000000000001001111111100001110 Not tainted
>> r00-03  0004ff0e 105bf000 1030b9ec 2fc72000
>> r04-07  0000000f 00000000 00000000 00000000
>> r08-11  2fc72000 105bf600 2fea4208 7f000000
>> r12-15  2fea4210 105ba000 10544000 2fc2f408
>> r16-19  1041d1dc f000017c f0000174 2fea4210
>> r20-23  0099f055 0099f050 1030b9b8 00000000
>> r24-27  2ff57008 2fea4210 0004a040 10544000
>> r28-31  0004a040 f68e066d 2fea4400 1038ac20
>> sr00-03  00000000 00000000 00000000 00000017
>> sr04-07  00000000 00000000 00000000 00000000
>>
>> IASQ: 00000000 00000000 IAOQ: 10284394 10284398
>>  IIR: 0f80109c    ISR: a627ffd0  IOR: 0204a040
>>  CPU:        0   CR30: 2fea4000 CR31: ffffdffe
>>  ORIG_R28: 00000000
>>  IAOQ[0]: ioread32+0xc/0x4c
>
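
The PSW line in the dump above pairs each bit with a flag letter printed on the line before it. As a reading aid, here is a minimal sketch that lists which flag letters have their bit set. Both strings are copied from the dump; the helper name `set_psw_flags` is made up for illustration.

```python
# Flag letters and bit string copied verbatim from the PSW lines of the dump.
PSW_LETTERS = "YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI"
PSW_BITS = "00000000000001001111111100001110"


def set_psw_flags(letters, bits):
    """Return (bit position, flag letter) for every bit that is set.

    Position 0 is the leftmost (most significant) bit, matching the
    left-to-right order in which the dump prints the letters.
    """
    if len(letters) != len(bits):
        raise ValueError("letter row and bit row must have the same length")
    return [(pos, letter)
            for pos, (letter, bit) in enumerate(zip(letters, bits))
            if bit == "1"]


flags = set_psw_flags(PSW_LETTERS, PSW_BITS)
psw_value = int(PSW_BITS, 2)

for pos, letter in flags:
    print(f"bit {pos:2d}: {letter}")
print(f"PSW = {psw_value:08x}")
```

The hexadecimal value computed from the bit string is the same word the dump prints in the first slot of the `r00-03` row.
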
> Usually the HPMC means tulip tried to read something
> from MMIO space that didn't respond and this
> resulted in a "Master Abort" (PCI bus controller
> had to abort the transaction). On PCs that's not
> fatal, but it is on many RISC architectures.
>
> If you can decode the instruction pointer (ioread32+0x10) to figure out
> which register is used to dereference the MMIO address, it would
> be obvious what the offending address is - just to confirm the
> pointer isn't pointing off into the weeds. It will be one of the
> registers that contains a 0xfnnnnnnn address.
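
To narrow down the candidates mentioned above, here is a minimal sketch that parses the general-register rows of the kernel dump and lists every register holding a 0xfnnnnnnn value. The register rows are copied from the dump; the helper names `parse_registers` and `mmio_candidates` are made up for illustration. This only shortlists registers. Which one the load actually used still has to come from decoding the instruction.

```python
import re

# General-register rows copied from the quoted kernel dump.
DUMP = """\
r00-03  0004ff0e 105bf000 1030b9ec 2fc72000
r04-07  0000000f 00000000 00000000 00000000
r08-11  2fc72000 105bf600 2fea4208 7f000000
r12-15  2fea4210 105ba000 10544000 2fc2f408
r16-19  1041d1dc f000017c f0000174 2fea4210
r20-23  0099f055 0099f050 1030b9b8 00000000
r24-27  2ff57008 2fea4210 0004a040 10544000
r28-31  0004a040 f68e066d 2fea4400 1038ac20
"""

ROW = re.compile(r"\s*r(\d+)-(\d+)\s+(.*)")


def parse_registers(text):
    """Map register number -> value for every 'rNN-NN' row in the text."""
    regs = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if not match:
            continue
        first = int(match.group(1))
        for offset, word in enumerate(match.group(3).split()):
            regs[first + offset] = int(word, 16)
    return regs


def mmio_candidates(regs, base=0xF0000000):
    """Return the registers whose 32-bit value is at or above `base`."""
    return {num: val for num, val in regs.items() if val >= base}


registers = parse_registers(DUMP)
candidates = mmio_candidates(registers)
for num in sorted(candidates):
    print(f"r{num:02d} = {candidates[num]:08x}")
```
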

I will have a look.

> Interrupt the boot process and collect the HPMC dump as described:
>    http://www.parisc-linux.org/faq/kernelbug-howto.html
>
> The output will include the offending address that the ioread32 was
> trying to access to confirm the instruction was decoded correctly.
> If anyone has access to the magic decoder ring, we might be able to tell
> more.
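
One way to check the decoding without a disassembler at hand is to split the IIR word from the kernel dump (0f80109c) into bit fields. The sketch below assumes the PA-RISC short-displacement load layout as I remember it: 6-bit opcode, 5-bit base register, 5-bit immediate, 2-bit space, a 4-bit extension in bits 22-25 and a 5-bit target register. That layout is an assumption and should be checked against the architecture manual or objdump output before drawing conclusions. The helper name `field` is made up for illustration.

```python
# IIR value copied from the kernel dump.
IIR = 0x0F80109C


def field(word, first, last):
    """Extract bits first..last of a 32-bit word.

    Uses PA-RISC bit numbering, where bit 0 is the most significant bit
    and bit 31 the least significant one.
    """
    width = last - first + 1
    return (word >> (31 - last)) & ((1 << width) - 1)


# Field positions are an ASSUMPTION (short-displacement load format).
fields = {
    "opcode": field(IIR, 0, 5),
    "base": field(IIR, 6, 10),
    "im5": field(IIR, 11, 15),
    "space": field(IIR, 16, 17),
    "format_bit": field(IIR, 19, 19),
    "ext4": field(IIR, 22, 25),
    "target": field(IIR, 27, 31),
}

for name, value in fields.items():
    print(f"{name:10s} = {value}")
```

If that layout holds, both the base and the target field name register 28, which contains 0004a040 in the kernel dump and in the PIM dump. That would make r28 the register to look at, but this remains a hand decode under the stated assumption.
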

-----------------  Processor 0 HPMC Information ------------------

Timestamp =
  Fri Oct  14 12:18:23 GMT 2011    (20:11:10:14:12:18:23)

HPMC Chassis Codes = 2cbf0  2500b  2cbfb

General Registers 0 - 31
00-03   0000000000000000  00000000105bf000  000000001030bbd4  000000002fc26000
04-07   000000000000000f  0000000000000000  0000000000000000  0000000000000000
08-11   000000002fc26000  00000000105bf600  000000002fc50208  000000007f000000
12-15   000000002fc50210  00000000105ba000  0000000010544000  000000002fc2e628
16-19   000000001041d1dc  00000000f000017c  00000000f0000174  000000002fc50210
20-23   000000000209f184  000000000209f17f  000000001030bba0  0000000000000000
24-27   000000000000f424  000000002fc50210  000000000004a040  0000000010544000
28-31   000000000004a040  0000000000000000  000000002fc50400  000000001038ae40

Control Registers 0 - 31
00-03   0000000000000000  0000000000000000  0000000000000000  0000000000000000
04-07   0000000000000000  0000000000000000  0000000000000000  0000000000000000
08-11   000000000000006e  0000000000000000  00000000000000c0  000000000000003d
12-15   0000000000000000  0000000000000000  0000000000102000  00000000fe000000
16-19   000044dd642070fc  0000000000000000  0000000010284504  000000000f80109c
20-23   00000000a627ffd0  000000000204a040  000000ff0004fc0e  0000000080000000
24-27   0000000000594000  000000011df90000  00000000fffff5f7  00000000fffffdfe
28-31   00000000fffff7f4  00000000fffff7f6  000000002fc50000  00000000ffffdffe

Space Registers 0 - 7
00-03   00000000          00000000          00000000          00000037
04-07   00000000          00000000          00000000          00000000

IIA Space                    = 0x0000000000000000
IIA Offset                   = 0x0000000010284508
Check Type                   = 0x20000000
CPU State                    = 0x9e000004
Cache Check                  = 0x00000000
TLB Check                    = 0x00000000
Bus Check                    = 0x0030103b
Assists Check                = 0x00000000
Assist State                 = 0x00000000
Path Info                    = 0x00000000
System Responder Address     = 0x000000fff4008040
System Requestor Address     = 0xfffffffffffa0000

Floating-Point Registers 0 - 31
00-03   0000001f00000000  0000000000000000  0000000000000000  0000000000000000
04-07   41bf636000000000  41bf636000000000  00000002625a0000  0000000000000000
08-11   0000000000000000  1059900010544330  0000000000000000  105fbd602fde70c8
12-15   ffffffffad401040  ffffddb6f5fc38f8  fffffffffdfc38d0  fffffffff5fc3ad0
16-19   ffffff8effffffff  ffffffcff5fc3ad0  ffffffb3f1dc38c0  ffffffff21041800
20-23   ffffffffa5401040  fffffffff5fc38d0  0000000000000000  0000000100000000
24-27   0000000000000000  0000000000090a6e  0000000000000015  1029358c102c3a38
28-31   ffffffff0000313d  1055f1d010544000  0000000100000228  2fc302001011a234

'9000/785 B,C,J Workstation Unarchitected (per-CPU)', rev 1, 140 bytes:

Check Summary                = 0xcb81041008000000
Available Memory             = 0x0000000020000000
CPU Diagnose Register 2      = 0x0301000000000004
CPU Status Register 0        = 0x2420c20000000000
CPU Status Register 1        = 0x8002000000000000
SADD LOG                     = 0x4b023fd9e8190951
Read Short LOG               = 0xc1af00fff4008040
ERROR_STATUS                 = 0x0000000000100010
MEM_ADDR                     = 0x000001ff3fffffff
MEM_SYND                     = 0x0000000000000000
MEM_ADDR_CORR                = 0x000001ff3fffffff
MEM_SYND_CORR                = 0x0000000000000000
RUN_DATA_HIGH                = 0xc1bff0fffed08040
RUN_DATA_LOW                 = 0xc1bff0fffed08040
RUN_CTRL                     = 0x0000021c00001418
RUN_ADDR                     = 0xc1bff0fffed08040
System Responder Path        = 0x00ffffff0a000c00


HPMC PIM Analysis Information:

Timestamp =
  Fri Oct  14 12:18:23 GMT 2011    (20:11:10:14:12:18:23)


'9000/785 B,C,J Workstation HPMC PIM Analysis (per-CPU)', rev 0, 1304 bytes:

A Data I/O Fetch Timeout occurred while CPU 0 was
requesting information from a device at the path 10/0/12/0 (built-in PCI
device).


Memory/IO Controller Error Analysis Information:

The Memory/IO Controller only observed the Broadcast Error.  It did not log
any additional information about the HPMC.

-----------------  Processor 0 LPMC Information ------------------

Check Type                   = 0x00000000
I/D Cache Parity Info        = 0x00000000
Cache Check                  = 0x00000000
TLB Check                    = 0x00000000
Bus Check                    = 0x00000000
Assists Check                = 0x00000000
Assist State                 = 0x00000000
Path Info                    = 0x00000000
System Responder Address     = 0x0000000000000000
System Requestor Address     = 0x0000000000000000


-----------------  Processor 0 TOC Information -------------------

General Registers 0 - 31
00-03   0000000000000000  0000000000000000  0000000000000000  0000000000000000
04-07   0000000000000000  0000000000000000  0000000000000000  0000000000000000
08-11   0000000000000000  0000000000000000  0000000000000000  0000000000000000
12-15   0000000000000000  0000000000000000  0000000000000000  0000000000000000
16-19   0000000000000000  0000000000000000  0000000000000000  0000000000000000
20-23   0000000000000000  0000000000000000  0000000000000000  0000000000000000
24-27   0000000000000000  0000000000000000  0000000000000000  0000000000000000
28-31   0000000000000000  0000000000000000  0000000000000000  0000000000000000

Control Registers 0 - 31
00-03   0000000000000000  0000000000000000  0000000000000000  0000000000000000
04-07   0000000000000000  0000000000000000  0000000000000000  0000000000000000
08-11   0000000000000000  0000000000000000  0000000000000000  0000000000000000
12-15   0000000000000000  0000000000000000  0000000000000000  0000000000000000
16-19   0000000000000000  0000000000000000  0000000000000000  0000000000000000
20-23   0000000000000000  0000000000000000  0000000000000000  0000000000000000
24-27   0000000000000000  0000000000000000  0000000000000000  0000000000000000
28-31   0000000000000000  0000000000000000  0000000000000000  0000000000000000

Space Registers 0 - 7
00-03   00000000          00000000          00000000          00000000
04-07   00000000          00000000          00000000          00000000

IIA Space                    = 0x0000000000000000
IIA Offset                   = 0x0000000000000000
CPU State                    = 0x00000000


I/O Module Error Log Information:

Timestamp =
  Fri Oct  14 12:18:23 GMT 2011    (20:11:10:14:12:18:23)


'9000/785 B,C,J Workstation IO Error Log', rev 0, 228 bytes:

 Rope     Word1        Word2            Word3
------ ------------ ------------ ------------------
   0    0x00000000   0x0e0cc2a9   0x00000000fed30048
   1    0x00000000   0x1e0cc009   0x00000000fed32048
   2    ----------   0x2e0cc009   ------------------
   3    ----------   0x3e0cc009   ------------------
   4    0x00000000   0x4e0cc009   0x00000000fed38048
   5    ----------   0x5e0cc009   ------------------
   6    0x00000000   0x6e0cc009   0x00000000fed3c048
   7    ----------   0x7e0cc009   ------------------

Eike