Re: opps building sparc64 packages

From: Dennis Gilmore <dgilmore@xxxxxxxxxx>
Date: Fri, 2 Jan 2009 14:35:19 -0600

> trying to build openbabel sparc64 
> https://sparc.koji.fedoraproject.org/koji/taskinfo?taskID=112540  I got an 
> opps 
> 
> sun4v_data_access_exception: ADDR[ffffb83000000008] CTX[16ba] TYPE[0009], going.
 ...
> lt-atom(22311): Dax [#1]
> TSTATE: 0000009911001600 TPC: 00000000004821d8 TNPC: 00000000004821dc Y:
> 00000000    Not tainted
> TPC: <exit_robust_list+0x78/0x10c>
 ...
> I've seen something similar building OOo also
> 22311 ?        D      0:00 /builddir/build/BUILD/openbabel-2.2.0b3-20080215-
> r2249/test/.libs/lt-atom
> 
> the process that should be running ends up in a D state  so it is sleeping and 
> unkillable.  the processes hang around until a reboot.  any ideas where I 
> should start looking?  this happens on a T1000 and T2000 i've not yet tried on 
> non-niagara hardware.  

Thanks for this report.  I think I know why this happens.

exit_robust_list() just walks the userland linked list of
robust FUTEX objects to release.  Since it's userland,
anything can be there, so this can generate all kinds of
exceptions depending upon the address used.  Such exceptions
should just silently be handled and cause an abort of the
FUTEX list traversal.
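
To illustrate the intended behaviour, here is a rough userspace model
in plain C.  It is not the kernel code: the sim_* names, the array
standing in for user memory, and the offset-style addresses are all
made up for the example.  Every fetch is checked, and the first one
that fails simply ends the walk.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model, not kernel code.  "User memory" is a small array
 * of 64-bit words and an address is a byte offset into it. */
#define SIM_WORDS      16
#define SIM_EFAULT     (-14)
#define SIM_WALK_LIMIT 2048	/* bound the walk so a looping list terminates */

struct sim_mem {
	uint64_t word[SIM_WORDS];
};

/* Checked fetch: 0 on success, SIM_EFAULT for an unaligned or
 * out-of-range address (the stand-in for a faulting user access). */
int sim_fetch(const struct sim_mem *m, uint64_t addr, uint64_t *out)
{
	if (addr % 8 != 0 || addr / 8 >= SIM_WORDS)
		return SIM_EFAULT;
	*out = m->word[addr / 8];
	return 0;
}

/* Walk a circular list: the word at each entry holds the address of the
 * next entry, and the walk ends on returning to head_addr.  Returns the
 * number of entries visited; *faulted is set if a fetch failed, in which
 * case the traversal is abandoned quietly. */
int sim_walk_list(const struct sim_mem *m, uint64_t head_addr, int *faulted)
{
	uint64_t entry;
	int visited = 0;

	assert(m != NULL && faulted != NULL);
	*faulted = 0;

	if (sim_fetch(m, head_addr, &entry)) {
		*faulted = 1;
		return 0;
	}

	while (entry != head_addr && visited < SIM_WALK_LIMIT) {
		uint64_t next;

		if (sim_fetch(m, entry, &next)) {
			*faulted = 1;	/* bad pointer from userland: stop here */
			break;
		}
		visited++;		/* the real code would release the futex */
		entry = next;
	}
	return visited;
}
```

With a well-formed list the walk returns to the head and stops.  With
a garbage "next" pointer planted in one entry, the walk stops at that
entry and reports the fault to the caller rather than crashing.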

The address in question is in register %g2 as the faulting
instruction is:

	ldxa  [ %g2 ] %asi, %g3

And register %g2 holds 0xfff8b83000000008 which is inside of
the address space hole on Niagara.  Any access there is
illegal and will generate a data access exception as we see
here.

The code in sun4v_data_access_exception() needs some logic to properly
handle the case of the kernel doing a userspace access.  Currently it
does an OOPS unconditionally when triggered from kernel space, which
is wrong.
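
The usual kernel mechanism for this kind of thing is an exception
table: faults taken at known user-access instructions get redirected
to a fixup instead of causing an OOPS.  The sketch below models that
decision in plain C.  It is an assumption about the general shape of
the logic, not the actual patch; the sim_* names and the fixup address
used in the usage note are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the decision a data access exception handler
 * could make.  None of these names exist in the kernel. */
enum sim_action {
	SIM_SIGNAL_USER,	/* fault came from user mode: signal the task */
	SIM_FIXUP,		/* kernel-mode user access: resume at the fixup */
	SIM_OOPS		/* genuine kernel bug */
};

struct sim_extable_entry {
	uint64_t insn;		/* address of an instruction allowed to fault */
	uint64_t fixup;		/* where to resume if it does */
};

/* Linear search for the faulting PC; NULL if it is not a known
 * user-access instruction. */
const struct sim_extable_entry *
sim_search_extable(const struct sim_extable_entry *tab, size_t n, uint64_t pc)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (tab[i].insn == pc)
			return &tab[i];
	return NULL;
}

/* Decide what to do with the fault.  On SIM_FIXUP, *pc is rewritten to
 * the fixup address so that execution continues in the error path. */
enum sim_action
sim_data_access_exception(int from_kernel, uint64_t *pc,
			  const struct sim_extable_entry *tab, size_t n)
{
	const struct sim_extable_entry *e;

	assert(pc != NULL);

	if (!from_kernel)
		return SIM_SIGNAL_USER;

	e = sim_search_extable(tab, n, *pc);
	if (e) {
		*pc = e->fixup;
		return SIM_FIXUP;
	}
	return SIM_OOPS;
}
```

For example, a table entry pairing the TPC from the report (0x4821d8)
with a made-up fixup address would turn that fault into SIM_FIXUP,
while a kernel-mode fault at any unlisted PC would still give SIM_OOPS.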

I'll fix this up and post a patch.

Thanks again.