Re: 16k or 64k PAGE_SIZE and "illegal instruction" (signal -4) errors

Joshua Kinard <kumba@xxxxxxxxxx> · Wed, 03 Sep 2014 23:35:09 -0400

On 08/26/2014 20:53, Joshua Kinard wrote:
> On 08/26/2014 10:02, Ralf Baechle wrote:
>> On Tue, Aug 26, 2014 at 09:16:56AM -0400, Joshua Kinard wrote:
>>
>>> On 08/26/2014 08:03, Ralf Baechle wrote:
>>>> On Tue, Aug 26, 2014 at 07:06:56AM -0400, Joshua Kinard wrote:
>>>>
>>>>> o32 userland is the primary on both systems.  However, the last SIGILL was
>>>>> under the 64k PAGE_SIZE kernel inside of an n32 chroot compiling the 'boost'
>>>>> package on the Octane; I restarted the build and it hasn't complained since.
>>>>> I also got a SIGILL on the 16k PAGE_SIZE kernel when I booted it for the
>>>>> first time and ran 'ps'.  Subsequent runs of 'ps' didn't reproduce the
>>>>> error.  Also saw SIGILLs in the bootlog of the 16k PAGE_SIZE kernel when
>>>>> "rm" was run once (couldn't reproduce) and when mdadm tried to put one of
>>>>> the arrays back together.  Subsequent runs using similar argument lines
>>>>> don't reproduce once I got to a root shell.
>>>>>
>>>>> Since it's a Gentoo install, the o32 userland is pretty fresh.  Especially
>>>>> on the Octane, where I literally rebuilt the old userland 2-3 times over
>>>>> just to make sure all the old 5-year cruft was gone.  The n32 userland
>>>>> chroot is brand-spanking new.  gcc-4.7.x only for now on both, because of
>>>>> PR61538 in gcc.  Latest binutils.
>>>>>
>>>>> The O2 is chugging away happily so far in updating a bunch of packages.  So
>>>>> I am leaning towards this being another quirk I have to hunt down in the
>>>>> Octane's code again.  There isn't much in the Octane-specific code that
>>>>> deals with memory, though -- it seems the higher-level MIPS memory code
>>>>> handles most things just fine.
>>>>
>>>> Can you enable core dumps?  I'm wondering about the EPC of the crashed
>>>> process.  If it's at a function entry or the beginning of a page that
>>>> might indicate there is an issue with flushing caches after the containing
>>>> page got loaded.  Also interesting to know if this possibly happened in a
>>>> signal trampoline or VDSO.
>>>>
>>>> These are just the usual suspects - nothing indicates this case is actually
>>>> related.
>>>
>>> (Missed the reply all on the last one)
>>>
>>> Enabled coredumps and got the 'shash' program to fail a second time (first
>>> program to do so)...so I'll rebuild that with debugging symbols and try to
>>> trip it up again later on.
>>>
>>> Is a core file from a binary w/o debugging of any value?
>>
>> Yes - it will contain registers etc.  Just what really matters in this case.
>> We don't need the debug info because we're not interested in debugging the
>> application.
>>
>>   Ralf
> 
> Attached.  I assume readelf and objdump are used to extract the register
> information?  Most searches on Google keep pointing me to GDB as if I want
> to debug the program.

Was anyone able to take a look at the core dump and see if there is anything
out of the ordinary?
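
For anyone who does pull the EPC out of the core (for example, the pc value
GDB reports when it loads the core file), Ralf's "beginning of a page" check
could be sketched roughly as below.  This is only an illustration: the helper
names and the sample address are made up, and the 4k entry is included purely
for comparison with the 16k and 64k kernels discussed here.

```python
# Rough sketch: report where a faulting EPC sits inside its page for the
# page sizes discussed in this thread.  An offset of 0 means the EPC is the
# first instruction of a page, which is the case Ralf flagged as suspicious.

# Page sizes in bytes; 4k is included only as a baseline for comparison.
PAGE_SIZES = {"4k": 4 * 1024, "16k": 16 * 1024, "64k": 64 * 1024}


def page_offset(epc, page_size):
    """Return the byte offset of epc within its page (page_size is a power of two)."""
    return epc & (page_size - 1)


def describe_epc(epc):
    """Map each page-size name to the offset of epc within a page of that size."""
    return {name: page_offset(epc, size) for name, size in PAGE_SIZES.items()}


if __name__ == "__main__":
    epc = 0x00404000  # hypothetical value, NOT taken from the attached core
    for name, offset in describe_epc(epc).items():
        marker = "  <-- start of page" if offset == 0 else ""
        print(f"{name:>3} pages: offset 0x{offset:x}{marker}")
```

An EPC that lands at offset 0 for the page size of the running kernel would
fit the cache-flush theory; an offset in the middle of a page would not rule
it out, but would make a function entry or trampoline worth checking next.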

-- 
Joshua Kinard
Gentoo/MIPS
kumba@xxxxxxxxxx
4096R/D25D95E3 2011-03-28

"The past tempts us, the present confuses us, the future frightens us.  And
our lives slip away, moment by moment, lost in that vast, terrible in-between."

--Emperor Turhan, Centauri Republic