Re: [PATCH 2.6.39-rc3] parsic: Fix futex support

On 24-Jul-11, at 2:42 PM, Carlos O'Donell wrote:

On Fri, Jul 22, 2011 at 5:51 PM, John David Anglin <dave.anglin@xxxxxxxx> wrote:
Do you have a userspace patch? There was the thread stack allocation bug
and possibly futex related issues.

I do have a userspace patch, what would you like the patch against?

The current version is unstable or 2.11.2-10.


The "Check Summary" is always 0x8400000000800000. I think Kyle fixed one of
these last March. The cause was missing IPC system calls. I'm a bit vague
regarding whether the fix was installed or not.

I wouldn't expect a missing syscall to HPMC the machine. It should return ENOSYS, and userspace should return a failure code. Eventually something might fail to check the function return and fail.

That's far from an HPMC though.
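
To illustrate the point about ENOSYS, here is a minimal userspace sketch, not taken from any patch in this thread. The helper names probe_syscall and syscall_is_missing are hypothetical, and the sketch assumes a Linux system with glibc's syscall(2) wrapper. It shows how an unimplemented syscall number surfaces as a -1 return with errno set, which the caller must check.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <unistd.h>
#include <sys/syscall.h>

/* Hypothetical helper: invoke raw syscall number `nr` with no arguments.
 * Returns the wrapper's return value and stores the resulting errno
 * in *err_out (0 if the call succeeded). */
long probe_syscall(long nr, int *err_out)
{
    errno = 0;
    long rc = syscall(nr);
    *err_out = errno;
    return rc;
}

/* Hypothetical helper: returns 1 if the kernel reports `nr` as
 * unimplemented (return value -1 with errno == ENOSYS), else 0. */
int syscall_is_missing(long nr)
{
    int err;
    return probe_syscall(nr, &err) == -1 && err == ENOSYS;
}
```

For example, syscall_is_missing(-1L) probes a number that no kernel should implement. Only probe numbers that are safe to call with no arguments, since a number that does exist will actually run.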


Yah, I now think it was a matter of luck exposing the missing IPC calls.

I wish we knew how to decode the "Check Summary" but I haven't found any documentation on the net. My current theory is that the value indicates some kind of cache malfunction. I suspect that the corruption may occur during range-based flushes due to code containing inequivalent aliases. I think 0x2000000000000000 indicates a memory timeout.
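
When comparing check summaries, it can help to see which bits are set. The sketch below only does that arithmetic; it says nothing about what the bits mean, since that remains undocumented here. The function name set_bit_positions is hypothetical, and bits are numbered with the least significant bit as 0. As far as I know, PA-RISC manuals number bits from the most significant end, so subtract from 63 before comparing against them.

```c
#include <stdint.h>

/* Hypothetical helper: store the positions of the set bits of `value`
 * in `positions`, highest bit first (LSB is bit 0), and return how
 * many bits are set. `positions` must have room for 64 entries. */
int set_bit_positions(uint64_t value, int positions[64])
{
    int count = 0;
    for (int bit = 63; bit >= 0; bit--) {
        if (value & (UINT64_C(1) << bit))
            positions[count++] = bit;
    }
    return count;
}
```

With this numbering, 0x8400000000800000 has bits 63, 58 and 23 set, and 0x2000000000000000 has only bit 61 set, so the two values share no bits.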

I did find a small bug in flush_cache_range (sr3 has the wrong type). If I'm correct about the problem being range-based, this bug probably made things better by calling flush_cache_all more frequently.

I have a whole collection of HPMCs and almost all have the same check summary. It's very hard to see much consistency. Many have HPMCs in the idle loop. A couple have HPMCs in flush_data_cache. It's always possible that there is a hardware problem, but the problem occurs so frequently in the libgomp testsuite that I have to think it is software triggered. I haven't been able to trigger it by running compilations or tests manually.

There must be some interconnection between the cores because a fault in one almost always triggers a TOC HPMC in the other. The "Assist Check" value is a space register value and it's probably the context of the process that caused the cache problem.

I'm currently running a test to check operation using mainly flush_instruction_cache and flush_data_cache (whole cache flushes). So far, it hasn't HPMC'd in the libgomp testsuite, but it's very slooooow.

I plan to rebuild the main packages used in the libgomp testsuite to see if this helps.

Dave
--
John David Anglin	dave.anglin@xxxxxxxx


