Re: Possible glibc bug manifesting only on SMP ARMv7 systems

On 11/29/2011 01:48 PM, Gordan Bobic wrote:
> On 11/29/2011 01:45 PM, Peter Robinson wrote:
>> On Tue, Nov 29, 2011 at 1:30 PM, Gordan Bobic<gordan@xxxxxxxxxx>   wrote:
>>> Guys,
>>>
>>> After chasing my tail for ages thinking I had a hardware issue on an
>>> AC100, it looks like the random segfaults and "glibc detected a
>>> corrupted doubly linked list" errors might actually be SMP and/or ARMv7
>>> related.
>>>
>>> Errors:
>>> - random segfaults
>>> - glibc detected a corrupted doubly linked list
>>>
>>> Distro: Fedora 13
>>>
>>> Platforms that work flawlessly (24/7 compiling for weeks):
>>> - Marvell Kirkwood (1x SheevaPlug, 1x DreamPlug).
>>>
>>> Platforms that cause repeatable segfaults (same rootfs, same operation):
>>> - Tegra2 (tested using Toshiba AC100 and Compulab TrimSlice)
>>> - OMAP 4xxx (tested on a PandaBoard)
>>>
>>> I'm going to dig into this deeper (boot the machine with nosmp or
>>> tasksetting everything to run on the same core), but in the meantime I
>>> would like to ask if there is a bug in any of the following:
>>>
>>> - glibc
>>> - gcc
>>> - binutils
>>>
>>> that might cause them to misbehave either on:
>>> - ARMv7 (armv5tel packages on armv7l kernel)
>>> or
>>> - SMP ARM systems
>>> (or both)
>>>
>>> I'm going to compile up a clean kernel (without all the hacks I tried on
>>> the AC100 to try to troubleshoot the issue) and try building the
>>> packages in a clean F13 mock just to do a definitive confirmation pass,
>>> but if anyone is aware of any such issues (e.g. due to locking
>>> primitives being different on ARMv7) that have been fixed in
>>> glibc/gcc/binutils recently, I would appreciate any info you may have on
>>> the subject.
>>>
>>> Ubuntu doesn't appear to suffer from this issue, but they use a much
>>> newer gcc and a different glibc than what is in F13.

One other thing: one of the manifestations of this bug appears to be
random memory corruption (strange, I know, unless I am dealing with two
totally unrelated problems). Specifically, I have seen the bug manifest
during compile jobs where, for example, linking would segfault, and
re-running make would segfault again. But doing:

    echo 3 > /proc/sys/vm/drop_caches

would fix the problem, which would suggest that the bad data was sitting
in cached pages rather than on disk.
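
To make the drop_caches observation a bit more measurable, something
like the following could be used. It is only a rough sketch, not a
definitive test: it hashes a file, drops the caches, and hashes the file
again. The helper names (sha256_of, drop_caches, cached_copy_differs)
are made up for illustration, writing to /proc/sys/vm/drop_caches needs
root, and the first read is only served from the page cache if the file
was already cached.

```python
import hashlib
import os


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def drop_caches():
    """Flush dirty pages, then ask the kernel to drop clean caches.

    Equivalent to `sync; echo 3 > /proc/sys/vm/drop_caches`; needs root.
    """
    os.sync()
    with open("/proc/sys/vm/drop_caches", "w") as f:
        f.write("3\n")


def cached_copy_differs(path, drop=drop_caches):
    """Return True if `path` hashes differently before and after `drop()`.

    A difference would hint that the copy read before dropping caches
    did not match what is stored on disk.
    """
    before = sha256_of(path)
    drop()
    after = sha256_of(path)
    return before != after
```

Running cached_copy_differs() on the object files or libraries involved
in a failing link step, right after the segfault, would show whether the
cached copy and the on-disk copy disagree.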

My first suspicion was duff hardware/RAM on my AC100. So I got another 
one, and it behaves in the exact same way.

Then I thought that maybe they are all pre-overclocked past stable 
points, so I started hacking at the kernel to drop clock speeds and 
memory timings (they are bootloader and kernel settable on Tegra2), and 
none of that made any difference (apart from making the machine slower - 
the instability remained).

Then I started looking at possible Tegra2-specific bugs, like the TLS
register bug. I could not get any conclusive results on that,
unfortunately, but nobody running Ubuntu seems to have seen any similar
issues on the same hardware.

A couple of days ago somebody on #AC100 offered to re-run my test
(building hsqldb src.rpm in mock) on their TrimSlice and on their
PandaBoard to try to establish whether the problem might be SMP and/or
ARMv7 specific (since I get no stability issues at all on my single-core
Kirkwood devices). And sure enough, they saw the same random segfaults
arise on BOTH the TrimSlice (Tegra2 A9 SMP) _AND_ the PandaBoard (OMAP
4xxx A9 SMP).

This implies that the problem has to do with either SMP or running on
ARMv7 CPUs, which would point to an issue in either glibc or the
toolchain, but that is just guessing at the moment.
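
The single-core test mentioned earlier in the thread (taskset onto one
core, as an alternative to booting with nosmp) could be scripted roughly
as below. This is a sketch under assumptions, not a tested procedure:
run_pinned is a made-up helper name, it assumes Linux with Python 3
(where os.sched_setaffinity is available), and it relies on child
processes inheriting the CPU affinity of the command that spawns them.

```python
import os
import subprocess


def run_pinned(cmd, cpu=None):
    """Run `cmd` restricted to a single CPU and return its exit status.

    If `cpu` is None, the lowest-numbered CPU this process is allowed
    to use is chosen. The affinity is set in the child between fork and
    exec, so the calling process itself stays unpinned.
    """
    if cpu is None:
        cpu = min(os.sched_getaffinity(0))

    def pin():
        # pid 0 means "the calling process", i.e. the forked child here.
        os.sched_setaffinity(0, {cpu})

    return subprocess.run(cmd, preexec_fn=pin).returncode


# Example (the src.rpm file name is a placeholder):
#   status = run_pinned(["mock", "--rebuild", "hsqldb.src.rpm"])
```

If the segfaults disappear when the whole build is pinned to one core
but come back when it is allowed to run on both, that would point
towards SMP rather than ARMv7 as such.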

Any suggestions welcome at this point.

Gordan
_______________________________________________
arm mailing list
arm@xxxxxxxxxxxxxxxxxxxxxxx
https://admin.fedoraproject.org/mailman/listinfo/arm


